Punctuation regex in Java

Time: 2011-11-20 10:27:56

Tags: java regex

First, I read the following documentation:

http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html

I want to match any punctuation character except @ ' , and &, but I can't work out how.

Here is my code:

public static void main( String[] args )
{
    // String to be scanned to find the pattern.
    String value = "#`~!#$%^";
    String pattern = "\\p{Punct}[^@',&]";

    // Create a Pattern object
    Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

    // Now create matcher object.
    Matcher m = r.matcher(value);
    if (m.find()) {
        System.out.println("Found value: " + m.groupCount());
    } else {
        System.out.println("NO MATCH");
    }
}

The result is NO MATCH.
Is there something wrong with the pattern?

Thanks,
MRizq

2 Answers:

Answer 0: (score: 30)

You are matching two characters, not one. A (negative) lookahead should do the job:

(?![@',&])\\p{Punct}
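A minimal sketch of the difference, assuming the default ASCII meaning of \p{Punct}. The class name PunctLookahead and the helper findAll are made up for illustration and are not part of the answer. The pattern from the question consumes a punctuation character plus one more character, while the lookahead only inspects the next character and lets \p{Punct} consume exactly one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used here only to compare the two patterns.
class PunctLookahead {
    // (?![@',&]) rejects a position when the next character is one of @ ' , &
    // and consumes nothing, so \p{Punct} then matches a single character.
    static final Pattern ONE_CHAR = Pattern.compile("(?![@',&])\\p{Punct}");

    // Pattern from the question: one punctuation character followed by
    // one further character that is not @ ' , &.
    static final Pattern TWO_CHARS = Pattern.compile("\\p{Punct}[^@',&]");

    // Collects every match of p in input, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(ONE_CHAR, "a@b,c!d"));  // [!]
        System.out.println(findAll(TWO_CHARS, "a@b,c!d")); // [@b, ,c, !d]
    }
}
```

Note that the question's pattern also matches the excluded characters themselves (@ and , above), because the exclusion applies to the character after the punctuation mark rather than to the punctuation mark itself.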

Answer 1: (score: 1)

You can use character class subtraction here:

String pat = "[\\p{Punct}&&[^@',&]]";

The whole pattern is a single character class, [...], that contains the POSIX character class \p{Punct}, the intersection operator &&, and the negated character class [^...].

If you also plan to match all Unicode punctuation, you may need the Unicode modifier:

String pat = "(?U)[\\p{Punct}&&[^@',&]]";
              ^^^^

The pattern matches any punctuation character (via \p{Punct}) except @, ', , and &.
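One thing worth checking before adding (?U): as far as the java.util.regex.Pattern documentation describes it, the flag redefines \p{Punct} as the Unicode punctuation property, which no longer covers ASCII symbols such as $ or ^. The sketch below compares the two variants; the class name PunctUnicodeFlag and the sample input are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, not taken from the answer.
class PunctUnicodeFlag {
    // Default mode: \p{Punct} is the ASCII POSIX punctuation set.
    static final Pattern ASCII = Pattern.compile("[\\p{Punct}&&[^@',&]]");

    // (?U) enables UNICODE_CHARACTER_CLASS: \p{Punct} follows Unicode punctuation.
    static final Pattern UNICODE = Pattern.compile("(?U)[\\p{Punct}&&[^@',&]]");

    // Collects every match of p in input, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Input: dollar sign, inverted question mark (U+00BF), exclamation mark, comma.
        String input = "$\u00BF!,";
        // ASCII mode finds the dollar sign and the exclamation mark.
        System.out.println(findAll(ASCII, input).size());   // 2
        // Unicode mode finds U+00BF and the exclamation mark, but not the dollar sign.
        System.out.println(findAll(UNICODE, input).size()); // 2
    }
}
```

If both the ASCII symbols and Unicode punctuation are needed, the two sets would have to be combined explicitly; which set is wanted depends on the input being scanned.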

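If you need to exclude more characters, add them to the negated character class. Just remember to always escape &, -, \, ^, [ and ] inside a Java regex character class/set. For example, excluding a backslash and - as well might look like "[\\p{Punct}&&[^@',&\\\\-]]" or

"[\\p{Punct}&&[^@',&\\-\\\\]]"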

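The two spellings above can be checked side by side. This is only a sketch: the class name PunctExclusions and the sample input are hypothetical. In the first pattern the hyphen is left unescaped at the end of the class, where it is read as a literal; in the second it is escaped explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class comparing the two ways of excluding backslash and hyphen.
class PunctExclusions {
    // Regex: [\p{Punct}&&[^@',&\\-]]  (escaped backslash, trailing literal hyphen)
    static final Pattern TRAILING_HYPHEN = Pattern.compile("[\\p{Punct}&&[^@',&\\\\-]]");

    // Regex: [\p{Punct}&&[^@',&\-\\]]  (escaped hyphen, escaped backslash)
    static final Pattern ESCAPED_HYPHEN = Pattern.compile("[\\p{Punct}&&[^@',&\\-\\\\]]");

    // Collects every match of p in input, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a-b\\c!d#"; // the text a-b\c!d#
        System.out.println(findAll(TRAILING_HYPHEN, input)); // [!, #]
        System.out.println(findAll(ESCAPED_HYPHEN, input));  // [!, #]
    }
}
```

Escaping the hyphen explicitly is the safer habit, since it keeps working if more characters are later appended to the class.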
Demo:

String value = "#`~!#$%^,";
String pattern = "(?U)[\\p{Punct}&&[^@',&]]";
Pattern r = Pattern.compile(pattern);    // Create a Pattern object
Matcher m = r.matcher(value);            // Now create matcher object.
while (m.find()) {
    System.out.println("Found value: " + m.group());
}