How do I match any word but ignore those preceded by multiple spaces?

Asked: 2015-11-25 12:46:51

Tags: java regex

I want to match all the words in a text, but ignore lines that begin with 4 spaces (immediately after a line break).

Example

The text file to search for words:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do 
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut 
enim ad minim veniam, quis nostrud exercitation ullamco laboris 
nisi ut aliquip ex ea commodo consequat.

    This must NOT be matched. Because it has 4 whitespaces at the beginning.

Lorem ipsum dolor sit amet. Ut enim ad minim veniam.


So the words in the following line should not count as matches for the pattern:

This must NOT be matched. Because it has 4 whitespaces at the beginning.


Code

Here is my regular expression, which finds all words:

\\b[A-Za-z]+\\b

I know that Java's regex syntax has the ^ symbol for exclusion, but I only know how to use it in simpler expressions.
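For context, ^ has two distinct meanings in java.util.regex: inside a character class it negates the class, and outside one it anchors the match to the start of the input (or of each line when Pattern.MULTILINE is set). The following is a minimal sketch of both uses; the class name CaretDemo, the helper findAll, and the sample string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaretDemo {
    // Collects every match of the given pattern in the input, in order.
    public static List<String> findAll(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative input: the second line is indented by four spaces.
        String text = "foo bar\n    baz qux";

        // Inside brackets, ^ negates: runs of characters that are NOT letters.
        Pattern notLetters = Pattern.compile("[^A-Za-z]+");

        // Outside brackets, ^ anchors; with MULTILINE it matches at each line start,
        // so this finds a word only when it is the very first thing on a line.
        Pattern firstWordOfLine = Pattern.compile("^[A-Za-z]+", Pattern.MULTILINE);

        System.out.println(findAll(notLetters, text));
        System.out.println(findAll(firstWordOfLine, text));
    }
}
```

The anchored form is the one relevant to the question, since "starts with 4 spaces" is a statement about the beginning of a line rather than about a set of excluded characters.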

2 answers:

Answer 0: (score: 2)

The following snippet might serve as a basis for what you want to achieve.


import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        // Sample input: one array element per line of the text file.
        String[] lines = {"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do",
            "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut",
            "enim ad minim veniam, quis nostrud exercitation ullamco laboris",
            "nisi ut aliquip ex ea commodo consequat.",
            "",
            "    This must NOT be matched. Because it has 4 whitespaces at the beginning.",
            "",
            "Lorem ipsum dolor sit amet. Ut enim ad minim veniam."};
        for (String line : lines) {
            // Skip lines that are indented with four spaces.
            if (!line.startsWith("    ")) {
                // Split on runs of punctuation and whitespace to get the words.
                String[] words = line.split("[\\p{IsPunctuation}\\p{IsWhite_Space}]+");
                System.out.println("words = " + Arrays.toString(words));
            }
        }
    }
}

PS: The regular expression is borrowed from this answer.

Answer 1: (score: 1)

The following should do it:

(?<!\\s{4})\\b[A-Za-z]+\\b

It starts with a negative lookbehind, so it will not match anything that is preceded by four whitespace characters.
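One caveat worth checking: the lookbehind is evaluated only at the position where each word starts, so on an indented line it excludes the first word, while later words on that line are preceded by a single space and still match. The sketch below compares the lookbehind pattern with one possible alternative that skips whole indented lines by anchoring at line starts. The alternative, the class name IndentedLineFilter, and its method names are assumptions made for illustration, not part of either answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndentedLineFilter {
    // Lookbehind pattern: a word not directly preceded by four whitespace characters.
    static final Pattern LOOKBEHIND = Pattern.compile("(?<!\\s{4})\\b[A-Za-z]+\\b");

    // Hypothetical alternative: whole lines that do not begin with four spaces.
    static final Pattern PLAIN_LINE = Pattern.compile("^(?! {4}).*$", Pattern.MULTILINE);

    // The word pattern from the question.
    static final Pattern WORD = Pattern.compile("\\b[A-Za-z]+\\b");

    // Applies the lookbehind pattern to the whole text.
    public static List<String> withLookbehind(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = LOOKBEHIND.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    // First selects the non-indented lines, then extracts words from each of them.
    public static List<String> skippingIndentedLines(String text) {
        List<String> words = new ArrayList<>();
        Matcher lineMatcher = PLAIN_LINE.matcher(text);
        while (lineMatcher.find()) {
            Matcher wordMatcher = WORD.matcher(lineMatcher.group());
            while (wordMatcher.find()) {
                words.add(wordMatcher.group());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit amet.\n"
                + "\n"
                + "    This must NOT be matched.\n"
                + "\n"
                + "Ut enim ad minim veniam.";
        System.out.println(withLookbehind(text));
        System.out.println(skippingIndentedLines(text));
    }
}
```

Matching line by line costs a second pass over each line, but it keeps the "is this line indented?" decision separate from the word pattern, which is the same idea as the startsWith check in the other answer.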
