Regular expression - need help

Time: 2010-09-30 03:58:37

Tags: java regex pattern-matching

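I have a String template from which I need to get the list of #elseif blocks. For example, the first #elseif block would be

#elseif ( $variable2 )Some sample text after 1st ElseIf.

and the second #elseif block would be

#elseif($variable2)This text can be repeated many times until do while is called. SECOND ELSEIF

and so on. I am using the following regex.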

String regexElseIf="\\#elseif\\s*\\((.*?)\\)(.*?)(?:#elseif|#else|#endif)"; 

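But it returns only one match, the first #elseif block, and not the second. I need to get the second #elseif block as well. Could you please help me? The template string is given below.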

  String template =
        "This is a sample document."
            + "#if ( $variable1 )"
            + "FIRST This text can be repeated many times until do while is called."
            + "#elseif ( $variable2 )"
            + "Some sample text after 1st ElseIf."
            + "#elseif($variable2)"
            + "This text can be repeated many times until do while is called. SECOND ELSEIF"
            + "#else "
            + "sample else condition  "
            + "#endif "
            + "Some sample text."
            + "This is the second sample document."
            + "#if ( $variable1 )"
            + "SECOND FIRST This text can be repeated many times until do while is called."
            + "#elseif ( $variable2 )"
            + "SECOND Some sample text after 1st ElseIf."
            + "#elseif($variable2)"
            + "SECOND This text can be repeated many times until do while is called. SECOND ELSEIF"
            + "#else " + "SECOND sample else condition  " + "#endif "
            + "SECOND Some sample text.";

4 answers:

Answer 0 (score: 2)

This code

Pattern regexp = Pattern.compile("#elseif\\b(.*?)(?=#(elseif|else|endif))");
Matcher matcher = regexp.matcher(template);
while (matcher.find())
    System.out.println(matcher.group());

will produce

#elseif ( $variable2 )Some sample text after 1st ElseIf.
#elseif($variable2)This text can be repeated many times until do while is called. SECOND ELSEIF
#elseif ( $variable2 )SECOND Some sample text after 1st ElseIf.
#elseif($variable2)SECOND This text can be repeated many times until do while is called. SECOND ELSEIF

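The secret is the positive lookahead (?=#(elseif|else|endif)): an #elseif, #else, or #endif must follow the block, but its characters are not consumed. That way the next #elseif can still be found in the next iteration.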
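To make the difference concrete, here is a minimal sketch that runs a consuming terminator and a lookahead terminator over the same input. The class name `LookaheadDemo`, the helper `findAll`, and the shortened sample string are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadDemo {
    // Terminator in a non-capturing group: the keyword becomes part of the match.
    static final Pattern CONSUMING =
        Pattern.compile("#elseif\\b(.*?)(?:#elseif|#else|#endif)");

    // Terminator in a lookahead: the keyword is checked but left in the input.
    static final Pattern LOOKAHEAD =
        Pattern.compile("#elseif\\b(.*?)(?=#elseif|#else|#endif)");

    // Hypothetical helper: collects every full match of the pattern.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the template in the question.
        String sample = "#if(a)A#elseif(b)B#elseif(c)C#else D#endif";
        System.out.println(findAll(CONSUMING, sample)); // [#elseif(b)B#elseif]
        System.out.println(findAll(LOOKAHEAD, sample)); // [#elseif(b)B, #elseif(c)C]
    }
}
```

With the consuming version, the second `#elseif` keyword is swallowed as the end of the first match, so the matcher resumes at `(c)C...` and finds nothing more.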

Answer 1 (score: 1)

#elseif\b(?:(?!#else\b|#endif\b).)*

will match everything from the first #elseif in a block up to (but not including) the nearest #else or #endif.

Pattern regex = Pattern.compile("#elseif\\b(?:(?!#else\\b|#endif\\b).)*", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
    // match start: regexMatcher.start()
    // match end: regexMatcher.end()
} 

If you need to extract the individual `#elseif` blocks from that match, apply

#elseif\b(?:(?!#elseif\b).)*

to the result of the first regex match above. In Java:

Pattern regex = Pattern.compile("#elseif\\b(?:(?!#elseif\\b).)*", Pattern.DOTALL);
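The two steps can be chained as in the following sketch: the first pattern finds each run of #elseif blocks, and the second pattern is applied to that run to separate the individual blocks. The class name `ElseIfBlocks`, the method `extract`, and the sample input are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ElseIfBlocks {
    // Step 1: from the first #elseif up to (not including) the nearest #else or #endif.
    static final Pattern CHAIN =
        Pattern.compile("#elseif\\b(?:(?!#else\\b|#endif\\b).)*", Pattern.DOTALL);

    // Step 2: one #elseif block, stopping before the next #elseif.
    static final Pattern SINGLE =
        Pattern.compile("#elseif\\b(?:(?!#elseif\\b).)*", Pattern.DOTALL);

    // Hypothetical helper: returns every individual #elseif block in the template.
    static List<String> extract(String template) {
        List<String> blocks = new ArrayList<>();
        Matcher chain = CHAIN.matcher(template);
        while (chain.find()) {
            // Apply the second pattern only to the text matched by the first.
            Matcher single = SINGLE.matcher(chain.group());
            while (single.find()) {
                blocks.add(single.group());
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "#if(a)A#elseif(b)B#elseif(c)C#else D#endif X#if(d)E#elseif(e)F#endif";
        for (String block : extract(sample)) {
            System.out.println(block);
        }
    }
}
```

Because `#else\b` requires a word boundary after `else`, it does not stop at `#elseif`, which is what lets the first pattern span several consecutive #elseif blocks.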

Answer 2 (score: 1)

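The biggest problem here is that the regex needs #elseif as both the start and the stop marker. The first match is the substring

#elseif ( $variable2 )Some sample text after 1st ElseIf.#elseif

The matcher then starts looking for the next match after that sequence. So it misses the second #elseif of the first #if expression, because its #elseif keyword has already been consumed as part of the previous match.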

I would try splitting the string on the pattern "\\#elseif\\s*\\((.*?)\\)":

String[] temp = template.split("\\#elseif\\s*\\((.*?)\\)");

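Now every entry of temp from temp[1] onward starts with the text of an #elseif block. Another split on (?:#else|#endif) should leave you with strings that contain only the plain text: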

// temp[0] is the text before the first #elseif, so start at index 1
for (int i = 1; i < temp.length; i++)
  System.out.println(temp[i].split("(?:#else|#endif)")[0]);

(I could not test the second split; if it does not work, treat this only as a suggested strategy ;)
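For anyone who wants to try the split strategy, a self-contained sketch follows. The class name `SplitStrategy`, the method `elseIfBodies`, and the sample input are invented for the example. It differs from the snippet above in two ways: it skips the first array entry, and it passes a limit of 2 to the second split so that an entry consisting only of a delimiter still yields an empty string instead of an empty array.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitStrategy {
    // Hypothetical helper: returns the plain text of every #elseif block.
    static List<String> elseIfBodies(String template) {
        // Cut the template at every "#elseif ( ... )" header.
        String[] parts = template.split("#elseif\\s*\\(.*?\\)");
        List<String> bodies = new ArrayList<>();
        // parts[0] is whatever precedes the first #elseif, so it is skipped.
        for (int i = 1; i < parts.length; i++) {
            // Keep only the text before the next #else or #endif.
            bodies.add(parts[i].split("#else|#endif", 2)[0]);
        }
        return bodies;
    }

    public static void main(String[] args) {
        String sample = "#if(a)A#elseif(b)B#elseif(c)C#else D#endif";
        System.out.println(elseIfBodies(sample)); // [B, C]
    }
}
```

One limitation of this sketch: the lazy `\\(.*?\\)` ends at the first closing parenthesis, so a condition that itself contains parentheses would be cut short.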

Answer 3 (score: 1)

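// Group 1 is the condition inside the parentheses, group 2 is the block text.
// The lookahead leaves the next #elseif/#else/#endif unconsumed for the next find().
private static final Pattern REGEX = Pattern.compile(
    "#elseif\\s*\\(([^()]*)\\)(.*?)(?=#elseif|#else|#endif)");

public static void main(String[] args) {
    // template is the String defined in the question
    Matcher matcher = REGEX.matcher(template);
    while (matcher.find()) {
        System.out.println(matcher.group(2));
    }
}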