Java regex to find a substring

Time: 2015-08-07 08:42:36

Tags: java regex substring matcher

I'm trying to find a specific word in a string in Java. I wrote a function that is supposed to return the matched string. This is what I have so far:

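import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Util {
    // Returns group(1) of the first match, regexExpr itself as a
    // "found, but no group(1)" marker, or null if nothing matches.
    public static String getValueByregexExpr(String str, String regexExpr) {
        Pattern regex = Pattern.compile(regexExpr, Pattern.DOTALL);
        Matcher matcher1 = regex.matcher(str);
        if (matcher1.find()) {
            if (matcher1.groupCount() != 0 && matcher1.group(1) != null) {
                // Print every group, from group 0 (the whole match) upward
                for (int i = 0; i <= matcher1.groupCount(); i++) {
                    System.out.println("matcher " + i + " for regex " + regexExpr + "= " + matcher1.group(i));
                }
                return matcher1.group(1);
            }
            return regexExpr;
        }
        return null;
    }
}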

My problem is this: I want a regex that puts the part I'm looking for into group(1). But right now, this code:

public static void main(String[] args) {

    String str = "HELLO_WORLD_123456 TEst";

    System.out.println("First test");
    String regex1 = ".*WORLD.*";
    String matchedString = Util.getValueByregexExpr(str, regex1);
    // Here, I want to obtain matchedString = WORLD
    if (matchedString == null) {
        System.out.println("matchedString null");
    } else if (matchedString.equals(regex1)) {
        System.out.println("String found but empty group(1)");
    } else {
        System.out.println("Result : " + matchedString);
    }

    // Here, I want to obtain matchedString = WORLD_123456
    System.out.println("\nSecond test");
    String regex2 = "WORLD_([^_]+)";
    matchedString = Util.getValueByregexExpr(str, regex2);
    if (matchedString == null) {
        System.out.println("regex " + regex2 + " matchedString null");
    } else if (matchedString.equals(regex2)) {
        System.out.println("regex " + regex2 + " String found but empty group(1)");
    } else {
        System.out.println("regex " + regex2 + " Result : " + matchedString);
    }

}

gives me this output:

First test:
regex .*WORLD.* String found but empty group(1)

Second test:
matcher 0 for regex WORLD_([^_]+)= WORLD_123456
matcher 1 for regex WORLD_([^_]+)= 123456
regex WORLD_([^_]+) Result : 123456

First, is there a regex that returns:
- WORLD for the first test
- WORLD_123456 for the second test

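Second, I thought that as long as there was only one result, it would always be put into group(1). But judging by the result of test 2, I was clearly wrong. Can someone explain how this works?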

Thanks for your help.

2 answers:

Answer 0 (score: 1)

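To fix the first one, just add a capturing group:

String regex1 = ".*(WORLD).*";

To fix the second one, add whitespace (\s) to the negated character class and put the capturing group around the whole token, so that group(1) includes WORLD_ and stops at the space:

String regex2 = "(WORLD_[^_\\s]+)";

See demo

The main reason the first part of your code did not work as expected is the missing capturing group, which getValueByregexExpr checks for. The second returned only the part of the string captured by the ([^_]+) part of the regex.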
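The reasoning above can be checked directly. In java.util.regex, group(0) is always the whole match, and group(1) exists only if the pattern defines at least one capturing group; groupCount() returns the number of capturing groups, not counting group 0. This minimal sketch (the class GroupDemo and its describe helper are illustrative names, not from the thread) runs the question's two original patterns on the question's input string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Describes the first match: the number of capturing groups, the whole
    // match (group 0) and, only if the pattern defines one, group 1.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("groupCount=").append(m.groupCount());
        sb.append(", group(0)=").append(m.group(0));
        if (m.groupCount() >= 1) {
            sb.append(", group(1)=").append(m.group(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "HELLO_WORLD_123456 TEst";
        // No parentheses: groupCount is 0, so there is no group(1) at all
        System.out.println(describe(".*WORLD.*", str));
        // One capturing group: group(1) is only the part inside ( )
        System.out.println(describe("WORLD_([^_]+)", str));
    }
}
```

Run against this exact string, the second pattern's group 1 is "123456 TEst" rather than "123456", because [^_] also matches the space; that is why the corrected pattern excludes whitespace with [^_\\s].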

Answer 1 (score: 0)

In a regex, everything inside () becomes a capturing group.
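To make the numbering concrete: each plain pair of parentheses is a capturing group, numbered from 1 by the position of its opening parenthesis (group 0 is the whole match), while (?:...) groups without capturing. A minimal sketch; GroupNumbering and its groups helper are illustrative names:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns group(0)..group(groupCount()) of the first match,
    // or an empty array if the pattern does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "HELLO_WORLD_123456 TEst";
        // Nested groups are numbered by their opening parenthesis:
        // group 1 is the whole token, group 2 is just the digits.
        System.out.println(Arrays.toString(groups("(WORLD_(\\d+))", str)));
        // (?:...) groups the alternation without capturing it,
        // so the only capturing group is the one around the digits.
        System.out.println(Arrays.toString(groups("(?:HELLO|WORLD)_(\\d+)", str)));
    }
}
```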

Correct your regexes:

String regex1 = ".*(WORLD).*";


String regex2 = "(WORLD_[^_\\s]+)";
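To confirm that the corrected patterns return what the question asks for, here is a short sketch that prints group(1) of the first match for each. The class FixedRegexDemo and the helper firstGroup are illustrative, and the helper assumes the pattern has at least one capturing group (otherwise Matcher.group(1) throws IndexOutOfBoundsException):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedRegexDemo {
    // Returns capturing group 1 of the first match, or null if there is no match.
    // Assumes the pattern defines at least one capturing group.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "HELLO_WORLD_123456 TEst";
        // Test 1: the group wraps only the literal word
        System.out.println(firstGroup(".*(WORLD).*", str));
        // Test 2: the group wraps the whole token; \s stops it at the space
        System.out.println(firstGroup("(WORLD_[^_\\s]+)", str));
    }
}
```

This prints WORLD and then WORLD_123456. Because both corrected patterns define group 1, the question's getValueByregexExpr returns these same values instead of its "found but empty group(1)" marker.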