Infinite loop in a regular expression

Time: 2014-01-06 11:24:30

Tags: java regex

My program hangs on a string, apparently because of an infinite loop in a regular expression.

My string value is

String input="The contents of this Office Memorandum may also be brought to the notice";

My code is

String pat = "(the|The)[ ]?((([A-Za-z0-9])+([ ]| , )?([A-Za-z0-9]))+ [ ]?)+([<][ ]?[A-Za-z0-9- ]+ [,]?[A-Za-z0-9- ]+[ ]?[>])*[ ]?(Act|Rules|Rule|Bill|Regulations)[ ]?[,]?[ ]?([0-9]+)?";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(input);

while (m.find()) {
      String sMatch = m.group();
      String temp1=sMatch;
      String temp2=sMatch;       
}

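Execution enters the while loop but never comes out. I think there is something wrong with the regular expression, but I cannot see what the problem is. Can anyone help me fix it? When the debugger reaches while (m.find()), it never returns.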

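4 answers:

Answer 0 (score: 4)

You have a case of catastrophic backtracking. It happens when no match can be found, but to prove that, the engine has to try every combination of lengths for two variable-length terms before it can give up. Mathematically, the number of combinations grows exponentially with the length of the input.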

This is the problematic part of your regex:

(([A-Za-z0-9])+([ ]| , )?([A-Za-z0-9]))+

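It is one or more letters/digits, an optional separator (a space or " , "), then one letter/digit, and the whole thing is repeated one or more times. When the input is long, you can see that the number of ways to combine the two "one or more" repetitions becomes very large.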
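To get a feel for that growth, here is a rough sketch that counts in how many ways the group above can carve up a single run of n letters. It assumes the optional separator is skipped, so each repetition consumes at least two characters; the class and method names are made up for illustration. It only gives a lower bound on the alternatives for the full pattern and does not model what the Java engine does step by step.

```java
public class SplitCount {
    // Number of ways to cut a run of n letters into consecutive pieces of
    // length >= 2 (one piece per repetition of the group, separator skipped).
    static long ways(int n) {
        long[] w = new long[n + 1];
        w[0] = 1; // empty run: exactly one way (no pieces at all)
        for (int len = 2; len <= n; len++) {
            for (int piece = 2; piece <= len; piece++) {
                w[len] += w[len - piece];
            }
        }
        return w[n];
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 40; n += 10) {
            System.out.println(n + " letters: " + ways(n) + " ways");
        }
    }
}
```

The counts follow the Fibonacci sequence: 34 ways for 10 letters, 4181 for 20, and 63245986 for 40. When the overall match fails, a backtracking engine may have to rule out every one of them.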

Answer 1 (score: 3)

According to this: Endless Loop matcher.find(), your regex is probably too complex and causes "catastrophic backtracking". You may have to reduce it to a simpler regex.
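As one possible direction for such a simplification, the sketch below makes every word unambiguous: a word is a possessive run of letters/digits followed by exactly one separator, so there is only one way to read any stretch of text. The pattern, the class name and the second test sentence are assumptions made for illustration; the sketch deliberately leaves out the `<...>` part and the trailing number of the original pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplerPattern {
    // Hypothetical simplification: "the", a space, then words that are each
    // followed by exactly one separator (" , " or " "), then a keyword.
    // [A-Za-z0-9]++ is possessive, so a word is never split in two.
    static final Pattern TITLE = Pattern.compile(
        "(?:the|The) (?:[A-Za-z0-9]++(?: , | ))+?(Act|Rules|Rule|Bill|Regulations)\\b");

    // Returns the first match, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = TITLE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "The contents of this Office Memorandum may also be brought to the notice";
        System.out.println(firstMatch(input));                               // null
        System.out.println(firstMatch("as stated in the Corn Act of 1815")); // the Corn Act
    }
}
```

On the string from the question this returns without a match instead of hanging, because a failed attempt has no alternative word boundaries left to try.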

Answer 2 (score: 1)

Here is a Java example that may help (you should be able to adapt it to your own needs):

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class RegexTest
{
  public static void main(String[] args)
  {
    String input="The Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) Act, 1995 states that everone must recognise the Corn Act and the Hoopety Fling Regulations 2006 or face death by oranges.";
    String pat = "(?i)(the\\b)(?> +)((?>(?>(?>\\w|\\)|\\()+(?:,| )*))+?)\\b(Act|Rules|Rule|Bill|Regulations)\\b(?: |,)*(?>(\\d{4})(?:\\D|$))?";
    Pattern p = Pattern.compile(pat);
    Matcher m = p.matcher(input);
    while (m.find()) {
      System.out.println(m.group(0).trim().replace(' ', '-'));
    }
  }
}

// Will Output: 
// The-Persons-with-Disabilities-(Equal-Opportunities,-Protection-of-Rights-and-Full-Participation)-Act,-1995
// the-Corn-Act
// the-Hoopety-Fling-Regulations-2006

As others have said, this expression suffers from catastrophic backtracking (more info on catastrophic backtracking).

I think this expression does what you need:

(?i)(the\b)(?> +)((?>(?>\w+(?:,| )*))+?)\b(Act|Rules|Rule|Bill|Regulations)\b(?: |,)*(?>(\d{4})(?:\D|$))?

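Note the use of atomic groups, (?>...), which effectively "forget" the possibility of backtracking into them. Let me know if you have any questions.

Update to include parentheses: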
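A minimal illustration of what "forgetting" means, using a toy pattern that is not part of the answer's expression (the class name is made up):

```java
import java.util.regex.Pattern;

public class AtomicDemo {
    public static void main(String[] args) {
        // Ordinary group: "bc" leaves nothing for the final "c", so the engine
        // backtracks into the group, tries "b" instead, and the match succeeds.
        System.out.println(Pattern.matches("a(?:bc|b)c", "abc"));  // true
        // Atomic group: once "bc" has matched, the alternative "b" is discarded,
        // so the final "c" has nothing left to match and the whole match fails.
        System.out.println(Pattern.matches("a(?>bc|b)c", "abc"));  // false
        // With input that suits the first alternative, the atomic group matches.
        System.out.println(Pattern.matches("a(?>bc|b)c", "abcc")); // true
    }
}
```

The second call shows the trade-off: an atomic group removes backtracking paths, which is what prevents the blow-up, but it can also reject input that an ordinary group would accept.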

(?i)(the\b)(?> +)((?>(?>(?>\w|\)|\()+(?:,| )*))+?)\b(Act|Rules|Rule|Bill|Regulations)\b(?: |,)*(?>(\d{4})(?:\D|$))?

Sample string:

'The Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) Act, 1995 states that everone must recognise the Corn Act and the Hoopety Fling Regulations 2006 or face death by oranges.'

This produces the result:

[Match number 1]
Matched: 'The Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) Act, 1995 ' at character 1
[Capture Group 1] 'The' found at character 1
[Capture Group 2] 'Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) ' found at character 5
[Capture Group 3] 'Act' found at character 98
[Capture Group 4] '1995' found at character 103

[Match number 2]
Matched: 'the Corn Act ' at character 143
[Capture Group 1] 'the' found at character 143
[Capture Group 2] 'Corn ' found at character 147
[Capture Group 3] 'Act' found at character 152
[Capture Group 4] '' found at character 1

[Match number 3]
Matched: 'the Hoopety Fling Regulations 2006 ' at character 160
[Capture Group 1] 'the' found at character 160
[Capture Group 2] 'Hoopety Fling ' found at character 164
[Capture Group 3] 'Regulations' found at character 178
[Capture Group 4] '2006' found at character 190
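In Java, the same information can be read from the Matcher. The sketch below (class and field names are made up) prints each group with its offset. Java offsets are 0-based, so they come out one less than the character positions listed above, and an optional group that did not take part in the match is reported as null with offset -1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDump {
    static final String SAMPLE = "The Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) Act, 1995 states that everone must recognise the Corn Act and the Hoopety Fling Regulations 2006 or face death by oranges.";

    // The updated expression from this answer, escaped for a Java string literal.
    static final Pattern P = Pattern.compile(
        "(?i)(the\\b)(?> +)((?>(?>(?>\\w|\\)|\\()+(?:,| )*))+?)\\b(Act|Rules|Rule|Bill|Regulations)\\b(?: |,)*(?>(\\d{4})(?:\\D|$))?");

    public static void main(String[] args) {
        Matcher m = P.matcher(SAMPLE);
        while (m.find()) {
            System.out.println("Matched: '" + m.group() + "' at offset " + m.start());
            for (int g = 1; g <= m.groupCount(); g++) {
                // group(g) is null and start(g) is -1 if group g did not participate
                System.out.println("  group " + g + ": '" + m.group(g) + "' at offset " + m.start(g));
            }
        }
    }
}
```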

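Answer 3 (score: 1)

As others have said, this expression suffers from catastrophic backtracking. But you can work around the problem with the following code, even if it is a bit ugly.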

String input = "The contents of this Office Memorandum may also be brought to the notice Act";

// First pass: match the "the ..." prefix without the keyword.
String pat = "((the|The) ?((\\w|\\d)+[  ,]?(\\w|\\d)+ ?)+([<] ?[A-Za-z0-9 ]+ ,?[A-Za-z0-9 ]+[ ]?[>])* ?)";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(input);

// Second pass: the keyword and optional number, compiled once outside the loop.
String pat1 = "(Act|Rules|Rule|Bill|Regulations)[ ]?[,]?[ ]?([0-9]+)?";
Pattern p1 = Pattern.compile(pat1);

while (m.find()) {
    int end = m.end();
    String sMatch = m.group();
    System.out.println(sMatch);
    if (end < input.length()) {
        // Look for the keyword in whatever follows the first match.
        String next = input.substring(end);
        Matcher m1 = p1.matcher(next);
        if (m1.find()) {
            System.out.println("found");
        }
    }
}