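Regex for nested braces (ignoring inner braces and whitespace)

Time: 2017-10-16 05:10:51

Tags: java regex parsing bibtex

I am trying to create a regex pattern that reads a BibTeX citation file and matches everything inside the braces. For those who don't know, BibTeX citations look like this: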

@INPROCEEDINGS{Fogel95,
  AUTHOR =       {L. J. Fogel and P. J. Angeline and D. B. Fogel},
  TITLE =        {An evolutionary programming approach to self-adaptation
                    on finite state machines},
  BOOKTITLE =    {Proceedings of the Fourth International Conference on
                    Evolutionary Programming},
  YEAR =         {1995},
  pages =        {355--365}
}

@ARTICLE{Goldberg91,
  AUTHOR =       {D. Goldberg},
  TITLE =        {Real-coded genetic algorithms, virtual alphabets, and blocking},
  JOURNAL =      {Complex Systems},
  YEAR =         {1991},
  pages =        {139--167}
}

@INPROCEEDINGS{Yao96,
  AUTHOR =       {X. Yao and Y. Liu},
  TITLE =        {Fast evolutionary programming},
  BOOKTITLE =    {Proceedings of the 6$^{th}$ Annual Conference on Evolutionary
                    Programming},
  YEAR =         {1996},
  pages =        {451--460}
}

My current pattern is:

@(\\w+)\{(\\w+),\\s*((\\w+)\\s*=\\s*(\\"|\\{)?(.+)(\\"|\\})?,?\\s*)+\\}

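This pattern matches the second citation but not the first or the third. I know the third fails because of the braces inside the citation (6$^{th}$), and I have also found that the pattern won't match citations where a citation element contains whitespace or a newline: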

BOOKTITLE =    {Proceedings of the Fourth International Conference on
                Evolutionary Programming},
//This part of the citation has a newline in the middle of it.

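I have been trying to fix my pattern, but what I have found with regular expressions is that the longer I spend fixing an expression or adding new conditions to it, the messier it gets. I just want to know how to capture a whole citation regardless of inner braces/brackets. Some citations have no brace/bracket after the " =" at all. Any help, along with an explanation, would be greatly appreciated. I have looked at similar examples, and they only confused me more, since it is hard to decipher a regex just by glancing at it. Thanks.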

3 answers:

Answer 0 (score: 0)

The simplest way to capture everything between curly braces is:

\{([^}]+)}

The negated class [^}] matches any character other than a closing brace, including newlines.
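A quick sketch of how this pattern behaves on values from the question. The class name SimpleBraceDemo and the helper captures are made up for this illustration. Because [^}] stops at the first closing brace, a flat value is captured whole, while a value with an inner {th} is cut off at the inner brace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleBraceDemo {
    // The pattern from this answer, written as a Java string literal.
    static final Pattern BRACES = Pattern.compile("\\{([^}]+)}");

    /** Returns capture group 1 of every match found in the text. */
    static List<String> captures(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = BRACES.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Flat values: each one is captured completely.
        System.out.println(captures("YEAR = {1995},\n  pages = {355--365}"));
        // prints [1995, 355--365]

        // Nested braces: the match ends at the first '}', so the value is truncated.
        System.out.println(captures("BOOKTITLE = {Proceedings of the 6$^{th}$ Annual\n   Conference}"));
        // prints [Proceedings of the 6$^{th]
    }
}
```

Applied to a whole entry, the first match would likewise start at the entry's opening brace and end at the first field's closing brace, so the pattern is mainly useful for flat, un-nested values.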

Answer 1 (score: 0)

Regular expressions are not a good fit for text with nested blocks.

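If you insist on using regex, you should first match the outer part of each entry, capturing its content so that the fields can be matched in a nested loop.

The outer regex is something like:

@(\w+)\{(\w+),([^{}]*(?:\{[^{}]*\}[^{}]*)*)\}

The inner regex is something like:

(\w+)\s*=\s*\{([^}]*)\}

Put together in Java, with the fields matched in a nested loop:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pTag = Pattern.compile("@(\\w+)" +                         // tag
                               "\\{" +
                               "(\\w+)" +                          // name
                               "," +
                               "([^{}]*(?:\\{[^{}]*\\}[^{}]*)*)" + // content
                               "\\}");
Pattern pField = Pattern.compile("(\\w+)" +    // field
                                 "\\s*=\\s*" +
                                 "\\{" +
                                 "([^}]*)" +   // value
                                 "\\}");
Pattern pNewline = Pattern.compile("\\s*(?:\\R\\s*)+");

// Outer loop: one iteration per matched entry.
for (Matcher mTag = pTag.matcher(input); mTag.find(); ) {
    String tag = mTag.group(1);
    String name = mTag.group(2);
    String content = mTag.group(3);
    // Inner loop: one iteration per field inside the entry's content.
    for (Matcher mField = pField.matcher(content); mField.find(); ) {
        String field = mField.group(1);
        String value = mField.group(2);
        value = pNewline.matcher(value).replaceAll(" "); // unwrap multi-line values
        System.out.printf("%-15s %-12s %-11s %s%n", tag, name, field, value);
    }
}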

Because field values may wrap across multiple lines, they need to be unwrapped; that is what the pNewline pattern is for.
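The unwrapping step can be tried in isolation. This is a minimal sketch that reuses the same expression as pNewline; the class name UnwrapDemo and the method unwrap are hypothetical names for illustration, and the \R line-break matcher assumes Java 8 or later.

```java
import java.util.regex.Pattern;

class UnwrapDemo {
    // One or more line breaks, together with any whitespace around them.
    static final Pattern NEWLINE = Pattern.compile("\\s*(?:\\R\\s*)+");

    /** Replaces every line break (and its surrounding indentation) with one space. */
    static String unwrap(String value) {
        return NEWLINE.matcher(value).replaceAll(" ");
    }

    public static void main(String[] args) {
        String wrapped = "Proceedings of the Fourth International Conference on\n"
                       + "                    Evolutionary Programming";
        System.out.println(unwrap(wrapped));
        // prints Proceedings of the Fourth International Conference on Evolutionary Programming
    }
}
```

Whitespace that does not touch a line break is left alone, so spacing inside a single line is preserved.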

Test input:

String input = "@INPROCEEDINGS{Fogel95,\n" +
               "  AUTHOR =       {L. J. Fogel and P. J. Angeline and D. B. Fogel},\n" +
               "  TITLE =        {An evolutionary programming approach to self-adaptation\n" +
               "                    on finite state machines},\n" +
               "  BOOKTITLE =    {Proceedings of the Fourth International Conference on\n" +
               "                    Evolutionary Programming},\n" +
               "  YEAR =         {1995},\n" +
               "  pages =        {355--365}\n" +
               "}\n" +
               "\n" +
               "@ARTICLE{Goldberg91,\n" +
               "  AUTHOR =       {D. Goldberg},\n" +
               "  TITLE =        {Real-coded genetic algorithms, virtual alphabets, and blocking},\n" +
               "  JOURNAL =      {Complex Systems},\n" +
               "  YEAR =         {1991},\n" +
               "  pages =        {139--167}\n" +
               "}\n" +
               "\n" +
               "@INPROCEEDINGS{Yao96,\n" +
               "  AUTHOR =       {X. Yao and Y. Liu},\n" +
               "  TITLE =        {Fast evolutionary programming},\n" +
               "  BOOKTITLE =    {Proceedings of the 6$^{th}$ Annual Conference on Evolutionary\n" +
               "                    Programming},\n" +
               "  YEAR =         {1996},\n" +
               "  pages =        {451--460}\n" +
               "}";

Output:

INPROCEEDINGS   Fogel95      AUTHOR      L. J. Fogel and P. J. Angeline and D. B. Fogel
INPROCEEDINGS   Fogel95      TITLE       An evolutionary programming approach to self-adaptation on finite state machines
INPROCEEDINGS   Fogel95      BOOKTITLE   Proceedings of the Fourth International Conference on Evolutionary Programming
INPROCEEDINGS   Fogel95      YEAR        1995
INPROCEEDINGS   Fogel95      pages       355--365
ARTICLE         Goldberg91   AUTHOR      D. Goldberg
ARTICLE         Goldberg91   TITLE       Real-coded genetic algorithms, virtual alphabets, and blocking
ARTICLE         Goldberg91   JOURNAL     Complex Systems
ARTICLE         Goldberg91   YEAR        1991
ARTICLE         Goldberg91   pages       139--167
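The listing above has no rows for Yao96: the outer pattern only allows one level of braces inside an entry, and that entry's BOOKTITLE nests {th} one level deeper. As a sketch of the non-regex route hinted at in the first sentence of this answer, the braces can be balanced with a simple depth counter. Everything here (the class BraceScanner, its methods, the "@type" and "@key" map keys) is invented for illustration. It assumes braces are never escaped and that unbraced values contain no commas, so it is not a full BibTeX parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BraceScanner {
    /** Index of the '}' that closes the '{' at position open, or -1 if unbalanced. */
    static int findClose(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') depth++;
            else if (c == '}' && --depth == 0) return i;
        }
        return -1;
    }

    /** Reads "name = {value}" or "name = value" pairs from body, starting at pos. */
    static void readFields(String body, int pos, Map<String, String> entry) {
        while (true) {
            int eq = body.indexOf('=', pos);
            if (eq < 0) return;
            String name = body.substring(pos, eq).replace(",", "").trim();
            int start = eq + 1;
            while (start < body.length() && Character.isWhitespace(body.charAt(start))) start++;
            String value;
            if (start < body.length() && body.charAt(start) == '{') {
                int end = findClose(body, start);      // balanced, any nesting depth
                if (end < 0) return;
                value = body.substring(start + 1, end);
                pos = end + 1;
            } else {
                int end = body.indexOf(',', start);    // bare value: runs to the next comma
                if (end < 0) end = body.length();
                value = body.substring(start, end);
                pos = end;
            }
            // Collapse line breaks and indentation into single spaces.
            entry.put(name, value.replaceAll("\\s+", " ").trim());
        }
    }

    /** Parses every "@TYPE{key, ...}" entry into an ordered map. */
    static List<Map<String, String>> parse(String input) {
        List<Map<String, String>> entries = new ArrayList<>();
        int at = input.indexOf('@');
        while (at >= 0) {
            int open = input.indexOf('{', at);
            int close = open < 0 ? -1 : findClose(input, open);
            if (close < 0) break;                      // unbalanced input: stop
            String body = input.substring(open + 1, close);
            int comma = body.indexOf(',');
            Map<String, String> entry = new LinkedHashMap<>();
            entry.put("@type", input.substring(at + 1, open).trim());
            entry.put("@key", (comma < 0 ? body : body.substring(0, comma)).trim());
            if (comma >= 0) readFields(body, comma + 1, entry);
            entries.add(entry);
            at = input.indexOf('@', close + 1);
        }
        return entries;
    }

    public static void main(String[] args) {
        String input = "@INPROCEEDINGS{Yao96,\n"
                     + "  BOOKTITLE = {Proceedings of the 6$^{th}$ Annual Conference on Evolutionary\n"
                     + "                    Programming},\n"
                     + "  YEAR = {1996}\n"
                     + "}";
        for (Map<String, String> entry : parse(input)) {
            System.out.println(entry);
        }
    }
}
```

Counting depth instead of matching a fixed number of brace levels is what lets the scanner keep the inner {th} as part of the BOOKTITLE value.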


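Answer 2 (score: 0)

As far as I can tell, Andreas's solution is probably better, but if you just want a single regex string that splits the whole string into an array, you could use: { {1}}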