Regex: any character except some

Time: 2016-10-18 20:13:48

Tags: c# regex

I am trying to create a regex that captures [[xyz|asd]] but not [[xyz]] in the following text:

'''Diversity Day'''" is the second episode of the [[The Office (U.S. season 1)]|first season]] of the American [[comedy]] [[television program|television series]] ''[[The Office (U.S. TV series)|The Office]]'', and the show's second episode overall. Written by [[B. J. Novak]] and directed by [[Ken Kwapis]], it first aired in the United States on March 29, 2005, on [[NBC]]. The episode guest stars ''Office'' consulting producer [[Larry Wilmore]] as [[List_of_characters_from_The_Office_(US)#Mr._Brown|Mr. Brown]].

The following results should be captured:

[[The Office (U.S. season 1)]|first season]] <-- note the "]" before "|": in this case "]" is a literal character, not part of a closing "]]"
[[television program|television series]]
[[The Office (U.S. TV series)|The Office]]
[[List_of_characters_from_The_Office_(US)#Mr._Brown|Mr. Brown]]

What I tried to use is:

\[\[([^|]+)\|([^|]+)\]\]

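But I can't figure out how to exclude "|" and "]]" inside the groups. [^|(]])] won't work because it does not exclude "]]" as a whole, only the single character "]" (it needs to be the whole sequence).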

Please help, thanks!

1 answer:

Answer 0: (score: 6)

You can rely on a tempered greedy token here:

\[\[((?:(?!]]).)*)\|((?:(?!]]).)*)]]

See the regex demo.

Details:

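  • \[\[ - two [ symbols
  • ((?:(?!]]).)*) - Group 1 (note that * can be made lazy, *?, especially if the first part is shorter than the second), capturing:
    • (?:(?!]]).)* - zero or more sequences of:
      • . - any character (other than a newline; if your strings span multiple lines, use the pattern with RegexOptions.Singleline)...
      • (?!]]) - ...that does not start a ]] sequence (i.e. the . will not match a ] that is followed by another ])
  • \| - a literal |
  • ((?:(?!]]).)*) - Group 2, capturing the same subpattern as Group 1
  • ]] - two literal ] symbols

A more efficient "unrolled" version of this regex is:

\[\[([^]|]*(?:](?!])[^]|]*)*)\|([^]]*(?:](?!])[^]]*)*)]]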

See the regex demo. This regex treats the first | as the inner field delimiter. See my other answer for how to unroll a tempered greedy token.
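To try the two patterns side by side, something like the following C# sketch could be used. The class name PipedLinkDemo and the helper FindPipedLinks are hypothetical names chosen for illustration, and the sample strings are shortened from the question's text. It also shows where the two versions differ: with more than one | inside a link, the greedy tempered pattern splits at the last |, while the unrolled pattern splits at the first.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex, Match and RegexOptions come from .NET.
public static class PipedLinkDemo
{
    // Tempered greedy token pattern, as given in the answer.
    public const string Tempered = @"\[\[((?:(?!]]).)*)\|((?:(?!]]).)*)]]";

    // Unrolled pattern, as given in the answer.
    public const string Unrolled = @"\[\[([^]|]*(?:](?!])[^]|]*)*)\|([^]]*(?:](?!])[^]]*)*)]]";

    // Returns the (Group 1, Group 2) pair for every piped link found in text.
    // Singleline lets '.' match newlines, in case a link spans several lines.
    public static List<(string Target, string Label)> FindPipedLinks(string text, string pattern)
    {
        var result = new List<(string Target, string Label)>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.Singleline))
        {
            result.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }

    public static void Main()
    {
        string text = "episode of the [[The Office (U.S. season 1)]|first season]] of the American " +
                      "[[comedy]] [[television program|television series]] on [[NBC]]";

        // Both patterns skip [[comedy]] and [[NBC]] and keep the two piped links.
        foreach (var (target, label) in FindPipedLinks(text, Tempered))
        {
            Console.WriteLine($"{target} => {label}");
        }

        // With two pipes the patterns disagree on where to split.
        var greedy = FindPipedLinks("[[a|b|c]]", Tempered)[0];   // ("a|b", "c")
        var unrolled = FindPipedLinks("[[a|b|c]]", Unrolled)[0]; // ("a", "b|c")
        Console.WriteLine($"{greedy.Target} / {greedy.Label}");
        Console.WriteLine($"{unrolled.Target} / {unrolled.Label}");
    }
}
```

If the first | should always be the separator with the tempered version as well, making the first quantifier lazy (*?), as noted in the details above, is one way to get that behavior.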
