Question

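I'm trying to write a program that can replace the Bible verses in a document with any desired translation. This is useful for older books that contain a lot of verses quoted from the KJV. The hardest part of the process is coming up with a way to extract the verses from the document.

I've found that most books that place Bible verses in the text use a structure like "N" (BookName chapter#:verse#s), where N is the verse text, the quotation marks are literal, and the parentheses are literal too. I keep running into problems coming up with a regular expression that matches this text.

The latest regex I've tried is: \"(.+)\"\s*\(([\w. ]+[0-9\s]+[:][\s0-9\-]+.*)\). My trouble is that it doesn't find all the matches.

Here is a regex101 with samples: https://regex101.com/r/eS5oT8/1

Is there any way to solve this with a regular expression? Any help or suggestions would be appreciated.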

Answer 1

Use the "g" modifier.

g modifier: global. All matches (don't return on the first match).

See the Regex Demo.

Answer 2

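It's worth mentioning that the site you're using to test this relies on JavaScript regular expressions, which require the g modifier to be set explicitly. That differs from C#, where matching is effectively global by default: Regex.Matches() returns every match without any extra flag.

You can tweak your expression slightly and make sure the double quotes are escaped correctly: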

// Updated expression with escaped double-quotes and other minor changes
var regex = new Regex(@"\""([^""]+)\""\s*\(([\w. ]+[\d\s]+[:][\s\d\-]+[^)]*)\)");

Then use the Regex.Matches() method to find all of the matches in the string:

// Find each of the matches and output them
foreach(Match m in regex.Matches(input))
{
     // Output each match here (using Console Example)
     Console.WriteLine(m.Value);
}
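As a rough, self-contained sketch of how the two snippets above fit together, the program below runs the same pattern over a short sample string and reads the two capture groups (the quoted text and the reference). The class name VerseFinder, the Find helper, and the sample input are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VerseFinder
{
    // Same pattern as above: group 1 is the quoted text, group 2 is the reference.
    private static readonly Regex VersePattern =
        new Regex(@"\""([^""]+)\""\s*\(([\w. ]+[\d\s]+[:][\s\d\-]+[^)]*)\)");

    // Hypothetical helper: returns (quote, reference) pairs for every match.
    public static List<KeyValuePair<string, string>> Find(string input)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in VersePattern.Matches(input))
        {
            results.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value.Trim()));
        }
        return results;
    }

    public static void Main()
    {
        // Sample text invented for this sketch.
        string input =
            "He wrote, \"In the beginning God created the heaven and the earth.\" (Gen. 1:1) " +
            "and later, \"Jesus wept.\" (John 11:35)";

        // Print each reference followed by the text it refers to.
        foreach (var pair in Find(input))
        {
            Console.WriteLine("{0} => {1}", pair.Value, pair.Key);
        }
    }
}
```

Keeping the quote and the reference in separate groups is what would let a replacement step look up the reference and swap in the text from another translation.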

You can see sample output in this working example, shown below:

Answer 3

How about starting with this as a guide:

(?<quote>"".+"")          # a series of any characters in quotes
\s+                       # followed by spaces
\(                        # followed by a parenthetical expression
   (?<book>\d*[a-z.\s]*)  # book name (a-z, . or space) optionally preceded by digits. e.g. '1 Cor.'
   (?<chapter>\d+)        # chapter e.g. the '1' in 1:2
   :                      # colon
   (?<verse>\d+)          # verse e.g. the '2' in 1:2
\)

With the options:

RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.IgnoreCase

The expression above gives you each part of a match by name, for example match.Groups["verse"].
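A minimal sketch of how the named-group pattern might be used end to end is shown below. It assumes one deliberate change from the pattern above: the quote group uses the negated class [^""]+ instead of .+, so that with RegexOptions.Singleline a quotation cannot run across several quoted passages. The class name NamedGroupDemo, the FindVerses helper, and the sample input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Verbatim string, so "" stands for one literal double quote.
    // IgnorePatternWhitespace lets the pattern carry layout and # comments.
    private const string Pattern = @"
        (?<quote>""[^""]+"")      # quoted text (negated class swapped in for .+)
        \s+                       # followed by whitespace
        \(                        # opening parenthesis
           (?<book>\d*[a-z.\s]*)  # book name, optionally preceded by digits
           (?<chapter>\d+)        # chapter
           :                      # colon
           (?<verse>\d+)          # verse
        \)                        # closing parenthesis";

    private static readonly Regex VerseRegex = new Regex(
        Pattern,
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every verse-like match in the input.
    public static MatchCollection FindVerses(string input)
    {
        return VerseRegex.Matches(input);
    }

    public static void Main()
    {
        // Sample text invented for this sketch.
        string input =
            "\"Charity suffereth long, and is kind.\" (1 Cor. 13:4) \"Jesus wept.\" (John 11:35)";

        foreach (Match match in FindVerses(input))
        {
            Console.WriteLine("book={0} chapter={1} verse={2}",
                match.Groups["book"].Value.Trim(),
                match.Groups["chapter"].Value,
                match.Groups["verse"].Value);
        }
    }
}
```

As in the guide pattern, the verse group only accepts a single number, so references such as 1:2-5 would need the pattern extended before they match.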

Answer 4

You can try following the example given on MSDN:

https://msdn.microsoft.com/en-us/library/0z2heewz(v=vs.110).aspx

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "ablaze beagle choral dozen elementary fanatic " +
                     "glaze hunger inept jazz kitchen lemon minus " +
                     "night optical pizza quiz restoration stamina " +
                     "train unrest vertical whiz xray yellow zealous";
      string pattern = @"\b\w*z+\w*\b";
      Match m = Regex.Match(input, pattern);
      while (m.Success) {
         Console.WriteLine("'{0}' found at position {1}", m.Value, m.Index);
         m = m.NextMatch();
      }   
   }
}
// The example displays the following output:
//    'ablaze' found at position 0
//    'dozen' found at position 21
//    'glaze' found at position 46
//    'jazz' found at position 65
//    'pizza' found at position 104
//    'quiz' found at position 110
//    'whiz' found at position 157
//    'zealous' found at position 174

Answer 5

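After you've added the "g", also be careful when several verses appear with no '\n' character between them, because "(.*)" will treat them as one long match instead of multiple verses. You'll want something like "([^"]*)" to prevent that.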
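The difference can be seen in a small sketch like the following, which counts the matches each variant finds on a single line containing two quoted verses. The reference part of both patterns is simplified to \(([^)]*)\) purely for this illustration, and the class name GreedyDemo, the CountMatches helper, and the sample input are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical helper: number of matches of pattern in input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // Two quoted verses on one line, with no newline between them.
        string input = "\"Jesus wept.\" (John 11:35) \"God is love.\" (1 John 4:8)";

        // Greedy: .* runs from the first quote to the last one on the line.
        string greedy = @"""(.*)""\s*\(([^)]*)\)";

        // Negated class: the quoted part stops at the next double quote.
        string negated = @"""([^""]*)""\s*\(([^)]*)\)";

        Console.WriteLine("greedy:  {0}", CountMatches(greedy, input));   // prints "greedy:  1"
        Console.WriteLine("negated: {0}", CountMatches(negated, input));  // prints "negated: 2"
    }
}
```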

C# regex for finding a specific pattern in text
