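Regex: how to determine whether an odd or even number of occurrences of a char precede a given char?

Asked: 2009-07-24 18:44:09

Tags: regex

I would like to replace | with OR only in unquoted terms, e.g.: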

"this | that" | "the | other" -> "this | that" OR "the | other"

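Yes, I could split on spaces or quotes, get an array, iterate over it, and rebuild the string, but that seems... inelegant. So maybe there is a regex way to do this by counting the " characters that precede a |: an odd count means the | is quoted, and an even count means it is unquoted. (Note: if the input contains at least one ", processing only begins once the number of " is even.)

9 Answers:

Answer 0 (score: 11)

Regexes can't count, but they can be used to determine whether there is an odd or even number of something. The trick in this case is to examine the quotation marks after the pipe, not before it.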

str = str.replace(/\|(?=(?:(?:[^"]*"){2})*[^"]*$)/g, "OR");

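Breaking that down, (?:[^"]*"){2} matches the next pair of quotes, if there is one, along with the intervening non-quotes. After you've done that as many times as possible (which might be zero), [^"]*$ consumes any remaining non-quotes up to the end of the string.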
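To make the parity idea concrete, here is a small JavaScript sketch that reports, for each pipe in the question's sample, how many quotes follow it and whether the lookahead matches there. The helper name `describePipes` is made up for illustration; only the pattern itself comes from the answer.

```javascript
// Same pattern as above: a pipe followed by an even number of quotes.
const unquotedPipe = /\|(?=(?:(?:[^"]*"){2})*[^"]*$)/g;

// Hypothetical helper: for every pipe, report how many quotes follow it
// and whether the lookahead pattern matches at that position.
function describePipes(str) {
  const matched = new Set([...str.matchAll(unquotedPipe)].map(m => m.index));
  const report = [];
  for (let i = 0; i < str.length; i++) {
    if (str[i] !== '|') continue;
    const quotesAfter = (str.slice(i + 1).match(/"/g) || []).length;
    report.push({ index: i, quotesAfter, replaced: matched.has(i) });
  }
  return report;
}

const sample = '"this | that" | "the | other"';
console.log(describePipes(sample));
console.log(sample.replace(unquotedPipe, 'OR')); // "this | that" OR "the | other"
```

For the sample, the three pipes are followed by 3, 2 and 1 quotes respectively, so only the middle one (even count) is replaced.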

Of course, this assumes the text is well-formed. It doesn't deal with escaped quotes either, but it can be made to if you need that.
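One possible way to extend the same lookahead to escaped quotes is sketched below. It assumes a backslash is the escape character, which the question does not actually state, and the names `run` and `replaceUnquotedPipes` are illustrative only.

```javascript
// A run of characters that are not quotes, where a backslash always
// consumes the character after it (so \" is not counted as a quote).
// Assumption: backslash is the escape character.
const run = String.raw`(?:[^"\\]|\\.)*`;

// A pipe followed by an even number of unescaped quotes up to the end
// of the string. The "s" flag lets "." also match a newline after a backslash.
const unquotedPipe = new RegExp(String.raw`\|(?=(?:${run}"${run}")*${run}$)`, 'gs');

function replaceUnquotedPipes(str) {
  return str.replace(unquotedPipe, 'OR');
}

console.log(replaceUnquotedPipes(String.raw`"a | \" b" | c`)); // "a | \" b" OR c
```

Like the original pattern, this still assumes the unescaped quotes are balanced.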

Answer 1 (score: 5)

Regular expressions do not count. That's what parsers are for.
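A full parser is probably more than this task needs; a minimal sketch of the idea is a single pass over the characters that tracks whether the current position is inside quotes. The function name is hypothetical, and the sketch assumes quotes are never escaped.

```javascript
// Hypothetical helper: walk the string once, tracking whether we are
// inside a quoted term, and replace only pipes seen outside quotes.
// Assumes quotes are never escaped.
function replaceUnquotedPipes(str) {
  let inQuotes = false;
  let out = '';
  for (const ch of str) {
    if (ch === '"') inQuotes = !inQuotes; // each quote toggles the state
    out += (ch === '|' && !inQuotes) ? 'OR' : ch;
  }
  return out;
}

console.log(replaceUnquotedPipes('"this | that" | "the | other"'));
```

The boolean flag is exactly the odd/even quote count the question asks about, kept incrementally instead of being recomputed for each pipe.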

Answer 2 (score: 4)

You might find the Perl FAQ on this issue relevant.

#!/usr/bin/perl

use strict;
use warnings;

my $x = qq{"this | that" | "the | other"};
print join('" OR "', split /" \| "/, $x), "\n";

Answer 3 (score: 1)

You don't need to count, because you don't nest quotes. This will do:

#!/usr/bin/perl

my $str = '" this \" | that" | "the | other" | "still | something | else"';
print "$str\n";

while($str =~ /^((?:[^"|\\]*|\\.|"(?:[^\\"]|\\.)*")*)\|/) {
        $str =~ s/^((?:[^"|\\]*|\\.|"(?:[^\\"]|\\.)*")*)\|/$1OR/;
}

print "$str\n";

Now, let's explain that expression.

^  -- means you'll always match everything from the beginning of the string, otherwise
      the match might start inside a quote, and break everything

(...)\|   -- this means you'll match a certain pattern, followed by a |, which appears
             escaped here; so when you replace it with $1OR, you keep everything, but
             replace the |.

(?:...)*  -- This is a non-capturing group, which can be repeated multiple times; we
             use a group here so we can repeat alternative patterns multiple times.

[^"|\\]*  -- This is the first pattern. Anything that isn't a pipe, an escape character
             or a quote.

\\.       -- This is the second pattern. Basically, an escape character and anything
             that follows it.

"(?:...)*" -- This is the third pattern. Open quote, followed by another
              non-capturing group repeated multiple times, followed by a closing
              quote.

[^\\"]    -- This is the first pattern in the second non-capturing group. It's anything
             except an escape character or a quote.

\\.       -- This is the second pattern in the second non-capturing group. It's an
             escape character and whatever follows it.

Here is the result:

" this \" | that" | "the | other" | "still | something | else"
" this \" | that" OR "the | other" OR "still | something | else"
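For JavaScript, a related but different technique gets the same result in one pass: instead of anchoring at the start and looping, match everything that must be skipped (quoted terms, escaped characters) alongside the bare pipe, and decide in a callback. This is a sketch rather than a port of the Perl code, and the names `token` and `replaceUnquotedPipes` are made up for illustration.

```javascript
// Match, in order of preference: a whole quoted term (with backslash
// escapes), an escaped character outside quotes, or a bare pipe.
const token = /"(?:[^"\\]|\\.)*"|\\.|\|/g;

function replaceUnquotedPipes(str) {
  // Quoted terms and escapes are returned unchanged; only a bare pipe is rewritten.
  return str.replace(token, m => (m === '|' ? 'OR' : m));
}

const input = String.raw`" this \" | that" | "the | other" | "still | something | else"`;
console.log(replaceUnquotedPipes(input));
```

Because the quoted-term alternative is tried first at each position, a pipe inside quotes is consumed as part of the quoted term and never reaches the bare-pipe alternative.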

Answer 4 (score: 1)

Another approach (similar to Alan M's working answer):

str = str.replace(/(".+?"|\w+)\s*\|\s*/g, '$1 OR ');

The part inside the first group (spaced out for readability):

".+?"  |  \w+

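...basically means: something quoted, or a word. The rest means that it is followed by a "|" wrapped in optional whitespace. The replacement is that first part ("$1" means the first group) followed by " OR ".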
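A quick way to see what this pattern does is to run it on both inputs from the thread. In the sketch below, `toOr` is just an illustrative wrapper. When traced by hand, the pattern appears to protect a quoted term only if a pipe follows it, so the final quoted term of the question's original example is rewritten.

```javascript
// The pattern from this answer.
const re = /(".+?"|\w+)\s*\|\s*/g;
const toOr = s => s.replace(re, '$1 OR ');

// Works when the last term is unquoted:
console.log(toOr('"this | that" | "the | other" | yet | another'));
// "this | that" OR "the | other" OR yet OR another

// Caveat: a trailing quoted term is not followed by a pipe, so the
// \w+ alternative matches the word inside it instead:
console.log(toOr('"this | that" | "the | other"'));
// "this | that" OR "the OR other"
```

If the input can end with a quoted term, a lookahead or single-pass approach that consumes whole quoted terms is probably the safer choice.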

Answer 5 (score: 0)

Maybe you're looking for something like this:

(?<=^([^"]*"[^"]*")+[^"|]*)\|

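Answer 6 (score: 0)

Thanks everyone. Apologies for neglecting to mention that this is in javascript, that terms don't have to be quoted, and that there can be any number of quoted/unquoted terms, e.g.: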

"this | that" | "the | other" | yet | another  -> "this | that" OR "the | other" OR yet OR another 

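Daniel, it seems that's in the ballpark, i.e. basically a matching/massaging loop. Thanks for the detailed explanation. In js, it looks like a split, a forEach loop over the array of terms, pushing each term (after changing a | term to OR) back into an array, and a re-join.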
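The split / forEach / re-join approach described here might look roughly like the sketch below. The tokenizing pattern and the function name are assumptions, not something given in the thread, and the sketch assumes pipes are separated from the terms by whitespace and that quotes are not escaped.

```javascript
function replaceUnquotedPipes(str) {
  // Split into terms: either a whole quoted phrase or a run of
  // non-space, non-quote characters (which includes a lone "|").
  const terms = str.match(/"[^"]*"|[^\s"]+/g) || [];
  const out = [];
  terms.forEach(term => {
    out.push(term === '|' ? 'OR' : term);
  });
  // Re-joining with single spaces normalizes the original whitespace.
  return out.join(' ');
}

console.log(replaceUnquotedPipes('"this | that" | "the | other" | yet | another'));
```

Unlike the pure-regex answers, this version rewrites whitespace between terms, which may or may not matter for the query being built.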

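Answer 7 (score: 0)

@Alan M, works nicely; escaping isn't necessary because sqlite's FTS capabilities are so sparse.

@epost, accepted your solution for its brevity and elegance, thanks. It merely needed to be put in a more general form for unicode etc.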

(".+?"|[^\"\s]+)\s*\|\s*

Answer 8 (score: 0)

My solution in C#: count the quotes, then use a regex to get the matches:

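        // Requires: using System.Collections.Generic; using System.Linq;
        //           using System.Text.RegularExpressions;
        // Count the number of quotes.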
        var quotesOnly = Regex.Replace(searchText, @"[^""]", string.Empty);
        var quoteCount = quotesOnly.Length;
        if (quoteCount > 0)
        {
            // If the quote count is an odd number there's a missing quote.
            // Assume a quote is missing from the end - executive decision.
            if (quoteCount%2 == 1)
            {
                searchText += @"""";
            }

            // Get the terms. A quoted phrase is one term (captured without
            // its quotes); outside quotes, each whitespace-separated word is a term.
            // e.g. The following line:
            // "this and that" or then and "this or other"
            // will result in the following terms:
            // 1. this and that
            // 2. or
            // 3. then
            // 4. and
            // 5. this or other
            var matches = Regex.Matches(searchText, @"""([^""]*)""|([^\s""]+)");
            var list = new List<string>();
            foreach (var match in matches.Cast<Match>())
            {
                // Group 1 is set for a quoted phrase, group 2 for a bare word.
                var group = match.Groups[1].Success ? match.Groups[1] : match.Groups[2];
                var value = group.Value.Trim();
                if (!string.IsNullOrEmpty(value))
                {
                    list.Add(value);
                }
            }

            // TODO: Do something with the list of strings.
        }