Question

My search text is as follows.

...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...

It contains many lines (it is actually a JavaScript file), but I need to parse out the values of the variable strings, i.e. aaa, bbb, ccc, ddd, eee.

Here is the Perl code (the PHP version is at the bottom):

my $str = <<STR;
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR

my @matches = $str =~ /(?:\"(.+?)\",?)/g;
print "@matches";

I know the script above matches every occurrence, but it also picks up strings from other lines ("xyz"). So I need to check for the string var strings = first:
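To make the problem concrete, here is a small sketch that runs the unanchored pattern from the question against a hypothetical input with one extra line holding "xyz" (the variable name other is made up for illustration). Every quoted string in the text is returned, not just the ones inside the strings array.

```perl
use strict;
use warnings;

# Hypothetical input: the target line plus an unrelated line containing "xyz"
my $str = <<'STR';
var other = "xyz";
var strings = ["aaa","bbb","ccc","ddd","eee"];
STR

# The unanchored pattern from the question: it grabs every quoted string in $str
my @matches = $str =~ /(?:"(.+?)",?)/g;
print "@matches\n";    # xyz aaa bbb ccc ddd eee
```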

/var strings = \[(?:\"(.+?)\",?)/g

With the regex above, it parses aaa.

/var strings = \[(?:\"(.+?)\",?)(?:\"(.+?)\",?)/g

With the above, I get aaa and bbb. So to avoid repeating the group in the regex, I used the '+' quantifier as shown below.

/var strings = \[(?:\"(.+?)\",?)+/g

But I only get eee. So my question is: why do I only get eee when I use the '+' quantifier?
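A minimal Perl reproduction of the behaviour described above, using the same line from the question as the only input. The whole pattern matches once, the repeated group runs five times, and the single capture group is overwritten on each repetition, so only the last value survives.

```perl
use strict;
use warnings;

my $str = 'var strings = ["aaa","bbb","ccc","ddd","eee"];';

# One match of the whole pattern; group 1 is overwritten on every repetition of (?:...)+
my @matches = $str =~ /var strings = \[(?:"(.+?)",?)+/g;
print "@matches\n";    # eee
```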

Update 1: Using PHP preg_match_all (added to get more attention :-))

$str = <<<STR
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR;

preg_match_all("/var strings = \[(?:\"(.+?)\",?)+/",$str,$matches);
print_r($matches);

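Update 2: Why does it match eee? Because of the greediness of (?:\"(.+?)\",?)+. Removing the greediness, /var strings = \[(?:\"(.+?)\",?)+?/, matches aaa. But why only one result? Is there any way to achieve this with a single regex?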
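The sketch below checks both observations in Update 2 against the question's sample line. The lazy +? stops after one repetition, so the capture holds aaa; and the number of results is governed by how many times the whole pattern matches, which is once, because "var strings = [" occurs only once.

```perl
use strict;
use warnings;

my $str = 'var strings = ["aaa","bbb","ccc","ddd","eee"];';

# Lazy +? stops after the first repetition, so the single match captures "aaa"
my @lazy = $str =~ /var strings = \[(?:"(.+?)",?)+?/g;

# Count whole-pattern matches: the literal prefix occurs only once in $str
my $count = () = $str =~ /var strings = \[/g;

print "@lazy ($count match)\n";    # aaa (1 match)
```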

Answer 1

Here is a single-regex solution:

/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g

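\G is a zero-width assertion that matches the position where the previous match ended (or the start of the string, if this is the first match attempt). So on the first attempt it is effectively: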

var\s+strings\s*=\s*\[\s*"([^"]*)"

...and on every later attempt:

,\s*"([^"]*)"

...except that each of these matches must start exactly where the previous one ended.

Here is a demo in PHP, but it also works in Perl.
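Since the answer says the same pattern works in Perl, here is a minimal Perl sketch of it. The input adds two hypothetical lines (other and more) that also contain quoted strings, to show that the \G anchor keeps the match chained to the array and ignores them.

```perl
use strict;
use warnings;

# Hypothetical input with quoted strings before and after the target array
my $str = <<'STR';
var other = "xyz";
var strings = ["aaa","bbb","ccc","ddd","eee"];
var more = "zzz";
STR

# Alternative 1 finds the declaration; alternative 2 (\G,) only matches a comma
# sitting exactly where the previous match ended, so the chain stops at "]"
my @matches = $str =~ /(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g;
print "@matches\n";    # aaa bbb ccc ddd eee
```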

Answer 2

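You may prefer this solution, which first finds the string var strings = [ using the /g modifier. That leaves \G positioned immediately after the [, so the next regex finds every double-quoted string that follows directly, possibly preceded by commas or whitespace.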

my @matches;

if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
  @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}

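Despite the /g modifier, your regex /var strings = \[(?:\"(.+?)\",?)+/g matches only once, because there is no second occurrence of var strings = [. Each match returns the values that the capture variables $1, $2, $3 etc. hold when the match completes, and /(?:"(.+?)",?)+/ (there is no need to escape the double quotes) captures several values into $1 one after another, leaving only the final one. You need to write something like the above, where each match captures exactly one value into $1.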
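To see the mechanism this answer relies on, the sketch below records pos($str) after the scalar-context //g match and then lets the list-context //g match continue from there. The leading x = "xyz"; text is a hypothetical addition showing that strings before the declaration are skipped.

```perl
use strict;
use warnings;

my $str = 'x = "xyz"; var strings = ["aaa","bbb","ccc","ddd","eee"];';

my @matches;
my $start;
# Scalar-context //g leaves pos($str) just after the '['
if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
    $start = pos($str);    # offset where the \G-anchored matches must begin
    # List-context //g starts at pos($str); \G ties each match to the previous one
    @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}
print "@matches\n";    # aaa bbb ccc ddd eee
```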

Answer 3

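Because + tells it to repeat the contents of the parentheses (?:"(.+?)",?) one or more times, and the capture group inside is overwritten on every repetition. It matches "aaa", then "bbb", and so on up to "eee"; when no further repetition is possible, only "eee" is left in the capture.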

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/var strings = \[(?:"(.+?)",?)+/)->explain();

The regular expression:

(?-imsx:var strings = \[(?:"(.+?)",?)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  var strings =            'var strings = '
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    (                        group and capture to \1:
----------------------------------------------------------------------
      .+?                      any character except \n (1 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \1
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    ,?                       ',' (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
  )+                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

A simpler example:

my @m = ('abcd' =~ m/(\w)+/g);
print "@m";

prints only d. This is because:

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(\w)+/)->explain();

The regular expression:

(?-imsx:(\w)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

If you put a quantifier on a capturing group, only the last repetition is kept.
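A short comparison of where the quantifier sits, using the answer's 'abcd' example. Quantifying the group keeps only the last repetition; letting /g do the repetition yields one capture per match; putting the quantifier inside the group captures the whole run.

```perl
use strict;
use warnings;

# Quantifier on the capture group: one match, $1 keeps only the last repetition
my @last = ('abcd' =~ m/(\w)+/g);
# No quantifier, /g repeats the match: four matches, one capture each
my @each = ('abcd' =~ m/(\w)/g);
# Quantifier inside the capture group: the whole run is captured
my @whole = ('abcd' =~ m/(\w+)/g);

print "@last | @each | @whole\n";    # d | a b c d | abcd
```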

Here is a way that works:

my $str = <<STR;
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR

my @matches;
$str =~ m/var strings = \[(.+?)\]/; # get the array first
my $jsarray = $1;
@matches = $jsarray =~ m/"(.+?)"/g; # and get the strings from that

print "@matches";

Update: A one-line solution (though not a single regex) would be:

@matches = ($str =~ m/var strings = \[(.+?)\]/)[0] =~ m/"(.+?)"/g;

But this is pretty unreadable, IMHO.
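If readability is the concern, the same two-step idea can be written with a named intermediate and a guard. This is a sketch, not the answerer's code: the if guard is an addition so that, when the declaration is missing, the second match is never run against undef (which the one-liner would do, triggering an uninitialized-value warning under use warnings).

```perl
use strict;
use warnings;

my $str = <<'STR';
var other = "xyz";
var strings = ["aaa","bbb","ccc","ddd","eee"];
STR

my @matches;
# Step 1: capture the array body; the list assignment is false if nothing matched
if (my ($jsarray) = $str =~ m/var strings = \[(.+?)\]/) {
    # Step 2: pull the quoted strings out of the captured body only
    @matches = $jsarray =~ m/"(.+?)"/g;
}
print "@matches\n";    # aaa bbb ccc ddd eee
```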

Match all occurrences of strings
