Perl regex substitution not honoring the global modifier

Asked: 2012-11-04 16:24:40

Tags: regex perl cygwin

My code looks like this:

s/(["\'])(?:\\?+.)*?\1/(my $x = $&) =~ s|^(["\'])(.*src=)([\'"])\/|$1$2$3$1.\\$baseUrl.$1\/|g;$x/ge

Ignoring the last bit (and leaving only the part where the problem occurs), the code becomes:

s/(["\'])(?:\\?+.)*?\1/replace-text-here/g

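I have tried both, and I get the same problem each time: even though I use the g modifier, the regex matches and replaces only the first occurrence. I don't know whether this is a Perl bug. The regex is meant to match everything between two quotes while also handling escaped quotes, and I am following this blog post. As I understand it, the regex should match everything between two quotes, replace it, and then, because of the g modifier, go on to find the next instance of the pattern.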
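For readers unfamiliar with the pattern, here is an annotated sketch of what it is intended to do, written with the /x modifier so each piece can carry a comment. The sample line and the variable names ($quoted, $line, @found) are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

my $quoted = qr/
    (["'])        # group 1: the opening quote, double or single
    (?:
        \\?+      # an optional backslash, matched possessively
        .         # any one character except a newline
    )*?           # repeated as few times as possible
    \1            # the same quote character that opened the string
/x;

# Illustrative input: a double-quoted string with an escaped quote, then a single-quoted one
my $line = q{x = "a \" b" . 'c';};

my @found;
push @found, $& while $line =~ /$quoted/g;

print "$_\n" for @found;
# "a \" b"
# 'c'
```

The possessive \\?+ keeps a backslash paired with the character after it, so an escaped quote cannot close the string early.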

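Some background: I am not using any version declarations, and strict and warnings are both turned on, yet no warnings appear. My script reads an entire file into a scalar (newlines included), and the regex then runs directly on that scalar. It does seem to work on each line, just not more than once within a single line. This is Perl 5.14.2 running on 64-bit Cygwin. Cygwin (or the Perl port) might be messing something up, but I doubt it.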

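I also tried the other example from the blog post, with the atomic groups and possessive quantifiers replaced by equivalent code that does not use those features, but the problem persists.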

Examples:

<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
Should become (with the shortened regex):
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:replace-text-here?>
Yet it only becomes:
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>

<?php echo ($sub->getTarget() != "")?"target=\"".$sub->getTarget()."\"":""; ?>
Should become:
<?php echo ($sub->getTarget() != replace-text-here)?replace-text-here.$sub->getTarget().replace-text-here:replace-text-here; ?>
And as above, only the first occurrence is changed.

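(Yes, I realize this will provoke some form of "don't use regex to parse HTML/PHP". In this case I think a regex is the more appropriate tool: I am not looking for context, I am looking for a string (anything inside quotes) and performing an operation on that string, which is exactly what regexes are for.)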

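Just a note: these regexes are run inside an eval, and the actual regex is stored in a single-quoted string (which is why the single quotes are escaped). I will try any proposed solution directly as well, to rule out my own poor programming.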

EDIT: As requested, a short script that demonstrates the problem:

#!/usr/bin/perl -w

use strict;

my $data = "this is the first line, where nothing much happens
but on the second line \"we suddenly have some double quotes\"
and on the third line there are 'single quotes'
but the fourth line has \"double quotes\" AND 'single quotes', but also another \"double quote\"
the fifth line has the interesting one - \"double quoted string 'with embedded singles' AND \\\"escaped doubles\\\"\"
and the sixth is just to say - we need a new line at the end to simulate a properly structured file
";
my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';
my $regex2 = 's/([\'"]).*?\1/replaced2!/g';

print $data."\n";
$_ = $data; # to make the regex operate on $_, as per the original script
eval($regex);
print $_."\n";
$_ = $data;
eval($regex2);
print $_; # just an example of an eval, but without the fancy possessive quantifiers

This produces the following output for me:

this is the first line, where nothing much happens
but on the second line "we suddenly have some double quotes"
and on the third line there are 'single quotes'
but the fourth line has "double quotes" AND 'single quotes', but also another "double quote"
the fifth line has the interesting one - "double quoted string 'with embedded singles' AND \"escaped doubles\""
and the sixth is just to say - we need a new line at the end to simulate a properly structured file

this is the first line, where nothing much happens
but on the second line "we suddenly have some double quotes"
and on the third line there are 'single quotes'
but the fourth line has "double quotes" AND 'single quotes', but also another "double quote"
the fifth line has the interesting one - "double quoted string 'with embedded singles' AND \"escaped doubles\replaced!
and the sixth is just to say - we need a new line at the end to simulate a properly structured file

this is the first line, where nothing much happens
but on the second line replaced2!
and on the third line there are replaced2!
but the fourth line has replaced2! AND replaced2!, but also another replaced2!
the fifth line has the interesting one - replaced2!escaped doubles\replaced2!
and the sixth is just to say - we need a new line at the end to simulate a properly structured file

2 Answers:

Answer 0: (score: 1)

Update: this:

my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';

should be:

my $regex = 's/(["\'])(?:\\\\?+.)*?\1/replaced!/g';

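because the single quotes in the assignment turn \\ into \, and you want the regex itself to end up containing \\ (an escaped literal backslash).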

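Please boil your question down to a short script that demonstrates the problem (including the input, the incorrect output, the eval, and so on). Taking what you did show and trying it: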

use strict;
use warnings;
my $input = <<'END';
<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
END

(my $output = $input) =~ s/(["\'])(?:\\?+.)*?\1/replace-text-here/g;
print $input,"becomes\n",$output;

produces for me:

<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
becomes
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:replace-text-here?>
as I would expect.

What does it do for you?

Answer 1: (score: 1)

Even inside single quotes, \\ is processed into \, so:

my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';

sets $regex to:

s/(["'])(?:\?+.)*?\1/replaced!/g

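which requires every character inside the quoted string to be preceded by one or more literal question marks (\?+). Since your data contains very few question marks, this effectively requires the quoted string to be empty: "" or ''. That is why only the empty "" strings in your examples get replaced.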
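The collapse is easy to confirm by printing the strings before they reach eval. The following is a minimal sketch; the sample input and the names $broken, $fixed and @results are made up for illustration.

```perl
use strict;
use warnings;

# As in the question: inside single quotes, \\ collapses to \ and \' to '
my $broken = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';
# Doubling the backslashes leaves \\ in the string that eval will see
my $fixed  = 's/(["\'])(?:\\\\?+.)*?\1/replaced!/g';

print "$broken\n";   # s/(["'])(?:\?+.)*?\1/replaced!/g
print "$fixed\n";    # s/(["'])(?:\\?+.)*?\1/replaced!/g

# Run both commands against the same illustrative input
my @results;
for my $cmd ($broken, $fixed) {
    local $_ = q{say "hi" and 'bye' and ""};
    eval $cmd;
    push @results, $_;
}

print "$_\n" for @results;
# say "hi" and 'bye' and replaced!
# say replaced! and replaced! and replaced!
```

With the collapsed pattern only the empty "" is replaced, which matches the behaviour reported in the question.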

The minimal fix is to add more backslashes:

my $regex = 's/(["\'])(?:\\\\?+.)*?\\1/replaced!/g';

But you may really want to rethink your approach. Do you actually need to store the whole regex-substitution command as a string and run it through eval?
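If the eval turns out not to be essential, one possible alternative is to keep the pattern as a compiled qr// object and apply it with an ordinary s///g. This is only a sketch under that assumption; replace_quoted is a hypothetical helper, not something from the original script, and it covers only the simplified fixed-text replacement.

```perl
use strict;
use warnings;

# Compiled pattern: there is no string-quoting layer, so each backslash is written once
my $quoted = qr/(["'])(?:\\?+.)*?\1/;

# Hypothetical helper: replace every quoted string in $text with $replacement
sub replace_quoted {
    my ($text, $replacement) = @_;
    (my $out = $text) =~ s/$quoted/$replacement/g;
    return $out;
}

print replace_quoted(q{a "b \" c" and 'd' and ""}, 'replaced!'), "\n";
# a replaced! and replaced! and replaced!
```

Because the pattern never passes through a quoted string, what you write is what the regex engine receives.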