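Perl regex to delete commas between quotes?

Date: 2015-06-30 21:11:27

Tags: regex perl quotes

I am trying to delete the commas between double quotes in a string while leaving all other commas intact. (This is an email address that sometimes contains spurious commas.) The following "brute force" code works fine on my particular machine, but is there a more elegant way to do it, perhaps with a single regex? Duncan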

$string = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <ubabankoffice93@gmail.com>,1,2';
print "Initial string = ", $string, "<br>\n";

# Extract stuff between the quotes
$string =~ /\"(.*?)\"/;

$name = $1;
print "name = ", $1, "<br>\n";
# Delete all commas between the quotes
$name =~ s/,//g;
print "name minus commas = ", $name, "<br>\n";
# Put the modified name back between the quotes
$string =~ s/\"(.*?)\"/\"$name\"/;
print "new string = ", $string, "<br>\n";

2 answers:

Answer 0 (score: 2)

You can use this pattern:

$string =~ s/(?:\G(?!\A)|[^"]*")[^",]*\K(?:,|"(*SKIP)(*FAIL))//g;

Pattern details:

(?: # two possible beginnings:
    \G(?!\A) # contiguous to the previous match
  |          # OR
    [^"]*"   # all characters until an opening quote
)
[^",]*     #"# all that is not a quote or a comma
\K           # discard all previous characters from the match result
(?:          # two possible cases:
    ,        # a comma is found, so it will be replaced
  |          # OR
    "(*SKIP)(*FAIL) #"# when the closing quote is reached, make the pattern fail
                      # and force the regex engine to not retry previous positions.
)
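To try the pattern on the sample string from the question, a minimal sketch could look like the following. The sub name strip_quoted_commas is only an illustrative assumption, not part of the answer, and the pattern needs a Perl that supports \K and the backtracking control verbs.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative): returns a copy of the
# input with every comma inside a double-quoted section removed.
sub strip_quoted_commas {
    my ($string) = @_;
    # Same substitution as in the answer above, applied to a local copy.
    $string =~ s/(?:\G(?!\A)|[^"]*")[^",]*\K(?:,|"(*SKIP)(*FAIL))//g;
    return $string;
}

my $string = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <ubabankoffice93@gmail.com>,1,2';
print strip_quoted_commas($string), "\n";
```

Commas outside the quotes are left alone because a match can only begin at an opening quote or directly after the previously removed comma.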

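If you are using an older Perl version (before 5.10), \K and the backtracking control verbs may not be supported. In that case, you can use this pattern with capture groups instead: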

$string =~ s/((?:\G(?!\A)|[^"]*")[^",]*)(?:,|("[^"]*(?:"|\z)))/$1$2/g;
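A minimal sketch of the capture-group variant in use, assuming a hypothetical wrapper named strip_quoted_commas_compat. Under "use warnings", $2 is undefined whenever the comma branch matches, so the sketch silences that one warning locally; that detail is an addition here and is not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical wrapper (the name is illustrative) around the capture-group
# pattern, which avoids \K and the backtracking control verbs.
sub strip_quoted_commas_compat {
    my ($string) = @_;
    # $2 is only set when the closing-quote branch matches, so an undefined
    # $2 in the replacement is expected and harmless here.
    no warnings 'uninitialized';
    $string =~ s/((?:\G(?!\A)|[^"]*")[^",]*)(?:,|("[^"]*(?:"|\z)))/$1$2/g;
    return $string;
}

my $string = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <ubabankoffice93@gmail.com>,1,2';
print strip_quoted_commas_compat($string), "\n";
```

Here the second group consumes the closing quote and everything up to the next opening quote (or the end of the string), which plays the role that (*SKIP)(*FAIL) plays in the first pattern.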

Answer 1 (score: 2)

One way is to use the nice module Text::ParseWords to isolate the specific field and perform a simple transliteration to remove the commas.
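A minimal sketch of that approach, assuming quotewords is called with its keep flag set so the quotes survive, and that the name is the third field (index 2). The helper name strip_commas_in_field is hypothetical and only used for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper (the name is illustrative): split the line on commas
# that are outside quotes, delete the commas in one field, and rejoin.
sub strip_commas_in_field {
    my ($line, $index) = @_;
    my @row = quotewords(',', 1, $line);   # keep = 1 preserves the quotes
    $row[$index] =~ tr/,//d;               # delete every comma in that field
    return join ',', @row;
}

my $str = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <ubabankoffice93@gmail.com>,1,2';
print strip_commas_in_field($str, 2), "\n";
```

Note that tr/,//d removes every comma in the chosen field, including any that fall outside the quotes within that field.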

Output:

06/14/2015,19:13:51,"Mrs Nkolika Nebedom" <ubabankoffice93@gmail.com>,1,2

I am assuming that no commas can legitimately appear in your email field. Otherwise, some other replacement method is needed.