CSV file: for values inside double quotes, replace commas with semicolons and remove the double quotes

Time: 2015-12-12 01:33:36

Tags: regex perl csv awk sed

I have a CSV file in this format:

value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4

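I want to replace the commas inside the outermost quotes of the third field with ';' and remove the inner quotes. I have tried using sed, but nothing I tried has helped with the nested quotes.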

2 answers:

Answer 0: (score: 3)

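You need a recursive regular expression to match the nested quotes, and the simplest way to change the quotes and commas is an expression substitution combined with a non-destructive transliteration (the /r modifier on tr), which became available in Perl v5.14.

Like this: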

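use strict;
use warnings 'all';
use v5.14;    # tr///r (non-destructive transliteration) needs Perl 5.14+

my $str = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# Match a quoted span, recursing with (?R) for each nested quoted span.
# In the replacement, tr turns commas into semicolons and, with /d,
# deletes the double quotes; /r returns the result instead of editing $1.
$str =~ s{ " ( (?: [^"]++ | (?R) )* ) " }{ $1 =~ tr/,"/;/dr }egx;

print $str, "\n";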

Output:

value1, value2, some text in the; quotes; with commas and nested quotes; some more text, value3, value4
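
To see what the replacement side of that substitution does on its own, here is a minimal sketch of the transliteration step in isolation. The variable names and the sample text are illustrative assumptions, not part of the answer.

```perl
use strict;
use warnings;
use v5.14;    # tr///r requires Perl 5.14 or later

# Hypothetical sample text, standing in for the captured $1
my $inner = 'a, b "c", d';

# /d deletes search-list characters that have no counterpart in the
# replacement list (here the double quote); /r returns the modified
# copy and leaves $inner untouched.
my $cleaned = $inner =~ tr/,"/;/dr;

say $inner;      # the original string, unchanged
say $cleaned;    # commas become semicolons, quotes are gone
```

Without /r, tr would modify its operand in place, which is not allowed on the read-only $1.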

Answer 1: (score: 2)

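It can be done.
The criteria are an even number of quotes contained within a quoted field, with a comma as the field separator.

Note that if the CSV does not meet these criteria, nothing will save it;
it can never be parsed.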
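
A rough pre-check for that criterion could look like the sketch below. It counts quotes over the whole line, so an even count is necessary but not sufficient for the field-level rule stated above; has_balanced_quotes is a hypothetical helper name, not something from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a line contains an even number of
# double quotes. This is only a sanity check, not a CSV validator.
sub has_balanced_quotes {
    my ($line) = @_;
    # tr with an empty replacement list counts matches without changing $line
    my $count = ($line =~ tr/"//);
    return $count % 2 == 0;
}

for my $line ('a, "b "c" d", e', 'a, "b "c d", e') {
    printf "%-20s %s\n", $line, has_balanced_quotes($line) ? 'even' : 'odd';
}
```

Lines that fail this check could be logged and skipped rather than passed to the substitution.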

(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))

Formatted:

 (?: ^ | , )
 \s* 
 \K 
 " 
 (                             # (1 start)
      [^"]* 
      (?:                           # Inner, even number of quotes

           "
           [^"]* 
           "
           [^"]* 
      )+
 )                             # (1 end)
 "    
 (?=
      \s* 
      (?: , | $ )
 )
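
The formatted version is not only for reading: with the /x modifier, Perl ignores the whitespace and the # comments, so the layout can be kept in the source. A minimal sketch, assuming a made-up sample line and the hypothetical names $quoted_field, $line and $content:

```perl
use strict;
use warnings;

# Same pattern as above, compiled with /x so the layout and comments survive
my $quoted_field = qr{
    (?: ^ | , )
    \s*
    \K
    "
    (                      # (1) content between the outer quotes
        [^"]*
        (?:                # one or more inner quoted pairs
            " [^"]* "
            [^"]*
        )+
    )
    "
    (?= \s* (?: , | $ ) )
}x;

# Hypothetical sample line, shorter than the one in the question
my $line = 'a, "b, "c", d", e';
my ($content) = $line =~ $quoted_field;   # list context returns capture 1
print "$content\n";
```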

Perl example:

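use strict;
use warnings;

my $data = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# Turn commas into semicolons and delete the inner quotes, then put the
# outer quotes back. The /r modifier on tr needs Perl 5.14 or later.
sub innerRepl
{
    my ($in) = @_;
    return '"' . ($in =~ tr/,"/;/dr ) . '"';
}

# \K keeps the leading separator out of the match, so only the quoted
# field itself is replaced.
$data =~ s/(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))/ innerRepl( $1 ) /eg;

print $data, "\n";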

Output:

value1, value2, "some text in the; quotes; with commas and nested quotes; some more text", value3, value4
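
Since the question is about a whole file, the same substitution can be wrapped in a function and applied line by line. The following is a sketch under the assumption that no quoted field spans a line break; convert_line and the sample lines are hypothetical.

```perl
use strict;
use warnings;
use v5.14;    # for say and tr///r

# Hypothetical wrapper around the substitution shown above
sub convert_line {
    my ($line) = @_;
    $line =~ s{(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))}
              { '"' . ($1 =~ tr/,"/;/dr) . '"' }eg;
    return $line;
}

# Made-up sample lines; in practice these would come from a filehandle
my @lines = (
    'a, "b, "c", d", e',
    'plain, fields, only',
);
say convert_line($_) for @lines;
```

Lines without a quoted field containing nested quotes pass through unchanged, because the pattern requires at least one inner quoted pair.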