How do I split a comma-separated string while ignoring commas inside double quotes and parentheses?

Asked: 2017-01-27 16:42:38

Tags: perl

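I want to split a comma-separated argument list into tokens, but ignore the delimiter when it appears inside double quotes or parentheses. For example: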

my @arr = some_function('one, "string with ,", func(a,func2(1,2))');

should produce:

$arr[0] -> one
$arr[1] -> "string with ,"
$arr[2] -> func(a,func2(1,2))

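I know I can use Text::ParseWords to ignore commas inside quotes, but it will still split func(a,func2(1,2)) into multiple fields because that part isn't quoted. Is there a clean way to do this, or do I have to write my own parser?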
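To make the limitation concrete, here is a minimal sketch. It assumes parse_line from Text::ParseWords, called with a true second argument so the quote characters are kept:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $input = 'one, "string with ,", func(a,func2(1,2))';

# Second argument true: keep the quote characters in the returned fields.
my @fields = parse_line(',', 1, $input);

# The comma inside the double quotes is respected, but the commas inside
# func(...) still act as delimiters, so more than three fields come back.
print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

Leading spaces after each comma are preserved by this call, so a real solution would also need to trim them.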

1 answer:

Answer 0: (score: 3)

You can do this with Parse::RecDescent, which lets you define a grammar for the parsing:

use strict;
use warnings 'all';
use 5.010;

use Data::Dumper;
use Parse::RecDescent;
use Regexp::Common qw(balanced);

my $grammar = q{
    # One or more fields, separated by commas
    startrule     : field(s /,/)                                                        # / for broken Stack Overflow syntax highlighter

    # A field can be a function call, a double-quoted string, or bare text
    field         : func
                  | quoted
                  | bare

    # A double-quoted string. Returned with quotes stripped
    quoted        : /"[^"]*"/
                  {
                      $item[-1] =~ s/\A"|"\z//g;                                        # / for broken Stack Overflow syntax highlighter
                      $return = $item[-1]
                  }

    # "Bare" text: not a function call and not a quoted string. May contain
    # spaces
    bare          : /[^,]*/

    # A function name
    identifier    : /\w+/
};

$grammar .= qq{
    # A function call
    func          : identifier /$RE{balanced}{-parens=>'()'}/
};

$grammar .= q{
                  { $return = join '', @item[1..$#item] }
};

my $parser = Parse::RecDescent->new($grammar) or die 'Bad grammar';

my $parsed = $parser->startrule(
    'one two, "string with ,", func(a,func2(1,2))'
);
print Dumper $parsed;

Output:

$VAR1 = [
          'one two',
          'string with ,',
          'func(a,func2(1,2))'
        ];

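Note that this doesn't handle quoted fields that contain escaped quotes, but that is easy to add if you know which character is used for escaping.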
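As a rough sketch of that extension, the pattern below assumes backslash is the escape character, which the answer does not specify. It is shown as a standalone regex rather than inside the grammar. The names $quoted_re, $field and $value are only illustrative.

```perl
use strict;
use warnings;

# Assumption: backslash is the escape character (not stated in the answer).
# Matches a double-quoted string whose body may contain \" or \\.
my $quoted_re = qr/"(?:[^"\\]|\\.)*"/;

# Single-quoted Perl string, so the backslashes below are literal.
my $input = '"say \"hi\", ok", next';

# Grab the quoted field at the start of the input.
my ($field) = $input =~ /\A($quoted_re)/;

# Strip the outer quotes, then turn each \X sequence into X.
(my $value = $field) =~ s/\A"|"\z//g;
$value =~ s/\\(.)/$1/g;

print "$value\n";
```

If a pattern like this were used for the quoted rule, keep in mind that the grammar is written inside q{...}, where Perl collapses each `\\` to a single `\`, so the backslashes in the pattern would need to be doubled there.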