How do I split a line of text with Perl?

Time: 2011-11-28 19:50:15

Tags: perl

  

Possible duplicate:
  join lines after colon (perl)

I might have the following line:

red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)

The line can contain more entries, each in any of the following forms:

xxx: yyyy
xxx: yyyy, zzzz
xxx: yyyy (zzz) yyyyy (xx)

I would like to split this line according to the following criteria:

The input part "yellow: alpha (gamma) beta (alpha) gamma (beta)" should be split into "yellow: alpha (gamma)", "yellow: beta (alpha)" and "yellow: gamma (beta)".

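Find each "word followed by a colon" and use it as the prefix of the new lines. If the "word followed by a colon" is followed by one word that does not contain a colon, produce one line; if it is followed by two (possibly comma-separated) words that do not contain a colon, produce two lines. If the second word after the "word followed by a colon" is enclosed in parentheses, the parenthesized information belongs to the line of the word in front of it.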
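The criteria above can also be read as a single left-to-right scan over tokens. The following is a minimal sketch, not taken from any of the answers, assuming that keys are single `\w+` words, that parentheses never nest, and that commas only separate values; the sub name `split_entries` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any answer): scan the line token by token.
#   "word:"  starts a new key
#   "(...)"  is appended to the previous output line
#   word     becomes a new "key: word" line (commas are never part of a token)
sub split_entries {
    my ($line) = @_;
    my ($key, @out);
    for my $tok ($line =~ /\([^)]*\)|[^\s,()]+/g) {
        if ($tok =~ /^(\w+):$/) {
            $key = $1;                       # a new "word followed by a colon"
        }
        elsif ($tok =~ /^\(/) {
            $out[-1] .= " $tok" if @out;     # parenthesized info joins the previous line
        }
        elsif (defined $key) {
            push @out, "$key: $tok";         # a plain word gets its own line
        }
    }
    return @out;
}

print "$_\n" for split_entries('aa: bb, ccc ddd: aa eee ff');
```

For that input the sketch prints `aa: bb`, `aa: ccc`, `ddd: aa`, `ddd: eee` and `ddd: ff`, one per line; applied to the original sample line it produces the eight lines of the expected output.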

Example 1:

Line

aa: bb ccc

is split into

aa: bb
aa: ccc

Example 2:

Line

aa: bb, ccc ddd: aa eee ff

is split into

aa: bb
aa: ccc
ddd: aa
ddd: eee
ddd: ff

Original example:

For the original sample input, the output should be:

red: alpha
green: beta
green: gamma
blue: alpha
blue: beta
yellow: alpha (gamma)
yellow: beta (alpha)
yellow: gamma (beta)

4 answers:

Answer 0 (score: 0)

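use strict;
use warnings;
my $line = 'red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)';
# Split on "key:" separators; the capture group keeps each key in the
# result list, so @tmp alternates key, values, key, values, ...
my @tmp = split /\s*?(\w+):\s*/, $line;
shift @tmp;    # drop the empty leading field produced by the match at offset 0
while (my ($color, $value) = splice @tmp, 0, 2) {
    # Split the values on ", " or on a space that is not followed by "("
    foreach my $v (split /, | (?!\()/, $value) {
        print "$color: $v\n";
    }
}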
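To see why Answer 0 calls `shift @tmp` and then consumes `@tmp` two elements at a time, it helps to print what the capturing `split` returns. A minimal sketch, using a shortened version of the sample line:

```perl
use strict;
use warnings;

# Same split as Answer 0, on a shortened line. The capture group (\w+)
# makes split return each key between the fields, and the positive-width
# match at offset 0 ("red: ") produces an empty leading field.
my $line = 'red: alpha green: beta, gamma';
my @tmp  = split /\s*?(\w+):\s*/, $line;

print join('|', map { "[$_]" } @tmp), "\n";
# prints: []|[red]|[alpha]|[green]|[beta, gamma]
```

After the leading empty field is shifted off, the list is exactly key, values, key, values, which is what the `splice @tmp, 0, 2` loop expects.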

Answer 1 (score: 0)

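#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    # Split before each "word:" key. The \b keeps the lookahead from also
    # matching inside a key (at "ed:" and "d:" in "red:").
    for my $chunk (split /\b(?=\w+:)/) {
        my ($color, $line) = ( $chunk =~ /^(\w+:)(.+)/ ) or next;
        # Split on commas and closing parens; split consumes the ")",
        # so it is put back on pieces that contain "(".
        for (split /[,)]/, $line) {
            s/\s+$//;            # drop trailing whitespace
            next unless /\S/;    # skip whitespace-only leftovers
            printf "%s%s%s\n", $color, $_, m/\(/ ? ')' : '';
        }
    }
}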
__DATA__
red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)
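Answers 1 and 3 both split the line with a zero-width lookahead placed before each key. The sketch below (variable names are illustrative) shows why that lookahead needs a word boundary: `(?=\w+:)` on its own is also true at every later letter of a key, so `split` cuts the key into single characters.

```perl
use strict;
use warnings;

# Zero-width split before every "word:" key, with and without \b.
my $s = 'red: alpha green: beta';

my @naive   = split /(?=\w+:)/,   $s;   # also matches at "ed:", "d:", "reen:", ...
my @bounded = split /\b(?=\w+:)/, $s;   # only at the start of a word

print join('|', @naive), "\n";     # prints: r|e|d: alpha |g|r|e|e|n: beta
print join('|', @bounded), "\n";   # prints: red: alpha |green: beta
```

Each chunk from the `\b` version still ends with the space before the next key, so the later steps have to split or trim it.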

Answer 2 (score: 0)

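use strict;
use warnings;
use v5.10;

while (<DATA>) {
    # A unit is "key: values" and ends right before whitespace plus the
    # next "key:", or at the end of the line. The end-of-line case is an
    # alternative inside the lookahead, since quantifying a zero-width
    # lookahead has no effect and triggers a warning.
    for my $unit (/[a-z]+:\s*[a-z, ()]+?(?=\s+[a-z]+:|\s*$)/g) {
        if ($unit =~ /^([a-z]+:)\s*(.+)$/) {
            my $key = $1;
            # Split on runs of commas/spaces unless a "(" comes next
            my @val = split /[, ]+(?!\()/, $2;
            say "$key $_" for @val;
        }
    }
}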

__DATA__
red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)

Answer 3 (score: 0)

You can do it like this:

#!/usr/bin/perl -w

use strict;
use warnings;

my $string = "red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)";

# \b makes the split happen only before a whole "word:" key
for my $key_values (split/\b(?=\w+:)/, $string) {
    my ($key, $values) = split/: /, $key_values;
    for my $value (split/, |(?<=\)) | (?!\()/, $values) {
        print "$key: $value\n";
    }
}

Golfed version:

map{s/(.+: )//;map{print"$1$_\n"}split/, |(?<=\)) | (?!\()/}split/\b(?=\w+:)/,$string;

Edit: I had overlooked one of the "requirements", so I had to update the third regex.
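The third regex is the value splitter. A minimal sketch that stores it in a `qr//` object (the name `$re` is illustrative) and applies it to two of the sample value strings:

```perl
use strict;
use warnings;

# Answer 3's value splitter. Alternatives in the pattern:
#   ", "         a comma followed by a space
#   "(?<=\)) "   a space right after a closing parenthesis
#   " (?!\()"    a space that is not followed by an opening parenthesis
my $re = qr/, |(?<=\)) | (?!\()/;

for my $values ('beta, gamma', 'alpha (gamma) beta (alpha) gamma (beta)') {
    print join(' | ', split $re, $values), "\n";
}
# prints:
# beta | gamma
# alpha (gamma) | beta (alpha) | gamma (beta)
```

For these two inputs the third alternative alone already gives the same result; the second one only changes the outcome when a ")" is followed by " (", for example `'x (a) (b)'` splits into `x (a)` and `(b)`.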
