Parsing comma-separated lines and computing sums

Date: 2009-06-24 19:04:04

Tags: perl parsing

So basically my problem can be written in pseudocode as follows:

split the line by =
using the value before =, find the next line
check that the value after = matches the previous one
if not, loop until end of file
collect all the matching values and, using the line numbers, get the values of the last 2 columns
sum all the values for a given set with an equal key=value pair

My data set is as follows:

3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200, 100
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300, 10
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000, 80
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000, 1200
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000, 500
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000, 7

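What I need to do is take all the lines where the value of 3 is equal, sum each of the last 2 columns over those lines, and report the sums next to that value. The same goes for the other keys. For example: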

3 = 5002, sum = 500, 110
5 = 0, sum = 1300, 90
8 = 2, sum = 15000, 1700

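I have been able to parse the first 3, but I can't do anything for the rest of the columns :-(

5 answers:

Answer 0: (score: 3)

As I understand it, here are two possible approaches. The first uses composite keys ("key=value" strings) to store the values in a single-level hash. The second uses a multi-level hash:

Method 1: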

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my %data;

while ( my $line = <DATA> ) {
    chomp $line;

    my @parts = split /, /, $line;
    last unless @parts;

    my $value = pop @parts;

    push @{ $data{$_} }, $value for @parts;
}

for my $col ( sort keys %data ) {
    printf("%12s:%9d\n", $col, sum @{ $data{$col} } );
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

C:\Temp> hj
  3=5001:    11000
  3=5002:      500
  3=5003:     6000
 0=10001:     1000
 0=10002:      500
 0=10004:    16000
 1=14001:    11000
 1=14002:      500
 1=14003:     6000
     4=0:     6000
     4=1:    11500
     5=0:     1300
     5=1:    10200
     5=2:     5000
     5=3:     1000
     6=3:    11000
     6=5:      500
     6=8:     6000
     7=0:     1300
     7=1:      200
     7=2:    16000
     8=0:     1300
     8=1:     1200
     8=2:    15000
     9=0:     2200
     9=1:    15300

Method 2:

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my %data;

while ( my $line = <DATA> ) {
    chomp $line;

    my @parts = split /, /, $line;
    last unless @parts;

    my $value = $parts[-1];

    # Every element except the last one is a key=value pair.
    for ( my $i = 0 ; $i < @parts - 1; ++$i ) {
        my @subparts = split /=/, $parts[$i];
        push @{ $data{$subparts[0]}->{$subparts[1]} }, $value;
    }
}

for my $k1 ( keys %data ) {
    for my $k2 ( keys %{ $data{$k1} } ) {
        printf(
            "%2d:%6d:%9d \n",
            $k1, $k2, sum @{ $data{$k1}->{$k2} }
        );
    }
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

C:\Temp> hjk
 3:  5003:     6000
 3:  5002:      500
 3:  5001:    11000
 7:     1:      200
 7:     0:     1300
 7:     2:    16000
 9:     1:    15300
 9:     0:     2200
 8:     1:     1200
 8:     0:     1300
 8:     2:    15000
 4:     1:    11500
 4:     0:     6000
 1: 14001:    11000
 1: 14003:     6000
 1: 14002:      500
 6:     3:    11000
 6:     5:      500
 6:     8:     6000
 0: 10001:     1000
 0: 10004:    16000
 0: 10002:      500
 5:     1:    10200
 5:     3:     1000
 5:     0:     1300
 5:     2:     5000

NB: Add sort to taste. The order of the lines above is simply whatever order the hash keys come back in.
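If a stable order is wanted, one possible way to add that sort is to order both key levels numerically with `<=>`. This is only a sketch: the helper name `sorted_report` and the small sample hash are made up for illustration and are not part of the answer above.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical helper: builds the report lines for a two-level hash of
# the shape $data{key}{value} = [ amounts ], sorted numerically on
# both levels.
sub sorted_report {
    my ($data) = @_;
    my @lines;
    for my $k1 ( sort { $a <=> $b } keys %$data ) {
        for my $k2 ( sort { $a <=> $b } keys %{ $data->{$k1} } ) {
            push @lines,
                sprintf( "%2d:%6d:%9d", $k1, $k2, sum @{ $data->{$k1}{$k2} } );
        }
    }
    return @lines;
}

# Small made-up subset of the data, in the shape Method 2 builds.
my %sample = (
    3 => { 5002  => [ 200, 300 ], 5001 => [ 1000, 10000 ] },
    0 => { 10002 => [ 200, 300 ] },
);
print "$_\n" for sorted_report( \%sample );
```

In Method 2 itself, the same effect should be achievable by wrapping the two `keys` calls in `sort { $a <=> $b }`.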

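Answer 1: (score: 1)

How about splitting on ", "? You can then pull off the last element and pair it with every other element in the list. For your first line, you would end up with the following pairs: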

3=5002, 200
0=10002, 200
5=1, 200
4=1, 200
7=1, 200
8=1, 200
9=0, 200
1=14002, 200
6=5, 200

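Add each of these pairs to a master list. Once you have that, you can sort by the first element of each pair and sum the values of pairs that share it.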
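A rough sketch of that idea might look like the following. The helper names `line_to_pairs` and `sum_pairs` are invented for this example, and the two input lines are shortened versions of the sample data with a single trailing amount.

```perl
use strict;
use warnings;

# Hypothetical helper: turns one input line into
# [ "key=value", amount ] pairs.
sub line_to_pairs {
    my ($line) = @_;
    chomp $line;
    my @parts  = split /,\s*/, $line;
    my $amount = pop @parts;                # last element is the amount
    return map { [ $_, $amount ] } @parts;
}

# Hypothetical helper: sorts the master list by the first element of
# each pair, then sums the amounts of neighbouring pairs that share it.
sub sum_pairs {
    my @sorted = sort { $a->[0] cmp $b->[0] } @_;
    my @totals;
    for my $pair (@sorted) {
        if ( @totals && $totals[-1][0] eq $pair->[0] ) {
            $totals[-1][1] += $pair->[1];
        }
        else {
            push @totals, [@$pair];         # copy, so the input is not modified
        }
    }
    return @totals;
}

my @master;
push @master, line_to_pairs($_) for (
    '3=5002, 0=10002, 5=1, 200',
    '3=5002, 0=10002, 5=0, 300',
);
printf "%s: %d\n", @$_ for sum_pairs(@master);
```

Sorting first means equal keys end up next to each other, so a single pass is enough to total them. A hash keyed on the pair's first element would avoid the sort, which is what the other answers do.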

Answer 2: (score: 0)

The way you explain your problem is not very clear. As I understand it, this would be my approach:

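  • Create a two-dimensional array holding the comma-separated fields, preserving the row/column structure.

  • Analyze each column and create a hash that maps each data value to the rows containing it.

E.g., for the first column you would have a hash like this (value, then row indices):
3 = 5002    0,1
3 = 5001    2,3
3 = 5003    4,5

  • Then go through each entry of the hash and sum the last member of the rows listed for each value.

  • Repeat for every column except the last.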
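The steps above could be sketched roughly as follows. The helper names `rows_by_value` and `sum_last` are made up for this illustration, and the three input lines are a shortened version of the sample data with one trailing amount.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical helper: for one column of the 2-D array, maps each
# distinct cell ("key=value") to the indices of the rows containing it.
sub rows_by_value {
    my ( $table, $col ) = @_;
    my %rows_for;
    for my $r ( 0 .. $#$table ) {
        push @{ $rows_for{ $table->[$r][$col] } }, $r;
    }
    return \%rows_for;
}

# Hypothetical helper: sums the last member of each listed row.
sub sum_last {
    my ( $table, $rows ) = @_;
    return sum map { $table->[$_][-1] } @$rows;
}

# Step 1: the 2-D array, one row per line, one cell per field.
my @table = map { [ split /,\s*/ ] } (
    '3=5002, 0=10002, 200',
    '3=5002, 0=10002, 300',
    '3=5001, 0=10001, 1000',
);

# Remaining steps, repeated for every column except the last.
for my $col ( 0 .. $#{ $table[0] } - 1 ) {
    my $rows_for = rows_by_value( \@table, $col );
    for my $cell ( sort keys %$rows_for ) {
        printf "%s  rows %s  sum %d\n", $cell,
            join( ',', @{ $rows_for->{$cell} } ),
            sum_last( \@table, $rows_for->{$cell} );
    }
}
```

Keeping the row indices around costs some memory compared with summing directly, but it makes it easy to pull other columns from the matching rows later.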

Answer 3: (score: 0)

I hope this is what you are looking for:

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV_XS;

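my %data;
# allow_whitespace strips the blank that follows each comma
my $csv = Text::CSV_XS->new( { allow_whitespace => 1 } );
while ( <DATA> ) {
    $csv->parse($_) or next;              # skip lines that fail to parse
    my @fields = $csv->fields();
    $fields[0] =~ s/^3=//;                # keep only the value of key 3
    $data{ $fields[0] } += $fields[9];    # sum the last column per value
}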

use Data::Dumper;
print Dumper \%data;

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

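Answer 4: (score: 0)

Well, it seems everyone is trying to figure out what you really want. I don't get it either, but it seems you just want the sum over all lines that contain a given key=value pair. Other than that, you don't really care about the key.

Or something like that.

So, my question is: can you provide the expected output for the sample data set?

Anyway, here is my attempt (the '#/' comments are only there to help the syntax highlighting).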

#!/usr/bin/perl
use strict;
use warnings;
my %h;
my @ord_keys;
while (<DATA>) {
    chomp;
    my @cols = split /,\s*/; #/
    my $val = pop @cols;

    foreach my $k (@cols) {
        if (exists($h{$k})) {
            $h{$k} += $val;
        } else {
            push @ord_keys, $k;
            $h{$k} = $val;
        }
    }
}

foreach my $key (@ord_keys) {
    my ($k, $v) = split /=/, $key; #/
    print "$k = $v, sum = $h{$key}\n";
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

Result:

3 = 5002, sum = 500
0 = 10002, sum = 500
5 = 1, sum = 10200
4 = 1, sum = 11500
7 = 1, sum = 200
8 = 1, sum = 1200
9 = 0, sum = 2200
1 = 14002, sum = 500
6 = 5, sum = 500
5 = 0, sum = 1300
7 = 0, sum = 1300
8 = 0, sum = 1300
9 = 1, sum = 15300
3 = 5001, sum = 11000
0 = 10001, sum = 1000
4 = 0, sum = 6000
1 = 14001, sum = 11000
6 = 3, sum = 11000
0 = 10004, sum = 16000
7 = 2, sum = 16000
8 = 2, sum = 15000
3 = 5003, sum = 6000
5 = 2, sum = 5000
1 = 14003, sum = 6000
6 = 8, sum = 6000
5 = 3, sum = 1000
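The data in the question actually has two trailing numeric columns, not one. As a sketch of how the same idea might be extended to that case, the following keeps a pair of running sums per key=value pair. The helper name `sum_two_columns` is an assumption made for this example; the two input lines are the first two lines of the question's data set.

```perl
use strict;
use warnings;

# Hypothetical helper: sums both trailing columns per key=value pair.
# Returns ( \%sums, \@order ), where $sums{"k=v"} = [ sum1, sum2 ] and
# @order lists the pairs in order of first appearance.
sub sum_two_columns {
    my @lines = @_;
    my ( %sums, @order );
    for my $line (@lines) {
        chomp $line;
        my @cols = split /,\s*/, $line;
        my ( $first, $second ) = splice @cols, -2;    # the two amounts
        for my $k (@cols) {
            push @order, $k unless exists $sums{$k};
            $sums{$k}[0] += $first;
            $sums{$k}[1] += $second;
        }
    }
    return ( \%sums, \@order );
}

my ( $sums, $order ) = sum_two_columns(
    '3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200, 100',
    '3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300, 10',
);
for my $key (@$order) {
    my ( $k, $v ) = split /=/, $key;
    print "$k = $v, sum = $sums->{$key}[0], $sums->{$key}[1]\n";
}
```

With these two lines, the first line printed is `3 = 5002, sum = 500, 110`, which matches the example given in the question.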

Comments are welcome.