Parsing a large file in Perl

Time: 2018-03-27 20:49:27

Tags: perl

I have a large file of over 1 GB. I want to parse it, extract two values from each line, and build a hash of array references.

Here is a sample of the file:

ra_uuid: 592bbb0c-2c6b-11e8-8580-00e081ea0e98
cms_uuid: a4e6bffc-2c6a-11e8-a7cf-00e081ea0e8e
mpd_uuid: bf3fd34c-2c57-11e8-8bc5-00e081ea0e5c
amLeader: 0
numAssignments = 20909996
mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=2 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=50
mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=100
mpg=2 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=3 mrule=150 reg=7989 score=0 rank=0 perc=100 mp_demand=20
mpg=4 mrule=150 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=3 mrule=150 reg=7989 score=0 rank=0 perc=100 mp_demand=20
mpg=4 mrule=150 reg=7989 score=10625 rank=0 perc=100 mp_demand=40

So I want each value of the field mrule to be a key of the hash, with all the corresponding mp_demand values collected in an array reference.
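Since only two fields per line are needed, one regex can pull them out without splitting every column. A minimal sketch, assuming each data line contains `mrule=NUMBER` before `mp_demand=NUMBER` as in the sample; the sub name `extract_pair` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (mrule, mp_demand) for one data line, or an
# empty list for header lines such as "ra_uuid: ..." or "amLeader: 0".
# Assumes both values are unsigned integers and mrule precedes mp_demand.
sub extract_pair {
    my ($line) = @_;
    return $line =~ /\bmrule=(\d+).*?\bmp_demand=(\d+)/ ? ($1, $2) : ();
}

my @pair = extract_pair("mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40\n");
print "@pair\n";    # prints "140 40"
```

Header lines such as `numAssignments = 20909996` contain no `mrule=`, so they return an empty list and can be skipped.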

Here is the output I expect for the sample above:

{
    '140' => [40,50,100,40],
    '150' => [20,40,20,40]
}

My code:

use strict;
use warnings;

use Data::Dumper qw( Dumper );

my @bigarray;
my %hash;
my $hash_ref;
my @column;
my $key;
my $value;

open(FILE, "<", "$RESULTS_FILE/$ASSIGNMENT_MESSAGE_OUTPUT") or die("Could not open $ASSIGNMENT_MESSAGE_OUTPUT to read");

while(my $data = <FILE>){
    map {s/=/ /g;} $data;
    @column = split(/\t/, $data);
    print("the column is ". Dumper(\@column));
    $key = $column[3];
    $value = $column[13];
    $hash{$key} = $value ;
}

$hash_ref = \%hash ;
push(@bigarray, $hash_ref);
print("the hash is ". Dumper($hash_ref));
print("the demand array is ". Dumper(\@bigarray));

It produces the following output:

the column is $VAR1 = [
          'ra_uuid: 592bbb0c-2c6b-11e8-8580-00e081ea0e98
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 1.
the column is $VAR1 = [
          'cms_uuid: a4e6bffc-2c6a-11e8-a7cf-00e081ea0e8e
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 2.
the column is $VAR1 = [
          'mpd_uuid: bf3fd34c-2c57-11e8-8bc5-00e081ea0e5c
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 3.
the column is $VAR1 = [
          'amLeader: 0
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 4.
the column is $VAR1 = [
          'numAssignments   20909996
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 5.
the column is $VAR1 = [
          'mpg 1 mrule 140 reg 7989 score 10625 rank 0 perc 100 mp_demand 40
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 6.
the column is $VAR1 = [
          'mpg 2 mrule 140 reg 7989 score 10625 rank 0 perc 100 mp_demand 50
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 7.
the column is $VAR1 = [
          'mpg 1 mrule 140 reg 7989 score 10625 rank 0 perc 100 mp_demand 100
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 8.
the column is $VAR1 = [
          'mpg 2 mrule 140 reg 7989 score 10625 rank 0 perc 100 mp_demand 40
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 9.
the column is $VAR1 = [
          'mpg 3 mrule 150 reg 7989 score 0 rank 0 perc 100 mp_demand 20
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 10.
the column is $VAR1 = [
          'mpg 4 mrule 150 reg 7989 score 10625 rank 0 perc 100 mp_demand 40
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 11.
the column is $VAR1 = [
          'mpg 3 mrule 150 reg 7989 score 0 rank 0 perc 100 mp_demand 20
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 12.
the column is $VAR1 = [
          'mpg 4 mrule 150 reg 7989 score 10625 rank 0 perc 100 mp_demand 40
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 13.
the column is $VAR1 = [
          '
'
        ];
Use of uninitialized value $key in hash element at a.pl line 19, <FILE> line 14.
the hash is $VAR1 = {
          '' => undef
        };
the demand array is $VAR1 = [
          {
            '' => undef
          }
        ];
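The Dumper output shows why every key is undefined: each line lands in a single element of @column, because the sample's fields are separated by spaces rather than tabs, so split(/\t/, $data) never splits anything. A second problem is that `$hash{$key} = $value` overwrites the previous value instead of collecting all of them. A small sketch of both points on one sample line (variable names are illustrative; `$data =~ s/=/ /g;` is the direct form of the question's `map` call):

```perl
use strict;
use warnings;

my $line = "mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40\n";

# The fields are separated by spaces, not tabs, so split /\t/ returns the
# whole line as a single element and $column[3] is undef.
my @by_tab = split /\t/, $line;
print scalar(@by_tab), "\n";    # prints 1

# After s/=/ /g, splitting on whitespace instead puts the mrule value at
# index 3 and the mp_demand value at index 13, the indices the question uses.
(my $spaced = $line) =~ s/=/ /g;
my @by_ws = split ' ', $spaced;
print "$by_ws[3] $by_ws[13]\n";    # prints "140 40"

# $hash{$key} = $value would keep only the last mp_demand per mrule;
# pushing onto an autovivified array ref keeps all of them.
my %by_mrule;
push @{ $by_mrule{ $by_ws[3] } }, $by_ws[13];
```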

1 answer:

Answer 0 (score: 1)

use strict;
use warnings;

use Data::Dumper;

my %mp_demand_by_mrule;

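while (<DATA>) {
    # Skip header lines such as "ra_uuid: ..." that carry no mrule field.
    next unless /mrule/;
    # Split "mpg=1 mrule=140 ..." on '=' and whitespace into key/value pairs.
    my %record = split(/[=\s]+/);
    # Autovivify an array ref per mrule and append this line's mp_demand.
    push(@{$mp_demand_by_mrule{$record{mrule}}}, $record{mp_demand});
}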

print Dumper(\%mp_demand_by_mrule);

__DATA__
ra_uuid: 592bbb0c-2c6b-11e8-8580-00e081ea0e98
cms_uuid: a4e6bffc-2c6a-11e8-a7cf-00e081ea0e8e
mpd_uuid: bf3fd34c-2c57-11e8-8bc5-00e081ea0e5c
amLeader: 0
numAssignments = 20909996
mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=2 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=50
mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=100
mpg=2 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=3 mrule=150 reg=7989 score=0 rank=0 perc=100 mp_demand=20
mpg=4 mrule=150 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=3 mrule=150 reg=7989 score=0 rank=0 perc=100 mp_demand=20
mpg=4 mrule=150 reg=7989 score=10625 rank=0 perc=100 mp_demand=40

Output:

$VAR1 = {
          '140' => [
                     '40',
                     '50',
                     '100',
                     '40'
                   ],
          '150' => [
                     '20',
                     '40',
                     '20',
                     '40'
                   ]
        };
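The answer reads from `__DATA__`; for the actual 1 GB file the same loop can read from a lexical filehandle opened with three-argument `open`. A minimal sketch, assuming `mrule=` appears before `mp_demand=` on each data line; the sub name `demand_by_mrule` is hypothetical, and the in-memory `$sample` stands in for the real file. Matching only the two needed fields avoids building a full `%record` hash per line, which may help at this size, although the resulting hash of arrays still holds every mp_demand value in memory.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: reads from an open filehandle one line at a time and
# groups mp_demand values by mrule. Lines without both fields are skipped.
sub demand_by_mrule {
    my ($fh) = @_;
    my %by_mrule;
    while (my $line = <$fh>) {
        # Assumes mrule= appears before mp_demand= on each data line.
        next unless $line =~ /\bmrule=(\d+).*?\bmp_demand=(\d+)/;
        push @{ $by_mrule{$1} }, $2;
    }
    return \%by_mrule;
}

# In-memory stand-in for the real file; for the 1 GB input, open the actual
# path instead, e.g. open(my $fh, '<', $path) or die "Could not open $path: $!";
my $sample = <<'END';
ra_uuid: 592bbb0c-2c6b-11e8-8580-00e081ea0e98
amLeader: 0
numAssignments = 20909996
mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=2 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=50
mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=100
mpg=2 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=3 mrule=150 reg=7989 score=0 rank=0 perc=100 mp_demand=20
mpg=4 mrule=150 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
mpg=3 mrule=150 reg=7989 score=0 rank=0 perc=100 mp_demand=20
mpg=4 mrule=150 reg=7989 score=10625 rank=0 perc=100 mp_demand=40
END

open(my $fh, '<', \$sample) or die "Could not open in-memory sample: $!";
my $result = demand_by_mrule($fh);
close($fh);

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper($result);
```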