如何从文本文件中读取表格数据 - Perl

时间:2016-02-03 06:36:43

标签: perl

我们有正常和表格形式的数据文本文件。我可以读取正常数据,但我无法读取表格形式的数据。

任何人都可以帮我解读并提取表格数据。

文字档案数据:

    225 Top Hitters
    RT(ms)    BRT(ms)    TL(ms)    l_mig_a    l_mig_w    b_mig_a    b_mig_w    l_b_mig_a    l_b_mig_w    b_l_mig_a    b_l_mig_w
    --------  ---------  --------  ---------  ---------  ---------  ---------  -----------  -----------  -----------  -----------
     11078.9      141.3    3754.8        418       7325          0          0            0            4            0            4


Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 7333

我的代码:

    use strict;
    use warnings;
    my ($RT,$BRT,$TL ,$l_mig_a,$l_mig_w,$b_mig_a,$b_mig_w,$l_b_mig_a,$l_b_mig_w,$b_l_mig_a,$b_l_mig_w);
    open (FH, "<" ,"file.txt") or print "could not open $!";
    my @lines = <FH>;
    close FH;
    foreach my $line (@lines) {
        print "$line \n";
    }

预期输出:

$RT = 11078.9
$BRT = 141.3
$TL = 3754.8
$l_mig_a = 418
$l_mig_w = 7325
$b_mig_a = 0
$b_mig_w = 0
$l_b_mig_a = 0
$l_b_mig_w = 4
$b_l_mig_a = 0
$b_l_mig_w = 4

3 个答案:

答案 0 :(得分:0)

在您的预期输出中,您在每个标题名称前加上$。我希望您的意图不是eval结果并以编程方式使用这些值,因为有更好的方法可以做到这一点(例如,哈希)。如果这是您的计划,那么您在行尾也会丢失分号。

由于我无法从您的问题中推断出您的用例,因此我决定按原样转储密钥和值;随意添加你喜欢的任何装饰。

use strict;
use warnings;

my @keys;
my @values;

while (<DATA>) {
    if ($. == 2) {
        @keys = split;

        for (@keys) {
            s/\W.+$//;
        }

    } elsif ($. == 4) {
        @values = split;
        last;
    }
}

for my $i (0 .. $#keys) {
    print "$keys[$i] = $values[$i]\n";
}


__DATA__
225 Top Hitters
RT(ms)    BRT(ms)    TL(ms)    l_mig_a    l_mig_w    b_mig_a    b_mig_w    l_b_mig_a    l_b_mig_w    b_l_mig_a    b_l_mig_w
--------  ---------  --------  ---------  ---------  ---------  ---------  -----------  -----------  -----------  -----------
 11078.9      141.3    3754.8        418       7325          0          0            0            4            0            4


Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 7333

如果您的输入文件实际上只有10行(即,您还没有告诉我们额外的500万个数据行),您可以简化阅读并将其拆分为几行代码:

my @lines  = <DATA>;
my @keys   = map { s/\W.+$//r } split(' ', $lines[1]);
my @values = split(' ', $lines[3]);

输出:

RT = 11078.9
BRT = 141.3
TL = 3754.8
l_mig_a = 418
l_mig_w = 7325
b_mig_a = 0
b_mig_w = 0
l_b_mig_a = 0
l_b_mig_w = 4
b_l_mig_a = 0
b_l_mig_w = 4

要收集值以供以后在程序中使用,同时保持标头和值之间的关联,请创建一个哈希:

my %hash;
@hash{@keys} = @values;

哈希将具有以下结构:

{
  b_l_mig_a => 0,
  b_l_mig_w => 4,
  b_mig_a => 0,
  b_mig_w => 0,
  BRT => 141.3,
  l_b_mig_a => 0,
  l_b_mig_w => 4,
  l_mig_a => 418,
  l_mig_w => 7325,
  RT => 11078.9,
  TL => 3754.8,
}

答案 1 :(得分:0)

这是Matt的替代策略,它搜索文件中包含一个或多个连字符-,可能的空格,而不是其他内容的第一行。然后列标签位于上一行,值列于下一行

use strict;
use warnings 'all';

use List::Util 'max';

use constant DATA_FILE => 'tabular_data.txt';

# Read the whole file into an array

my @file = do {
    open my $fh, '<', DATA_FILE or die $!;
    <$fh>;
};
chomp @file;

# Find the first line that contains only one or more hyphens
# and possibly some whitespace

my $i = 0;
for ( @file ) {
    last if /\-/ and not /[^-\s]/;
    ++$i;
}

die "Header line not found" unless $i < @file;

# Build the key array from the preceding line, and the
# values array from the succeeding line

my @keys = split ' ', $file[$i-1];
s/\(.*// for @keys;

my @values = split ' ', $file[$i+1];

my %data;
@data{@keys} = @values;

# Display what we've recovered

my $w = max map length, @keys;

for my $key ( @keys ) {
    printf "%-*s => %s\n", $w, $key, $data{$key};
}

输出

RT        => 11078.9
BRT       => 141.3
TL        => 3754.8
l_mig_a   => 418
l_mig_w   => 7325
b_mig_a   => 0
b_mig_w   => 0
l_b_mig_a => 0
l_b_mig_w => 4
b_l_mig_a => 0
b_l_mig_w => 4

答案 2 :(得分:-2)

你可以&#34;啜饮&#34;将整个文件转换为单个字符串变量,并使用正则表达式来解析表格数据。请在下面找到带有子程序的示例脚本,以简化生成正则表达式。

下面请查看将代码捆绑在一起的测试数据的示例实现到单个脚本/文件中。

use strict;
use warnings;

my $text;
{
  # put all lines into single string
  local $/ = undef;
  $text = <DATA>;
}

my $regex = &make_regex(qw{RT(ms)    BRT(ms)    TL(ms)    l_mig_a    l_mig_w    b_mig_a    b_mig_w    l_b_mig_a    l_b_mig_w    b_l_mig_a    b_l_mig_w});

print "REGEX-START\n$regex\nREGEX-END\n"; # Debuging: Show generated regular expression

my ($RT,$BRT,$TL ,$l_mig_a,$l_mig_w,$b_mig_a,$b_mig_w,$l_b_mig_a,$l_b_mig_w,$b_l_mig_a,$b_l_mig_w) 
   = $text =~ /$regex/ or die;

print "b_l_mig_w = $b_l_mig_w\n";

sub make_regex {
  my $n = scalar(@_);
  my $str = '
\s*' . join('\s+',map {quotemeta($_)} @_) . '\s*
\s*' . join('\s+',('-+')    x $n) . '\s*
\s*' . join('\s+',('(\S+)') x $n) . '\s*
';
  qr{$str}m;
} # end sub make_regex

__DATA__
    225 Top Hitters
    RT(ms)    BRT(ms)    TL(ms)    l_mig_a    l_mig_w    b_mig_a    b_mig_w    l_b_mig_a    l_b_mig_w    b_l_mig_a    b_l_mig_w
    --------  ---------  --------  ---------  ---------  ---------  ---------  -----------  -----------  -----------  -----------
     11078.9      141.3    3754.8        418       7325          0          0            0            4            0            4


Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 733