
Question

I'm new to Perl and I've hit a mental block. I need to extract information from a tab-delimited file like the one below.

#name  years risk total
 adam  5     100  200
 adam  5     50   100
 adam  10    20   300
 bill  20    5    100
 bill  30    10   800

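In this example, the tab-delimited file shows the length of each investment (years), the amount at risk, and the total amount at the end of the investment.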

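I want to parse this file and, for each name (e.g. adam), compute the total number of years invested, 5 + 5 + 10, and the total earnings, (200-100) + (100-50) + (300-20). I would also like to save each name's individual totals (200, 100, 300).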

Here is what I have tried so far:

my $filename;
my $seq_fh;

open $seq_fh, $frhitoutput 
    or die "failed to read input file: $!";

while (my $line = <$seq_fh>) {

    chomp $line;
    ## skip comments and blank lines and optional repeat of title line

    next if $line =~ /^\#/ || $line =~ /^\s*$/ || $line =~ /^\+/;

    #split each line into array
    my @line = split(/\s+/, $line);
    my $yeartotal = 0;
    my $earning   = 0;

    #$line[0] = name
    #$line[1] = years
    #$line[2] = start
    #$line[3] = end

    while (@line[0]){

        $yeartotal += $line[1];
        $earning   += ($line[3]-$line[2]);
    }
}

Any ideas about where I'm going wrong?

Answer 1

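The Text::CSV module can read tab-delimited data. It is usually much better than hacking together something of your own with split and the like, because it handles details such as quoting and escaping.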
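A minimal sketch of that approach, assuming Text::CSV is installed from CPAN (it is not part of core Perl) and that the file really has tab characters between fields. The sample rows are embedded in a string and read through an in-memory filehandle so the example runs on its own; the names %sum and $tsv are illustrative.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, not part of core Perl

# The question's sample rows, with real tabs between fields (assumed layout:
# name, years, risk, total)
my $data = "#name\tyears\trisk\ttotal\n"
         . "adam\t5\t100\t200\n"
         . "adam\t5\t50\t100\n"
         . "adam\t10\t20\t300\n"
         . "bill\t20\t5\t100\n"
         . "bill\t30\t10\t800\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

# sep_char switches the parser from commas to tabs
my $tsv = Text::CSV->new({ sep_char => "\t", binary => 1, auto_diag => 1 });

my %sum;   # name => { years => ..., earnings => ... }
while (my $row = $tsv->getline($fh)) {   # getline returns an array ref per row
    my ($name, $years, $risk, $total) = @$row;
    next if $name =~ /^#/;               # skip the header line
    $sum{$name}{years}    += $years;
    $sum{$name}{earnings} += $total - $risk;
}
close $fh;

for my $name (sort keys %sum) {
    print "$name: years $sum{$name}{years}, earnings $sum{$name}{earnings}\n";
}
```

Unlike a plain split, the parser keeps a quoted field that contains a tab together as one value.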

Answer 2

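This is where you went wrong: while(@line[0]){. Nothing inside that loop changes @line, so the condition never becomes false and the loop never ends (and @line[0] is a one-element slice; $line[0] is the element you meant). Also, $yeartotal and $earning are declared with my inside the outer loop, so they are reset to 0 on every line; the running totals have to live outside the loop, keyed by name.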

I would do it like this:

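use strict;
use warnings;

my $frhitoutput = shift @ARGV;   # input file path, given on the command line
my %result;
open(my $seq_fh, '<', $frhitoutput) || die "failed to read input file: $!";
while (my $line = <$seq_fh>) {
    chomp $line;
    ## skip comments and blank lines and optional repeat of title line
    next if $line =~ /^\#/ || $line =~ /^\s*$/ || $line =~ /^\+/;
    # split on whitespace; split ' ' also ignores leading whitespace
    my @line = split(' ', $line);
    # accumulate per name in %result, which lives outside the loop
    $result{$line[0]}{yeartotal} += $line[1];
    $result{$line[0]}{earning} += $line[3] - $line[2];
}
close $seq_fh;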

Answer 3

You should use a hash, like this:

my %hash;
while (my $line = <>) {

    next if $line =~ /^#/;

    my ($name, $years, $risk, $total) = split ' ', $line;   # ' ' also skips leading whitespace

    next unless defined $name and defined $years
            and defined $risk and defined $total;

    $hash{$name}{years}    += $years;
    $hash{$name}{risk}     += $risk;
    $hash{$name}{total}    += $total;
    $hash{$name}{earnings} += $total - $risk;
}

foreach my $name (sort keys %hash) {

    print "$name earned $hash{$name}{earnings} in $hash{$name}{years} years\n";
}
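The question also asks to keep each name's individual totals (200, 100, 300), not only their sum. One way to do that, sketched here for the same whitespace-separated layout, is to push each value onto an array stored in the hash (a hash of arrays). The sample rows are embedded so the example runs on its own, and %totals is an illustrative name.

```perl
use strict;
use warnings;

my @rows = (
    '#name  years risk total',
    ' adam  5     100  200',
    ' adam  5     50   100',
    ' adam  10    20   300',
    ' bill  20    5    100',
    ' bill  30    10   800',
);

my %totals;   # name => array ref holding that name's individual totals
for my $line (@rows) {
    next if $line =~ /^#/;                          # skip the header
    my ($name, $years, $risk, $total) = split ' ', $line;
    next unless defined $total;                     # skip short or blank lines
    push @{ $totals{$name} }, $total;               # array ref is created on first push
}

for my $name (sort keys %totals) {
    print "$name: @{ $totals{$name} }\n";           # e.g. "adam: 200 100 300"
}
```

push keeps the values in input order, so each array mirrors the order of the rows in the file.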

Answer 4

A great opportunity to explore Perl's powerful command-line options! :)

Code

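Note: this code is meant to be a command-line one-liner, but it is easier to read this way. In a proper script file you should enable strict and warnings and use better names. This version does not compile under strict; you would have to declare our %d.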

#!/usr/bin/perl -nal

# collect data
$d{$F[0]}{y} += $F[1];
$d{$F[0]}{e} += $F[3] - $F[2];

# print summary
END { print "$_:\tyears: $d{$_}{y},\tearnings: $d{$_}{e}" for sort keys %d }

Output

adam:   years: 20,  earnings: 430
bill:   years: 50,  earnings: 885

Explanation

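I use the -n switch here, which wraps your code in a loop over the input lines; -l chomps each input line and makes print append a newline. The -a switch makes perl split each line on whitespace into the array @F. A simplified version: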

while (defined($_ = <STDIN>)) {
    chomp $_;
    our(@F) = split(' ', $_, 0);

    # collect data
    $d{$F[0]}{y} += $F[1];
    $d{$F[0]}{e} += $F[3] - $F[2];
}
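The split(' ', $_, 0) that -a generates is Perl's special awk-style form: it splits on runs of whitespace and ignores leading whitespace. split /\s+/, as used in the question's code, instead produces an empty first field when a line starts with whitespace, as the sample rows do. A small comparison on one sample row:

```perl
use strict;
use warnings;

my $line = ' adam  5     100  200';   # a sample row with a leading space

my @awk   = split ' ',   $line;   # special case: leading whitespace is ignored
my @regex = split /\s+/, $line;   # leading whitespace yields an empty first field

print scalar(@awk),   " fields: ", join('|', @awk),   "\n";   # 4 fields: adam|5|100|200
print scalar(@regex), " fields: ", join('|', @regex), "\n";   # 5 fields: |adam|5|100|200
```

With split /\s+/ every column shifts by one, so $line[0] would be the empty string rather than the name.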

%d is a hash with the names as keys and hash references as values; each hash reference holds the years (y) and earnings (e).

The END block runs once all input lines have been processed and prints %d.

Use B::Deparse through the O module (-MO=Deparse) to see the code that is actually executed:

book:/tmp memowe$ perl -MO=Deparse tsv.pl
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our(@F) = split(' ', $_, 0);
    $d{$F[0]}{'y'} += $F[1];
    $d{$F[0]}{'e'} += $F[3] - $F[2];
    sub END {
        print "${_}:\tyears: $d{$_}{'y'},\tearnings: $d{$_}{'e'}" foreach (sort keys %d);
    }
    ;
}
tsv.pl syntax OK
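The note at the start of this answer recommends strict, warnings and better names for a real script file. A minimal sketch of that version follows; the names %summary and $in and the embedded input are illustrative, not part of the original answer. It spells out the loop that -n, -l and -a generate and does the END block's printing after the loop. It also skips the #name header line, which the one-liner would otherwise count as a name with zero years and earnings.

```perl
use strict;
use warnings;

# Stand-in for the input file: the question's sample rows
my $input = <<'END_INPUT';
#name  years risk total
 adam  5     100  200
 adam  5     50   100
 adam  10    20   300
 bill  20    5    100
 bill  30    10   800
END_INPUT
open my $in, '<', \$input or die "cannot open input: $!";

my %summary;   # name => { years => ..., earnings => ... }

while (my $line = <$in>) {            # -n: loop over the input lines
    chomp $line;                      # -l: strip the line ending
    next if $line =~ /^#/;            # skip the header line
    my @fields = split ' ', $line;    # -a: awk-style split, like @F
    next unless @fields == 4;
    my ($name, $years, $risk, $total) = @fields;
    $summary{$name}{years}    += $years;
    $summary{$name}{earnings} += $total - $risk;
}
close $in;

# What the END block does: one line per name, sorted
for my $name (sort keys %summary) {
    print "$name:\tyears: $summary{$name}{years},\tearnings: $summary{$name}{earnings}\n";
}
```

To read a real file instead, open it with a three-argument open in place of the in-memory string.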

Answer 5

It looks like a fixed-width file, so I would use unpack as
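The code for this answer did not survive on the page. As an illustration only (not the answerer's code), here is how unpack could read the rows, assuming the column widths implied by the sample in the question: one leading character, then 6 characters for the name, 6 for years, 5 for risk, and the rest for the total. In an unpack template, x skips a byte, A extracts a space-padded text field and strips its trailing spaces, and A* takes whatever is left.

```perl
use strict;
use warnings;

my @rows = (
    ' adam  5     100  200',
    ' adam  5     50   100',
    ' adam  10    20   300',
    ' bill  20    5    100',
    ' bill  30    10   800',
);

# Assumed layout: skip 1 byte, then name (6), years (6), risk (5), total (rest)
my $template = 'x A6 A6 A5 A*';

my %sum;   # name => { years => ..., earnings => ... }
for my $row (@rows) {
    my ($name, $years, $risk, $total) = unpack $template, $row;
    $sum{$name}{years}    += $years;
    $sum{$name}{earnings} += $total - $risk;
}

printf "%s: years %d, earnings %d\n", $_, $sum{$_}{years}, $sum{$_}{earnings}
    for sort keys %sum;
```

This only works if every row really has the same column widths; for a genuinely tab-delimited file, splitting on the separator is the safer choice.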

How do I parse a tab-delimited file in Perl?
