Question

I'm new to Perl (though reasonably good at programming in general) and have been given a Perl script (Id_script3.pl).

Code in Id_script3.pl:

# main sub 
{ # closure 
# keep %species local to sub-routine but only init it once 
my %species; 
sub _init { 
    open my $in, '<', 'SpeciesId.txt' or die "could not open SpeciesId.txt: $!"; 
    my $spec; 
    while (<$in>) { 
        chomp; 
        next if /^\s*$/; # skip blank lines 
        if (m{^([A-Z])\s*=\s*(\d+(?:\.\d)?)(?:\s+AND\s+(\d+(?:\.\d)?))?$}) { 
            # handle letter = lines 
            $species{$spec}{$1} = [$2]; 
            push @{$species{$spec}{$1}}, $3 if $3; 
        } else { 
            # handle species name lines 
            $spec = $_; 
            $len = length($spec) if (length($spec) > $len); 
        } 
    } 
    close $in; 
} 
sub analyze { 
    my ($masses) = @_; 
    _init() unless %species; 
    my %data; 
    # loop over species entries 
SPEC: 
    foreach my $spec (keys %species) { 
        # loop over each letter of a species 
LTR: 
        foreach my $ltr (keys %{$species{$spec}}) { 
            # loop over each mass for a letter 
            foreach my $mass (@{$species{$spec}{$ltr}}) { 
                # skip to next letter if it is not found 
                next LTR unless exists($masses->{$mass}); 
            } 
            # if we get here, all mass values were found for the species/letter 
            $data{$spec}{cnt}++; 
        } 
    }
    # ... (rest of analyze omitted)
}
} # end of closure

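The script needs to be modified so that it uses 'SpeciesId3.txt' instead of the 'SpeciesId.txt' it currently uses.

The two files differ slightly, so the script needs some changes before it will run correctly: compared with the original 'SpeciesId.txt', SpeciesId3.txt contains no letters (A =, B =, C =) and is simply a (much) longer list of values.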

SpeciesId.txt:

African Elephant

B = 1453.7
C = 1577.8
D = 2115.1
E = 2808.4
F = 2853.5 AND 2869.5
G = 2999.4 AND 3015.4

Indian Elephant

B = 1453.7
C = 1577.8
D = 2115.1
E = 2808.4
F = 2853.5 AND 2869.5
G = 2999.4 AND 3015.4

Rabbit

A = 1221.6 AND 1235.6
B = 1453.7
C = 1592.8
D = 2129.1
E = 2808.4
F = 2883.5 AND 2899.5
G = 2957.4 AND 2973.4

SpeciesID3.txt (the file to be used, i.e. the one the script must be adapted to):

African Elephant


826.4
836.4
840.4
852.4
858.4
886.4
892.5
898.5
904.5
920.5
950.5
1001.5
1015.5
1029.5
1095.6
1105.6

Indian Elephant

835.4
836.4
840.5
852.4
868.4
877.4
886.4
892.5
894.5
898.5
908.5
920.5
950.5
1095.6
1105.6
1154.6
1161.6
1180.7
1183.6
1189.6
1196.6
1201.6
1211.6
1230.6
1261.6
1267.7


Rabbit

817.5
836.4
852.4
868.5
872.4
886.4
892.5
898.5
908.5
950.5
977.5
1029.5
1088.6
1095.6
1105.6
1125.5
1138.6
1161.6
1177.6
1182.6
1201.6
1221.6
1235.6
1267.7
1280.6
1311.6
1332.7
1378.5
1437.7
1453.7
1465.7
1469.7

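As you can see, the letters (A =, B =) are missing from SpeciesID3.txt.

I've tried several "workarounds", but I haven't yet managed to write one that works.

Many thanks,

Stephen.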

Answer 1

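Well, I'm not sure I would keep that script at all; it looks rather messy, with script globals used inside subroutines and odd labels. Here is an approach you might consider instead: Perl's paragraph mode, which you get by setting the input record separator $/ to the empty string.

This is a little clunky: chomp cannot remove newlines from hash keys, so I use a do block to chomp the lines before they are assigned to the hash. do { ... } works somewhat like a subroutine and returns the value of the last statement it executes, in this case the elements of the array.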

use strict;
use warnings;
use Data::Dumper;

local $/ = "";        # paragraph mode

my %a = do { my @x = <DATA>; chomp(@x); @x; };  # read the file, remove newlines
$_ = [ split ] for values %a;                   # split numbers into arrays
print Dumper \%a;                               # print data structure

__DATA__
African Elephant


826.4
836.4
840.4
852.4
858.4
886.4
892.5
898.5
904.5
920.5
950.5
1001.5
1015.5
1029.5
1095.6
1105.6

Indian Elephant

835.4
836.4
840.5
852.4
868.4
877.4
886.4
892.5
894.5
898.5
908.5
920.5
950.5
1095.6
1105.6
1154.6
1161.6
1180.7
1183.6
1189.6
1196.6
1201.6
1211.6
1230.6
1261.6
1267.7


Rabbit

817.5
836.4
852.4
868.5
872.4
886.4
892.5
898.5
908.5
950.5
977.5
1029.5
1088.6
1095.6
1105.6
1125.5
1138.6
1161.6
1177.6
1182.6
1201.6
1221.6
1235.6
1267.7
1280.6
1311.6
1332.7
1378.5
1437.7
1453.7
1465.7
1469.7

Output:

$VAR1 = {
          'Rabbit' => [
                        '817.5',
                        '836.4',
                        '852.4',
                        '868.5',
                        '872.4',
                        '886.4',
                        '892.5',
                        '898.5',
                        '908.5',
                        '950.5',
                        '977.5',
                        '1029.5',
                        '1088.6',
                        '1095.6',
                        '1105.6',
                        '1125.5',
                        '1138.6',
                        '1161.6',
                        '1177.6',
                        '1182.6',
                        '1201.6',
                        '1221.6',
                        '1235.6',
                        '1267.7',
                        '1280.6',
                        '1311.6',
                        '1332.7',
                        '1378.5',
                        '1437.7',
                        '1453.7',
                        '1465.7',
                        '1469.7'
                      ],
          'Indian Elephant' => [
                                 '835.4',
                                 '836.4',
                                 '840.5',
                                 '852.4',
                                 '868.4',
                                 '877.4',
                                 '886.4',
                                 '892.5',
                                 '894.5',
                                 '898.5',
                                 '908.5',
                                 '920.5',
                                 '950.5',
                                 '1095.6',
                                 '1105.6',
                                 '1154.6',
                                 '1161.6',
                                 '1180.7',
                                 '1183.6',
                                 '1189.6',
                                 '1196.6',
                                 '1201.6',
                                 '1211.6',
                                 '1230.6',
                                 '1261.6',
                                 '1267.7'
                               ],
          'African Elephant' => [
                                  '826.4',
                                  '836.4',
                                  '840.4',
                                  '852.4',
                                  '858.4',
                                  '886.4',
                                  '892.5',
                                  '898.5',
                                  '904.5',
                                  '920.5',
                                  '950.5',
                                  '1001.5',
                                  '1015.5',
                                  '1029.5',
                                  '1095.6',
                                  '1105.6'
                                ]
        };

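As you can see from this rather verbose output, the result is a hash with the animals as keys and your numbers as values (stored as array references). This approach works as long as you can rely on the names and the numbers being separated by at least two consecutive newlines (one blank line), and there are no stray blank lines inside the data.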
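To make that caveat concrete, here is a small sketch of how paragraph mode splits input. It reads from an in-memory string instead of a file, and the helper name `paragraphs` and both sample strings are made up for illustration. Runs of several blank lines count as a single separator, and chomp in paragraph mode strips all trailing newlines; but a blank line inside a list of numbers creates an extra paragraph, which would throw off the name/number pairing.

```perl
use strict;
use warnings;

# Hypothetical helper: read a string in paragraph mode, return chomped paragraphs.
sub paragraphs {
    my ($text) = @_;
    local $/ = "";                      # paragraph mode: blank line(s) end a record
    open my $fh, '<', \$text or die "cannot open in-memory string: $!";
    my @paras = <$fh>;
    close $fh;
    chomp @paras;                       # with $/ eq "", chomp removes ALL trailing newlines
    return @paras;
}

# Two blank lines after the name still act as one separator: 4 paragraphs.
my @good = paragraphs("African Elephant\n\n\n826.4\n836.4\n\nRabbit\n\n817.5\n");

# A stray blank line inside a number list adds a paragraph: 3, so pairing breaks.
my @bad = paragraphs("Rabbit\n\n817.5\n\n836.4\n");

printf "good: %d paragraphs, bad: %d paragraphs\n", scalar @good, scalar @bad;
```

With an odd number of paragraphs, assigning them to a hash as in the answer above would also trigger an "Odd number of elements in hash assignment" warning under `use warnings`.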

Answer 2

if (m{^([A-Z])\s*=\s*(\d+(?:\.\d)?)(?:\s+AND\s+(\d+(?:\.\d)?))?$}) {

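This line contains a regular expression that looks for an uppercase letter [A-Z] followed by an equals sign, with optional whitespace on either side (\s*=\s*). You basically just want to drop that prefix and match only a number: (\d+(?:\.\d)?).

Because $1, $2, $3 are numbered by the position of their opening parenthesis, counting from the left, the number you want is now in $1. (Parentheses that begin with ?: are non-capturing and are not counted.)

You also need to change the variable %species so that its keys are the species names and its values are simply lists of numbers (the extracted observations).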
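A quick way to convince yourself of the capture numbering is to run the reduced pattern over a few sample lines. The lines below are illustrative only. Note that, like the original, the pattern allows at most one digit after the decimal point, which matches the data shown but would reject a value such as 1235.65.

```perl
use strict;
use warnings;

# The reduced pattern: a whole line consisting of a number with an optional
# single decimal digit. (?:...) does not capture, so the value lands in $1.
my $num_re = qr{^(\d+(?:\.\d)?)$};

my @captured;
for my $line ('826.4', 'African Elephant', 'B = 1453.7', '1105.6') {
    push @captured, $1 if $line =~ $num_re;   # only bare numbers match
}
print "@captured\n";                          # prints "826.4 1105.6"
```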

So:

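if (m{^(\d+(?:\.\d)?)$}) {
    # mass line: append it to the current species' list (push needs an array dereference)
    push @{$species{$spec}}, $1;
}   # the else branch that handles species-name lines stays as it was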
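Put together, a reworked loader might look like the sketch below. It is not the original `_init`: the name `read_species` is hypothetical, it reads from any filehandle (here an in-memory sample) so it can be tried without SpeciesId3.txt, and it leaves out the `$len` bookkeeping. In the real script you would open 'SpeciesId3.txt' and keep filling the closure's %species instead of returning a new hash.

```perl
use strict;
use warnings;

# Hypothetical loader for the SpeciesId3.txt layout:
# a species-name line followed by one mass per line.
sub read_species {
    my ($in) = @_;
    my (%species, $spec);
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;                     # strip newline and trailing spaces/CR
        next if $line eq '';                    # skip blank lines
        if ($line =~ m{^(\d+(?:\.\d)?)$}) {
            die "mass '$line' appears before any species name\n" unless defined $spec;
            push @{ $species{$spec} }, $1;      # array ref is created on first push
        } else {
            $spec = $line;                      # species-name line
            $species{$spec} ||= [];
        }
    }
    return \%species;
}

my $sample = "African Elephant\n\n\n826.4\n836.4\n\nRabbit\n\n817.5\n";
open my $fh, '<', \$sample or die "cannot open in-memory string: $!";
my $species = read_species($fh);
close $fh;

print "$_: @{ $species->{$_} }\n" for sort keys %$species;
```

Stripping trailing whitespace rather than only chomping is a defensive choice: a number followed by a stray space or a Windows CR would otherwise fail the pattern and be mistaken for a species name.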

The analyze subroutine needs a similar adjustment (the LTR level essentially disappears now).
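What analyze should compute without letters is up to you. The sketch below assumes one plausible reading: count how many of each species' reference masses occur among the observed masses. The small %species table is made-up sample data standing in for what the loader would produce; the `$data{$spec}{cnt}` shape is kept from the original.

```perl
use strict;
use warnings;

# Made-up reference data in the new shape: species => [masses].
my %species = (
    'African Elephant' => [qw(826.4 836.4 840.4)],
    'Rabbit'           => [qw(817.5 836.4 852.4)],
);

# Sketch of analyze without the LTR level (assumed semantics: count matches).
sub analyze {
    my ($masses) = @_;                   # hash ref keyed by observed mass
    my %data;
    for my $spec (keys %species) {
        # grep in scalar context returns the number of matching masses
        $data{$spec}{cnt} = grep { exists $masses->{$_} } @{ $species{$spec} };
    }
    return \%data;
}

my %observed = map { $_ => 1 } qw(836.4 840.4 826.4);
my $result = analyze(\%observed);
printf "%s: %d\n", $_, $result->{$_}{cnt} for sort keys %$result;
```

Two differences from the original are worth noting: every species gets a cnt entry, even when it is 0, and, as before, masses are compared as strings, so '836.4' and '836.40' would not match.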
