How can I parse part of a file with Perl?

Asked: 2010-10-18 15:58:12

Tags: perl

I'm new to Perl, but I've heard it is great for parsing files, so I thought I would give it a whirl.

I have a text file that contains the following sample information:

High school is used in some
parts of the world, particularly in
Scotland, North America and Oceania to
describe an institution that provides
all or part of secondary education.
The term "high school" originated in
Scotland with the world's oldest being
the Royal High School (Edinburgh) in
1505.

The Royal High School was used as a
model for the first public high school
in the United States, the English High
School founded in Boston,
Massachusetts, in 1821. The precise
stage of schooling provided by a high
school differs from country to
country, and may vary within the same
jurisdiction. In all of New Zealand
and Malaysia along with parts of
Australia and Canada, high school is
synonymous with secondary school, and
encompasses the entire secondary stage
of education.

======================================
Grade1 87.43%
Grade2 84.30%
Grade3 83.00%
=====================================

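I want to parse the file and extract only the numeric information. I looked into regular expressions, and I think I would use something like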
if (m/^%/) {
    do something
}
else {
    skip the line
}

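But what I really want to do is keep track of the name on the left and store the numeric value in a variable of that name. So, after parsing the file, I would like to have the following variables with the % values stored in them. The reason is that I want to create a pie chart or bar graph of the different grades.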

  

Grade1 = 87.43
Grade2 = 84.30

...

Can you suggest an approach I should look at?

5 Answers:

Answer 0: (score: 6)

You need a regular expression. Something like the following should work

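my %op;   # grade name => score
while (<>) {
  # Store a pair only when the line matches; otherwise $1 and $2
  # would still hold the values from an earlier match
  if (/(Grade[0-9]+)\s*([0-9]+\.[0-9]+)/) {
    $op{$1} = $2;
  }
}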

as a filter. The %op hash will store the grade names and scores. This is better than auto-instantiating variables.

Answer 1: (score: 3)

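If you can guarantee that your points of interest are nested between two lines of = characters (and that the file does not contain an odd number of such delimiter lines), the flip-flop operator is a handy thing here: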

use strict;    # These two pragmas go a long, ...
use warnings;  # ... long way in helping you code better

my %scores;    # Create a hash of scores

while (<>) {   # The diamond operator processes all files ...
               # ... supplied at command-line, line-by-line

    next unless /^=+$/ ... /^=+$/; # Three-dot flip-flop: true from one ...
                                   # ... '=' line through the next one
    next if /^=+$/;                # Skip the '=' delimiter lines themselves

    my ( $name, $grade ) = split;  # This usage of split will break ...
                                   # ... the current line into an array    

    $scores{$name} = $grade;       # Associate grade with name
}
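
A minimal, self-contained sketch of how the flip-flop behaves when both ends use the same pattern. The sub name grades_between_rules and the inlined sample lines are assumptions made for illustration and are not part of the answer. It uses the three-dot form (...), which does not test the right-hand pattern on the same line that switched the range on, and it skips the = rows themselves:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash of name => grade for the lines
# found between two rows of '=' characters.
sub grades_between_rules {
    my @lines = @_;
    my %scores;
    for (@lines) {
        # Three-dot flip-flop: true from one '=' row through the next.
        next unless /^=+$/ ... /^=+$/;
        next if /^=+$/;                 # drop the '=' rows themselves
        my ( $name, $grade ) = split;   # split $_ on whitespace
        next unless defined $grade;     # ignore blank or one-word lines
        $scores{$name} = $grade;
    }
    return %scores;
}

# Sample lines standing in for the contents of the file.
my %scores = grades_between_rules(
    "The term originated in Scotland.\n",
    "======\n",
    "Grade1 87.43%\n",
    "Grade2 84.30%\n",
    "=====\n",
    "trailing text here\n",
);
print "$_ => $scores{$_}\n" for sort keys %scores;
```

With the two-dot form (..), the right-hand pattern is tested on the very line that turned the range on, so when both patterns are identical the range would cover only the delimiter rows.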

Answer 2: (score: 2)

You want to use a hash. Something like this should do the trick:

my %grades = (); # this is a hash
open(my $fh, '<', "grade_file.txt") or die $!;   # three-argument open, read-only
while( my $line = <$fh> ) {
     # capture the name and the percentage only when the line matches
     if( my( $name, $grade ) = $line =~ /^(Grade\d+)\s+(\d+\.\d+\%)/ ) {
         $grades{$name} = $grade;
     }
}
close($fh);

Your %grades hash will contain the name and grade pairs. (Access it like my $value = $grades{'Grade1'}.)
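
As a rough sketch of how such a hash might then be prepared for charting: the sample data is hard-coded here and the helper name numeric_grades is hypothetical. It strips the trailing % so that the values are plain numbers, then prints them in name order.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a name => "87.43%" hash into a
# name => 87.43 hash by removing the trailing percent sign.
sub numeric_grades {
    my %grades = @_;
    my %numeric;
    for my $name ( keys %grades ) {
        ( my $value = $grades{$name} ) =~ s/%\z//;
        $numeric{$name} = $value + 0;   # force numeric context
    }
    return %numeric;
}

# Sample data standing in for the hash built from the file.
my %grades  = ( Grade1 => '87.43%', Grade2 => '84.30%', Grade3 => '83.00%' );
my %numeric = numeric_grades(%grades);

for my $name ( sort keys %numeric ) {
    printf "%s = %.2f\n", $name, $numeric{$name};
}
```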

Also, just a note: the language is called "Perl", not "PERL". Many people in the Perl community get upset about this :-)

Answer 3: (score: 0)

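For an example using the flip-flop operator, see Zaid's answer (which is what I would recommend). However, if you run into difficulties (sometimes DWIMmery can get in the way), you can also maintain state explicitly while reading the file line by line: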

#!/usr/bin/perl

use strict; use warnings;

my %grades;
my $interesting;

while ( my $line = <> ) {   # read the files named on the command line, or STDIN
    if ( not $interesting and $line =~ /^=+\s*\z/ ) {
        $interesting = 1;
        next;
    }
    if ( $interesting ) {
        if ( $line =~ /^=+\s*$/ ) {
            $interesting = 0;
            next;
        }
        elsif ( my ($name, $grade) = $line =~ /^(\w+)\s+(\d+\.\d+%)/ ) {
            # Keep an array in case the same name occurs
            # multiple times
            push @{ $grades{$name} }, $grade;
        }
    }
}

use YAML;
print Dump \%grades;

Answer 4: (score: -1)

Creating dynamic variable names probably won't help with generating the graph; using an array is almost certainly a better idea.

However, if you really think you want to do this:

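while (my $line = <$your_infile_handler>){
   # matches lines of the form "Grade1 = 87.43"
   if ($line =~ m/(.*) = ([0-9.]*)/){
      no strict 'refs';   # symbolic references are rejected under strict
      ${$1} = $2;         # e.g. sets the package variable $Grade1
   }
}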

should do it.
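
The array-based alternative suggested at the start of this answer could look roughly like the following. This is only a sketch: the helper name parse_grades, the pattern for the grade lines, and the in-memory sample text are assumptions based on the file format shown in the question. It keeps labels and numeric values in two parallel arrays, in file order, which is the shape many charting modules expect.

```perl
use strict;
use warnings;

# Hypothetical helper: read "GradeN 87.43%" lines from a filehandle and
# return two array references, labels and numeric values, in file order.
sub parse_grades {
    my ($fh) = @_;
    my ( @labels, @values );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^(Grade\d+)\s+(\d+(?:\.\d+)?)%/ ) {
            push @labels, $1;
            push @values, $2;
        }
    }
    return ( \@labels, \@values );
}

# An in-memory filehandle stands in for the real file here.
my $text = "intro text\n=====\nGrade1 87.43%\nGrade2 84.30%\nGrade3 83.00%\n=====\n";
open( my $fh, '<', \$text ) or die $!;
my ( $labels, $values ) = parse_grades($fh);
close($fh);

print "$labels->[$_]: $values->[$_]\n" for 0 .. $#{$labels};
```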