Question

I'm very new to Perl scripting and need some help. Here is my question:

I have a file with the following contents:

AA ABC 0 0 
line1
line2
...
AA XYZ 1 1
line..
line..
AA GHI 2 2
line..
line...

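Now I want to grab all the lines between the lines that start with the string/pattern "AA" and write them to the files ABC.txt, XYZ.txt, and GHI.txt, each one including its own AA* line. For example, ABC.txt should look like this: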

AA ABC 0 0
line1
line2...

and XYZ.txt should look like this:

AA XYZ 1 1
line..
line..

I hope I've made the question clear. Any help with this is much appreciated.

Thanks, Sha

Answer 1

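I assume you're asking for an algorithm, since you didn't specify what kind of help you need.

Declare a file handle for output.
While you haven't reached the end of the input file,
1. Read a line.
2. If it is a header line,
  1. Parse it.
  2. Determine the file name.
  3. (Re)open the output file.
3. Print the line to the output file handle.

To keep you from using a bad solution that was posted after I posted the above, here is the code: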

my $fh;
while (<>) {
   if (my ($fn) = /^AA\s+(\S+)/) {
      $fn .= '.txt';
      open($fh, '>', $fn)
         or die("Can't create file \"$fn\": $!\n");
   }

   print $fh $_;
}

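Possible improvements, all of which are easy to add:

Check for duplicate headers. (if -e $fn is one way.)
Check for data before the first header. (if !$fh is one way.)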
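
The two improvements above could be added roughly as follows. This is only a sketch, not the answerer's code: the helper name split_records, the $dir parameter, and the %seen hash are made up for illustration, and dying on a duplicate header is just one possible policy.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): read lines from the
# handle $in and write one file per "AA NAME ..." header into directory $dir.
sub split_records {
    my ($in, $dir) = @_;
    my ($fh, %seen, @names);
    while (my $line = <$in>) {
        if (my ($name) = $line =~ /^AA\s+(\S+)/) {
            my $fn = "$dir/$name.txt";
            # Duplicate header in this input, or a file left by an earlier run.
            die "Refusing to overwrite \"$fn\"\n" if $seen{$name}++ || -e $fn;
            close($fh) if $fh;
            open($fh, '>', $fn)
                or die "Can't create file \"$fn\": $!\n";
            push @names, $name;
        }
        # Data before the first header has no output file to go to.
        die "Data before first header at line $.\n" if !$fh;
        print $fh $line;
    }
    close($fh) if $fh;
    return @names;
}
```

Calling split_records(\*STDIN, '.') would then split whatever is piped into the script into files in the current directory.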

Answer 2

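You only need one file open at a time. When a line matches XYZ, you open the file XYZ.txt and output that line. You keep that file open (let's just say it's the handle CURRENT_FILE) and output each subsequent line until you match a new header line. Then you close the current file and open another one.

My Perl is pretty rusty, so I don't think I can provide code that compiles, but basically it would look something like this.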

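my $current_name = "";

# Assumption: the input file name is passed as the first command-line argument.
open(INPUT, "<", $ARGV[0]) || die "Can't open $ARGV[0]: $!\n";

while (my $line = <INPUT>)
{
    # Only a header line ("AA NAME ...") may switch the output file.
    if (my ($name) = $line =~ /^AA\s+(\w+)/) {
        if( $name ne $current_name ) {
            close(CURRENT_FILE) if $current_name ne "";
            open(CURRENT_FILE, ">>", "$name.txt") || die "Can't open $name.txt: $!\n";
            $current_name = $name;
        }
    }
    next if $current_name eq "";    # skip lines before the first header
    print CURRENT_FILE $line;
}

close(CURRENT_FILE) if $current_name ne "";
close(INPUT);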

Answer 3

What do you think of this?

1: Get the contents of the file (perhaps using File::Slurp's read_file) and save them in a scalar.

use File::Slurp qw(read_file write_file);
my $contents = read_file($filename);

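2: Use a regex pattern similar to this one, which captures each header line together with every line up to the next header:

my @file_rows = ($contents =~ /(^AA\s[A-Z]{3}\s+\d+\s+\w*.*?)(?=^AA\s|\z)/msg);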

3: If the second column value is always unique across the whole file:

foreach my $file_row (@file_rows) {
    my @values = split(' ', $file_row, 3);
    write_file($values[1] . ".txt", $file_row);
}

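3: Otherwise: split the row values. Store them in a hash using the second column as the key. Use the hash to write the data to the output files.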

my %hash;
foreach my $file_row (@file_rows) {
    my @values = split(' ', $file_row, 3);
    if (defined $hash{$values[1]}) {
        $hash{$values[1]} .= $file_row;
    } else {
        $hash{$values[1]} = $file_row;
    }
}

foreach my $key (keys %hash) {
    write_file($key . '.txt', $hash{$key});
}
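
To try the splitting and grouping idea without installing File::Slurp, here is a small core-Perl sketch that works on an in-memory string. The helper name group_records and the sample data are hypothetical, and the pattern assumes every header line ends with a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: map each record name (second column of the "AA ..."
# header) to the concatenated text of all records that carry that name.
sub group_records {
    my ($contents) = @_;
    my %by_name;
    # $1 is a header line plus everything up to the next header (or the end
    # of the string); $2 is the name taken from that header.
    while ($contents =~ /^(AA[ \t]+(\S+)[^\n]*\n.*?)(?=^AA[ \t]|\z)/msg) {
        $by_name{$2} .= $1;
    }
    return %by_name;
}

# Sample data with a repeated ABC header, so its two records are merged.
my $sample = "AA ABC 0 0\nline1\nAA XYZ 1 1\nline2\nAA ABC 3 3\nline3\n";
my %records = group_records($sample);
print "$_ =>\n$records{$_}" for sort keys %records;
```

Writing each value of the returned hash to "$key.txt" would then produce the output files, with repeated headers collected into the same file.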

Answer 4

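Here is an option that looks for a pattern matching the start of each record. Once it finds one, it walks through the lines of the data file and builds up a record until it finds the same pattern again or reaches eof, then writes that record to a file. It does not check whether the file already exists before writing, so if ABC.txt already exists it will be replaced: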

use strict;
use warnings;

my $dataFile    = 'data.txt';
my $nextLine    = '';
my $recordRegex = qr/^AA\s+(\S+)\s+\d+\s+\d+/;

open my $inFH, '<', $dataFile or die $!;

RECORD: while ( my $line = <$inFH> ) {
    my $record = $nextLine . $line;

    if ( $record =~ $recordRegex ) {
        my $fileName = $1 . '.txt';

        while ( $nextLine = <$inFH> ) {
            if ( $nextLine =~ $recordRegex or eof $inFH ) {
                $record .= $nextLine if eof $inFH;

                open my $outFH, '>', $fileName or die $!;
                print $outFH $record;
                close $outFH;

                next RECORD;
            }

            $record .= $nextLine;
        }

        # The input ended right after this record's first body line,
        # so the inner loop never ran: write the record here instead.
        open my $outFH, '>', $fileName or die $!;
        print $outFH $record;
        close $outFH;
    }
}

close $inFH;

Hope this helps!

Edit: This code replaces the original code, which had problems. Thanks to amon for reviewing the original code.

How to print all lines between certain matching strings to different files in Perl
