Merging specified lines from multiple files with a Perl script

Date: 2015-12-28 20:36:17

Tags: perl

file_1.txt

$thread1 = new threads \&callfunc1,"1";
$thread2 = new threads \&callfunc1,"2";
$thread3 = new threads \&callfunc1,"3";
$thread4 = new threads \&callfunc1,"4";
$thread5 = new threads \&callfunc1,"5";
$thread6 = new threads \&callfunc1,"6";
$thread7 = new threads \&callfunc1,"7";
$thread8 = new threads \&callfunc1,"8";
$thread9 = new threads \&callfunc1,"9";
$thread10 = new threads \&callfunc1,"10";
$thread11 = new threads \&callfunc1,"11";
$thread12 = new threads \&callfunc1,"12";

file_2.txt

$thread13 = new threads \&callfunc2,"1";
$thread14 = new threads \&callfunc2,"2";
$thread15 = new threads \&callfunc2,"3";
$thread16 = new threads \&callfunc2,"4";
$thread17 = new threads \&callfunc2,"5";
$thread18 = new threads \&callfunc2,"6";

file_3.txt

$thread19 = new threads \&callfunc3,"1";
$thread20 = new threads \&callfunc3,"2";
$thread21 = new threads \&callfunc3,"3";

file_4.txt

$thread22 = new threads \&callfunc4,"1";
$thread23 = new threads \&callfunc4,"2";
$thread24 = new threads \&callfunc4,"3";

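I have four files. I need to merge them into a single file. The new file should contain every odd line from file_1.txt, every even line from file_2.txt, every 4th line from file_3.txt, and every 8th line from file_4.txt.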

merge.txt

$thread1 = new threads \&callfunc1,"1";
$thread13 = new threads \&callfunc2,"1";
$thread2 = new threads \&callfunc1,"2";
$thread19 = new threads \&callfunc3,"1";
$thread3 = new threads \&callfunc1,"3";
$thread14 = new threads \&callfunc2,"2";
$thread4 = new threads \&callfunc1,"4";
$thread22 = new threads \&callfunc4,"1";
$thread5 = new threads \&callfunc1,"5";
$thread15 = new threads \&callfunc2,"3";
$thread6 = new threads \&callfunc1,"6";
$thread20 = new threads \&callfunc3,"2";
$thread7 = new threads \&callfunc1,"7";
$thread16 = new threads \&callfunc2,"4";
$thread8 = new threads \&callfunc1,"8";
$thread23 = new threads \&callfunc4,"2";
$thread9 = new threads \&callfunc1,"9";
$thread17 = new threads \&callfunc2,"5";
$thread10 = new threads \&callfunc1,"10";
$thread21 = new threads \&callfunc3,"3";
$thread11 = new threads \&callfunc1,"11";
$thread18 = new threads \&callfunc2,"6";
$thread12 = new threads \&callfunc1,"12";
$thread24 = new threads \&callfunc4,"3";

I have tried the following code to achieve this, but it merges one line from each file in turn. Can anybody help me? Thanks in advance.

#merger
unlink "threadperl.txt";
my @files = ('file_1.txt','file_2.txt','file_3.txt','file_4.txt');
my @fh;

#create an array of open filehandles.
@fh = map { open my $f, $_ or die "Cant open $_:$!"; $f } @files;


open my $out_file, ">threadperl.txt" or die "can't open out_file: $!";

my $output;
do
{
    $output = '';
    foreach (@fh){

        my $line = <$_>;
        if (defined $line){
            #Special case: might not be a newline at the end of the file
            #add a newline if none is found.
            $line .= "\n" if ($line !~ /\n$/);
            $output .= $line;
        }
    }

    print {$out_file} $output;
}
while ($output ne '');

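2 Answers:

Answer 0 (score: 1)

You did not specify how the files should be merged, so I assume they are assembled one after another.

First, read the file into an array: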

    open my $handle, '<', "file_1.txt" or die "Can't open file_1.txt: $!";
    chomp(my @file1 = <$handle>);
    close $handle;

Then remap the array with a "map" expression over the element indices ("map" is like an inline foreach over each element):

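    # Array indices are 0-based; @file2 is assumed to have been read the same way as @file1.
    my @odd_indexed_elements  = @file1[ map { $_ * 2 + 1 } 0 .. int( @file1 / 2 ) - 1 ];
    my @even_indexed_elements = @file2[ map { $_ * 2 } 0 .. int( ( @file2 + 1 ) / 2 ) - 1 ];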
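
The same kind of index selection can also be written with grep over the index range, which avoids computing the upper bound by hand. This is a minimal sketch on hypothetical sample data (the `@lines` array and its contents are assumptions, not part of the question):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the chomped lines of a file.
my @lines = map { "line $_" } 1 .. 7;

# Line numbers are 1-based but array indices are 0-based, so the
# odd-numbered lines (1st, 3rd, ...) sit at the even indices.
my @odd_numbered_lines  = @lines[ grep { $_ % 2 == 0 } 0 .. $#lines ];
my @even_numbered_lines = @lines[ grep { $_ % 2 == 1 } 0 .. $#lines ];

print "odd-numbered lines:  @odd_numbered_lines\n";
print "even-numbered lines: @even_numbered_lines\n";
```

Because the range always ends at `$#lines`, an empty or one-line array yields an empty or one-element slice with no special handling.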

Then you can print the two arrays:

    print "$_\n" for @odd_indexed_elements, @even_indexed_elements;  # redirect STDOUT, or print to an output filehandle
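
If the interleaved order shown in the question's merge.txt is wanted rather than sequential assembly, one option is to decide from the output position which input supplies the next line. The sketch below is not taken from the answer above. The position rule is inferred from the sample merge.txt, `source_for_position` and `merge_by_position` are hypothetical names, and in-memory handles stand in for the four files so that it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a 1-based output position to the index of the input that supplies it.
# Assumption: this rule is read off the question's sample merge.txt.
sub source_for_position {
    my ($n) = @_;
    return 0 if $n % 2 == 1;    # 1, 3, 5, ...   -> first input
    return 1 if $n % 4 == 2;    # 2, 6, 10, ...  -> second input
    return 2 if $n % 8 == 4;    # 4, 12, 20, ... -> third input
    return 3;                   # 8, 16, 24, ... -> fourth input
}

# Copy one line per output position from the chosen handle to $out.
# Stops as soon as the chosen input is exhausted (an assumption about
# the desired behaviour) and returns the number of lines written.
sub merge_by_position {
    my ( $out, @in ) = @_;
    my $written = 0;
    for ( my $n = 1 ; ; $n++ ) {
        my $fh   = $in[ source_for_position($n) ];
        my $line = <$fh>;
        last unless defined $line;
        $line .= "\n" unless $line =~ /\n\z/;    # last line may lack a newline
        print {$out} $line;
        $written++;
    }
    return $written;
}

# Stand-in data with 12, 6, 3 and 3 lines, mirroring the line counts of
# file_1.txt .. file_4.txt in the question.
my @text = (
    join( '', map { "f1 line $_\n" } 1 .. 12 ),
    join( '', map { "f2 line $_\n" } 1 .. 6 ),
    join( '', map { "f3 line $_\n" } 1 .. 3 ),
    join( '', map { "f4 line $_\n" } 1 .. 3 ),
);
my @in = map { open my $fh, '<', \$_ or die "Can't open in-memory handle: $!"; $fh } @text;

merge_by_position( \*STDOUT, @in );
```

To use real files, open each file name with a three-argument `open` instead of the in-memory strings and pass a handle opened on the output file in place of `\*STDOUT`. Only one line per input is held at a time, so the inputs are never slurped.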

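Answer 1 (score: 0)

For fun, I wanted to see what it might look like if we pulled the filtering logic out of the read loop. Just another approach... It also does not slurp each file into memory, so it can run on potentially much longer data files, and it is easy to extend the input files & filtering logic.

The filtering logic is terse; see the comments after the file definitions for a longer-form example.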

#!/usr/bin/perl

use strict;
use warnings;

my $debug = 0;

my @inFiles = (
   { fileName=>"file_1.txt", label=>"odd",  filter=>sub { ( shift->{lineCnt} % 2 ) != 0 } },
   { fileName=>"file_2.txt", label=>"even", filter=>sub { ( shift->{lineCnt} % 2 ) == 0 } },
   { fileName=>"file_3.txt", label=>"4th",  filter=>sub { ( shift->{lineCnt} % 4 ) == 0 } },
   { fileName=>"file_4.txt", label=>"8th",  filter=>sub { ( shift->{lineCnt} % 8 ) == 0 } }
   # Ok to add additional files here if desired, ok to use other filtering "logic".
   # For example, we could teach capture() to add the current line to a given $inFile,
   # then you could write "filters" subroutines that did pattern matching as well.
   # { fileName=>"file_4.txt",  # Path to input file
   #   label=>"8th",            # more or less a comment to describe the filter's goal.
   #   filter=>sub {            # read logic calls this to see if we should keep a line.
   #      # This is a more verbose version of how the filter logic works.
   #      # I want to point out you can get fairly complex, and include debug prints
   #      # in here.  Also just leaving it at "shift->{..." is a bit opaque.
   #      my $hash = shift;
   #      my $curLineNumber = $hash->{lineCnt};
   #      my $result = ( $curLineNumber % 8 ) == 0;
   #      print "$hash->{fileName}.$curLineNumber: label=$hash->{label}, result=$result\n";
   #      return $result;
   #   }
   #  }
);

# Initialize our files.
# Since we are keeping everything we know about an input file
# in a HASH, we'll add some new keys here to make life easier.
foreach my $inFile  ( @inFiles ) {
   # $inFile is a hash ref for each of the file1 file2 etc.
   my $name = $inFile->{fileName}; # just a shortcut, we'll use name a lot so easier to read.
   -e $name || die "input file $name does not exist.";
   -f $name || die "input file $name is not a regular file.";
   # our first new key will be the file handle - we'll use this later for reading.
   open( $inFile->{handle}, "<", $name ) || die "open $name for reading: $!";
   $inFile->{lineCnt} = 0; # another new key, count how many lines we have read from this file.
   $inFile->{filterCnt} = 0; # also count how many times our filter answers true.
   print "opened input file $inFile->{fileName}, label=$inFile->{label}\n" if $debug;
}

my $readCnt; # track how much (if anything) we read.
do {
   $readCnt = 0; # assume we read nothing this time.
   foreach my $inFile  ( @inFiles ) {
      $readCnt += capture( $inFile ); # may have read something...
   }
} while( $readCnt >= 1 ); # so long as we read something try again.

print "Data reading completed, closing input files...\n";
my $totalHits = 0;
foreach my $inFile  ( @inFiles ) {
   close($inFile->{handle}) || warn "Ignoring error closing input file $inFile->{fileName}: $!";
   $totalHits += $inFile->{filterCnt};
   printf "\tfile: %12s  <%6s> #lines: %4d #hits: %4d\n"
      , $inFile->{fileName}
      , $inFile->{label}
      , $inFile->{lineCnt}
      , $inFile->{filterCnt};
}
print "Done.  Total hits=$totalHits\n";


sub capture {
   my $inFile = shift;
   my $line;
   my $readCnt = 0;
   my $handle = $inFile->{handle};
   if( defined( $line = <$handle> ) ) {
      ++$inFile->{lineCnt};
      ++$readCnt;  # lets our caller know not out of data.
      my $filter = $inFile->{filter}; # get our filtering subroutine
      my $filterResult = $filter->( $inFile ); # invoke the subroutine
      printf "%s.%03d: <%5s> filterResult=%s\n", $inFile->{fileName},$inFile->{lineCnt}, $inFile->{label}, $filterResult if $debug;
      if( $filterResult  ) {
         ++$inFile->{filterCnt}; # count how many times the filter hits.
         print "$inFile->{fileName}.$inFile->{lineCnt}: $line";
         # you could write this to wherever you want it.
      }
   } else {
      # no more data for this input file, nothing to do.
   }
   return $readCnt; # will be 0 or 1
}
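
The comment in the @inFiles definition suggests teaching capture() to store the current line so that filters can do pattern matching. A minimal sketch of that idea follows, assuming a new hash key named `line` (hypothetical, not in the code above), a simplified `capture`, and an in-memory handle standing in for a data file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory input; the lines are copied from the question's data.
my $data = <<'END';
$thread1 = new threads \&callfunc1,"1";
$thread13 = new threads \&callfunc2,"1";
$thread2 = new threads \&callfunc1,"2";
$thread19 = new threads \&callfunc3,"1";
END

open my $fh, '<', \$data or die "Can't open in-memory handle: $!";

my $inFile = {
    fileName  => 'sample',
    label     => 'callfunc1 only',
    handle    => $fh,
    lineCnt   => 0,
    filterCnt => 0,
    # The filter receives the whole record, so it can inspect the text
    # of the current line as well as the line counter.
    filter    => sub { shift->{line} =~ /\\&callfunc1\b/ },
};

sub capture {
    my ($in)   = @_;
    my $handle = $in->{handle};
    my $line   = <$handle>;
    return 0 unless defined $line;    # no more data for this input
    ++$in->{lineCnt};
    $in->{line} = $line;              # expose the current line to the filter
    if ( $in->{filter}->($in) ) {
        ++$in->{filterCnt};
        print "$in->{fileName}.$in->{lineCnt}: $line";
    }
    return 1;
}

# Keep reading until the input is exhausted.
1 while capture($inFile);
```

Storing the line in the per-file hash keeps the filter signature unchanged, so counter-based and pattern-based filters can be mixed in the same @inFiles list.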