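Carrying multiple capture groups in an array but matching on only one group

Time: 2014-11-09 17:56:49

Tags: regex perl

I am trying to make a simple modification to some existing code, without much luck. I want to carry one additional capture group over from FILE1, $2, then compare against the data in FILE2 as usual and print both if a match succeeds. If possible, please keep answers similar to my attempt so that I can understand the changes.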

FILE1 data:

abc 99269 +t
abc 550 -a
abc 100 +a
gdh 126477 +t 
hduf 1700 +c

FILE2 data:

517 1878 forward
2156 3289 forward
99000 100000 forward
22000 23000 backward
999555 999999 backward 

Desired output:

99269 +t 99000 100000 forward
550 -a 517 1878 forward
1700 +c 517 1878 forward 

Code:

#!/usr/bin/perl 

use strict;
use warnings;
use autodie;

my $outputfile = "/Users/edwardtickle/Documents/CC22CDSpositive.txt"; 

open FILE1, "/Users/edwardtickle/Documents/CC22indelscc.txt";

open FILE2, "/Users/edwardtickle/Documents/CDS_rmmge.CC22.CORE.aln";

open (OUTPUTFILE, ">$outputfile");
my @file1list=();
my @indels=();

while (<FILE1>) {
    if (/^\S+\s+(\d+)\s+(\S+)/) {
        push @file1list, $1;
        push @indels, $2;
    }
}

close FILE1;

while ( my $line = <FILE2> ) {
    if ($line =~ /^>\S+\s+\S+\s+(\d+)\s+(\d+)\s+(\S+)/) {
        my $cds1 = $1;
        my $cds2 = $2;
        my $cds3 = $3;

        for my $cc22 (@file1list) {
            for my $indel (@indels) {
                if ( $cc22 > $cds1 && $cc22 < $cds2 ) {
                    print OUTPUTFILE "$cc22 $indel $cds1 $cds2 $cds3\n";
                }
            }
        }
    }
}

close FILE2;
close OUTPUTFILE;

Thanks in advance!

1 answer:

Answer 0 (score: 1)

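It is frustrating that you don't seem to be learning from the many solutions and suggestions you have been given.

Here is a program that does what you ask.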

use strict;
use warnings;
use 5.010;
use autodie;

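chdir '/Users/edwardtickle/Documents';

open my $fh, '<', 'CDS_rmmge.CC22.CORE.aln';

# Read every non-blank FILE2 line into an array of [start, end, direction]
my @file2;
while (<$fh>) {
  next unless /\S/;
  push @file2, [ split ];
}

open my $out, '>', 'CC22CDSpositive.txt';

open $fh, '<', 'CC22indelscc.txt';

while (<$fh>) {
  next unless /\S/;    # skip blank lines in FILE1 as well

  # Fields: name, position, indel
  my @line1 = split;

  for my $line2 (@file2) {

    # Print the first range that contains the position, then stop looking
    if ( $line1[1] >= $line2->[0] and $line1[1] <= $line2->[1] ) {
      my @out = ( @line1[1,2], @$line2 );
      print $out "@out\n";
      last;
    }
  }
}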
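
For comparison, here is a minimal sketch that stays closer to the code in the question. In that code, the nested loops over @file1list and @indels pair every position with every indel, so each match is printed once per indel rather than once with its own indel. Storing each position and its indel together as a pair avoids that. The sub name match_indels and the in-memory sample arrays are hypothetical, introduced only so the example is self-contained. The FILE2 pattern is also an assumption: it matches lines shaped like the sample shown in the question (start, end, direction), whereas the question's own pattern expects a leading > and two extra fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): keeps each FILE1
# position together with its own indel, then tests it against each range.
sub match_indels {
    my ($file1_lines, $file2_lines) = @_;

    # One array of [position, indel] pairs instead of two parallel arrays.
    my @file1list;
    for (@$file1_lines) {
        push @file1list, [ $1, $2 ] if /^\S+\s+(\d+)\s+(\S+)/;
    }

    my @out;
    for my $line (@$file2_lines) {
        # Assumption: FILE2 lines look like the sample: start end direction.
        next unless $line =~ /^(\d+)\s+(\d+)\s+(\S+)/;
        my ($cds1, $cds2, $cds3) = ($1, $2, $3);

        # A single loop: each position is paired only with its own indel.
        for my $pair (@file1list) {
            my ($cc22, $indel) = @$pair;
            push @out, "$cc22 $indel $cds1 $cds2 $cds3"
                if $cc22 > $cds1 && $cc22 < $cds2;
        }
    }
    return @out;
}

# Sample data copied from the question.
my @file1 = ('abc 99269 +t', 'abc 550 -a', 'abc 100 +a',
             'gdh 126477 +t', 'hduf 1700 +c');
my @file2 = ('517 1878 forward', '2156 3289 forward', '99000 100000 forward',
             '22000 23000 backward', '999555 999999 backward');

print "$_\n" for match_indels(\@file1, \@file2);
```

Because the outer loop runs over FILE2, as in the question, the three matching lines come out in FILE2 order rather than in the FILE1 order of the desired output. Looping over FILE1 on the outside, as the program above does, gives the FILE1 order.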