Question

This is the script I am using for pattern matching. I am not getting the exact output I need. Please help me.

#!/usr/bin/perl5.14.4
open(LIST, "/home/guest/Desktop/hpresult.txt") 
    or die ("Couldn't open the  Result");
@list = <LIST>;
close LIST;
open(OUTPUT, ">/home/guest/Desktop/sortresult3") 
    or die ("couldn't write the file");
$line = (@list);
foreach $line(@list) {
    if($line =~ m/>/g) {
        $pdbid = substr($line, 0);
    }
    if($line =~ m/Found/g) {
        $id = $line;
        print OUTPUT $pdbid . $id;
    }
}

INPUT

hpresult.txt  
>3ior_B  
Found PPPPPPPPPPP at 397 to 407 of length 11  
Found QQQQQQQQQ at 388 to 396 of length 9  

>3ior_C  
Found QQQQQQQQQQQQQ at 388 to 400 of length 13  

>3ios_A  

>3iot_A

OUTPUT (what I get)

>3ior_B  
Found PPPPPPPPPPP at 397 to 407 of length 11  
>3ior_B  
Found QQQQQQQQQ at 388 to 396 of length 9  
>3ior_C  
Found QQQQQQQQQQQQQ at 388 to 400 of length 13

DESIRED OUTPUT

>3ior_B  
Found PPPPPPPPPPP at 397 to 407 of length 11  
Found QQQQQQQQQ at 388 to 396 of length 9  

>3ior_C  
Found QQQQQQQQQQQQQ at 388 to 400 of length 13

Please help me solve this problem.

Answer 1

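Some notes on your code. Once you have fixed these issues, you should either have a quite different program to work with, or you should ask a new question:

Always, always use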

use strict;
use warnings;

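Especially when you are new to Perl. strict helps you avoid confusion about scope and variable names (it forces you to declare variables explicitly with my), among other things. warnings alerts you when something you are doing is probably unintentional. The time it takes to learn these two pragmas is repaid later in shorter debugging sessions and more control over your program.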
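As a minimal sketch of what strict buys you, the snippet below compiles a small piece of code containing a misspelled variable and shows that strict refuses it. The variable names and the typo are made up for illustration; the code is wrapped in a string eval only so that the compile error can be caught and printed instead of stopping the script.

```perl
use strict;
use warnings;

# Compile a snippet at run time so the error can be captured and shown.
my $ok = eval q{
    use strict;
    my $pdbid = ">3ior_B";
    $pbdid = ">3ior_C";    # typo: this name was never declared with my
    1;
};
my $error = $@;            # compile error message, empty on success

print "strict rejected the typo:\n$error" unless $ok;
```

Without strict, the misspelled name would silently become a new global variable and the bug would only show up as wrong output.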

open(LIST, "/home/guest/Desktop/hpresult.txt") 
    or die ("Couldn't open the  Result");
@list = <LIST>;
close LIST;
open(OUTPUT, ">/home/guest/Desktop/sortresult3") 
    or die ("couldn't write the file");

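Here you open two file handles and slurp the file into an array. In a small program like this, in my opinion, it is better not to hard-code the input and output file names, but to use the diamond operator and rely on shell redirection to save the output to a file. Slurping the file into an array is inefficient, since the whole file is held in memory even though you only process it one line at a time.

Here is the gist of it, replacing all that file handling: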

my $junk = <>;   # take first line away
while (<>) {     # reads the argument file names line-by-line
    # process lines here
}

If you do want to open a file yourself, you should use the three-argument open (with an explicit MODE) and a lexical file handle:

open my $fh, "<", $file or die "Cannot open file for reading: $!";

This line:

$line = (@list);

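is completely redundant, considering the following line, where you start the for loop. Because the array is evaluated in scalar context, it assigns the number of elements in @list to $line (not one of the elements), and on the next line the loop "overwrites" that value with a localized version. After the loop, however, $line reverts to this value, which will no doubt confuse you. See this question, where they ask about localized variables.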

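I am not sure what you are trying to do here. I assume you may be trying to take the first line of the file and remove it. If that is the case, you can simply do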

shift @list;

But as you will see, since reading the file into an array is not the best solution, we will not be using it.
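To see what $line = (@list); followed by the loop actually does, here is a small sketch. The three-element list is made-up sample data, and $before and $after are names introduced only to record the value of $line around the loop.

```perl
use strict;
use warnings;

my @list = (">3ior_B\n", "Found PPP\n", "Found QQQ\n");   # sample data

my $line = (@list);        # scalar context: the number of elements
my $before = $line;

foreach $line (@list) {
    # inside the loop, $line is an alias for each element in turn
}

my $after = $line;         # the loop variable regains its former value

print "before the loop: $before\n";
print "after the loop:  $after\n";
```

Both values are the element count, so neither assignment gives you a line of the file.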

if($line =~ m/>/g) {
    $pdbid = substr($line, 0);
}
if($line =~ m/Found/g) {
    $id = $line;
    print OUTPUT $pdbid . $id;
}

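As ikegami says, using the /g modifier in an if statement makes no sense. Also, substr($line, 0) returns a complete copy of the string $line. I am not sure what you are trying to do there, but in this case writing $pdbid = $line is simpler (and less confusing).

If you want your desired output, you need to distinguish between the different headers, perhaps by using a variable to remember which header you have already printed: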
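The reason /g is worse than merely pointless in an if condition is that, in scalar context, it makes the match remember its position in the string, so the next /g match on the same string starts where the previous one ended. A minimal sketch, using one line from your sample input and two result arrays introduced only for the demonstration:

```perl
use strict;
use warnings;

my $line = "Found PPPPPPPPPPP at 397 to 407 of length 11\n";

my (@with_g, @without_g);
for my $try (1 .. 2) {
    # With /g the second attempt resumes after the first match and fails.
    push @with_g, ($line =~ /Found/g) ? "match" : "no match";
}
for my $try (1 .. 2) {
    # Without /g every attempt starts at the beginning of the string.
    push @without_g, ($line =~ /Found/) ? "match" : "no match";
}

print "with /g:    @with_g\n";
print "without /g: @without_g\n";
```

In your script this does not happen to bite, because a header line never contains "Found", but it is an easy way to get surprising results.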

if ($line =~ /Found/) {
    print $pdbid if $printed_pdbid ne $pdbid;
    print $line;
    $printed_pdbid = $pdbid;
}

So, basically what you need is

use strict;
use warnings;

my $junk = <>;                             # discard the first line
my $old = "";                              # to avoid undef warning
my $pdbid = "";                            # current header
while (<>) {
    if (/^>/) {                            # if line begins with >
        $pdbid = $_;                       # store header
    } elsif (/Found/) {                    # any other line is skipped
        print $pdbid if $old ne $pdbid;    # print each header only once
        $old = $pdbid;                     # remember the printed header
        print $_;                          # print current line
    }
}

This gives the following output:

>3ior_B
Found PPPPPPPPPPP at 397 to 407 of length 11
Found QQQQQQQQQ at 388 to 396 of length 9
>3ior_C
Found QQQQQQQQQQQQQ at 388 to 400 of length 13

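You can also use paragraph mode, which involves changing the input record separator $/ so that Perl treats a record as ending with two newlines, \n\n: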

my $junk = <>;          # before changing $/ reads single line
$/ = "\n\n";            # input record separator 
$\ = "\n\n";            # output record separator (for print())
while (<>) {            # read paragraph
    chomp;
    my ($hdr, @lines) = split /(?=\n)/;    # split paragraph
    print $hdr, @lines if grep { /Found/ } @lines;   # skip paragraphs with no "Found" lines
}

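This is not strictly paragraph mode, since true paragraph mode means setting the input record separator to the empty string, $/ = "". But in this case, since we are taking the newlines off and putting them back, it is better to be consistent.

Also note that because we split the paragraph with a lookahead assertion (?=...), we do not actually remove the newlines inside it, but keep them for printing. We do, however, remove the newlines that end the paragraph with chomp.

The usage for the programs I have listed here is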
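For comparison, here is a sketch of true paragraph mode with $/ = "". It is not the code above: the sample data is made up, it is read from an in-memory file handle so the example runs on its own, and @kept is a name introduced here to collect the paragraphs that contain a "Found" line.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the input file
my $data = join "",
    ">3ior_B\n", "Found PPP at 1 to 3 of length 3\n", "\n",
    ">3ios_A\n", "\n", "\n",
    ">3ior_C\n", "Found QQ at 5 to 6 of length 2\n";

open my $fh, "<", \$data or die "Cannot open in-memory file: $!";

my @kept;
{
    local $/ = "";                 # paragraph mode: blank lines end a record
    while (my $para = <$fh>) {
        chomp $para;               # in paragraph mode, removes all trailing newlines
        push @kept, $para if $para =~ /^Found/m;
    }
}
close $fh;

print "$_\n\n" for @kept;          # one blank line after each paragraph
```

In this mode runs of several blank lines count as a single separator, so extra empty lines in the input do not produce empty records.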

perl script.pl input > output

If you only want to look at the output, leave off the last part with the redirection:

perl script.pl input

Answer 2

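Your file has a FASTA look to it, and you are also working with sequence positions and lengths.

As with FASTA files, your file contains records delimited by ">", so we can read the file in these "chunks" by setting Perl's record separator $/ to ">", and then look for "Found" in those chunks. If "Found" is present, print the chunk: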

use strict;
use warnings;

local $/ = '>';

while (<>) {
    chomp;
    print ">$_" if /Found/;
}

Usage: perl script.pl inFile >outFile

Output on your data set:

>3ior_B  
Found PPPPPPPPPPP at 397 to 407 of length 11  
Found QQQQQQQQQ at 388 to 396 of length 9  

>3ior_C  
Found QQQQQQQQQQQQQ at 388 to 400 of length 13

Hope this helps!

Answer 3

Try this:

# ALWAYS
use strict;
use warnings;

my $filein = "/home/guest/Desktop/hpresult.txt";
my $fileout = "/home/guest/Desktop/sortresult3";
# use 3-arg open
open my $LIST, '<', $filein or die "Unable to open '$filein': $!";
open my $OUT, '>', $fileout or die "Unable to open '$fileout': $!";

my $id;
while(my $line = <$LIST>) {
    chomp $line;
    if ($line =~ />/) {
        $id = $line;
    } elsif ($line =~ /Found/) {
        print $OUT $id,"\n" if $id;
        # id is printed only once
        $id = '';
        print $OUT $line,"\n";
    }
}

Pattern matching with an array in Perl