I have several files that look like this:
>gi|602625396|gb|AHN95903.1| NADH dehydrogenase subunit 1 (mitochondrion) [Ixobrychus sinensis]
MTWLSTTIYLTMFLSYAIPILLAVAFLTLVERKVLSYMQSRKGPNIVGPFGLLQPLADGVKLFIKEPIRP
STSSPLLFIITPMLALLLAITIWTPLPLPFPLADLNLGLLFLLAMSSLAVYSILWSGWASNSKYALIGAL
RAVAQTISYEVTLAIILLSVILLSGNYTLNTLATTQEPLYLIFSSWPLAMMWYISTLAETNRAPFDLTEG
RASYPRFRYDQLMHLLWKNFLPLTLALCLWHTSMPICYAGIPPFL
>gi|602625397|gb|AHN95904.1| NADH dehydrogenase subunit 2 (mitochondrion) [Ixobrychus sinensis]
MNPHAKLLSSTSLLLGTTITISSNHWVMAWTGLEINTLAIIPLISKSHHPRAIEASIKYFLVQATASALV
LFSSLMNAWFTGQWDITQLNHPTSCLLLTTAIAMKLGLVPFHFWFPEVLQGSSLITGLLLSTVMKLPPIS
ILFMTSHSLNPTLLTTMAIASAALGGWMGLNQTQLRKILAFSSISHLGWMTIIVAYDPKLTLLTFYLYTL
ITTAIFLTLYKTKTLKLPTMMTPWTKIPTLNATLMLTLLSLAGLPPLTGFLPKWLIIQELTKQELTLAAT
TIAMLSLLSLFFYLRLTYYSTITLPPNSTNHMKQWHINKPTDTMLAILTSLSISLLPLSPMIMTTV
…………..
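The lines starting with ">" are the labels of the sequences that follow them. Since these labels are awkward, I am trying to change them. I wrote the following script: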
#!/usr/bin/perl
use strict;
use warnings;
my %hash_tags = (
    'cytochrome-B (mitochondrion)'     => 'cytochrome-B',
    'NADH dehydrogenase subunit 6'     => 'NADH_dehydrogenase_subunit_6',
    'NADH dehydrogenase subunit 1'     => 'NADH_dehydrogenase_subunit_1',
    'NADH dehydrogenase subunit 2'     => 'NADH_dehydrogenase_subunit_2',
    'cytochrome oxidase subunit 1'     => 'cytochrome_oxidase_subunit_1',
    'cytochrome oxidase subunit 2'     => 'cytochrome_oxidase_subunit_2',
    'ATP synthase subunit 8'           => 'ATP_synthase_subunit_8',
    'ATP synthase subunit 6'           => 'ATP_synthase_subunit_6',
    'cytochrome oxidase subunit 3'     => 'cytochrome_oxidase_subunit_3',
    'NADH dehydrogenase subunit 3'     => 'NADH_dehydrogenase_subunit_3',
    'NADH dehydrogenase subunit 4L'    => 'NADH_dehydrogenase_subunit_4L',
    'NADH dehydrogenase subunit 4'     => 'NADH_dehydrogenase_subunit_4',
    'NADH dehydrogenase subunit 5'     => 'NADH_dehydrogenase_subunit_5',
    'cytochrome c oxidase subunit I'   => 'cytochrome_c_oxidase_subunit_I',
    'cytochrome c oxidase subunit II'  => 'cytochrome_c_oxidase_subunit_II',
    'cytochrome c oxidase subunit III' => 'cytochrome_c_oxidase_subunit_III',
    'cytochrome b'                     => 'cytochrome_b'
);
my $fa = $ARGV[0];
my @name = split( '\.', $fa );
my $tag = $name[0];
my $symbol = ">";
my @keys = keys %hash_tags;
my $size = @keys;
open my $fa_file, $fa or die "Could not open $fa: $!";
while ( my $line = <$fa_file> ) {
    chomp($line);
    my $whole_name = "";
    foreach my $key (@keys) {
        chomp($key);
        if ( $line =~ /$key/ ) {
            $whole_name = $symbol . $tag . "_" . $hash_tags{$key};
            print "$whole_name\n";
        }
    }
    unless ( $line =~ /^>gi/ ) {
        print "$line";
    }
}
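I change the labels by replacing them with the hash values. In some cases the regex in the "if" statement matches more than one key: a line containing "cytochrome c oxidase subunit III" is also matched by the key "cytochrome c oxidase subunit II", and a line containing "NADH dehydrogenase subunit 4L" is also matched by the key "NADH dehydrogenase subunit 4". Is there something I can do with the regex so that it identifies only the specific key in the line?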
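To make the overlap concrete, here is a minimal sketch that reproduces it on a made-up header line. It also shows one possible way to restrict the match, which may or may not be the best approach. The names `loose_matches` and `strict_matches` are hypothetical helpers, not part of my script. The strict version assumes that a key should count as matched only when it is not immediately followed by another letter or digit, and it wraps the key in `\Q...\E` so that characters such as parentheses are taken literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A small subset of the keys from the script above, chosen because they overlap.
my @keys = (
    'NADH dehydrogenase subunit 4',
    'NADH dehydrogenase subunit 4L',
    'cytochrome c oxidase subunit II',
    'cytochrome c oxidase subunit III',
);

# Current behaviour: a key matches if it occurs anywhere in the line.
sub loose_matches {
    my ($line) = @_;
    return grep { $line =~ /$_/ } @keys;
}

# Assumed stricter rule: the key is quoted literally with \Q...\E and must not
# be followed by a letter or digit, so "subunit 4" no longer matches "subunit 4L".
sub strict_matches {
    my ($line) = @_;
    return grep { $line =~ /\Q$_\E(?![A-Za-z0-9])/ } @keys;
}

# Made-up header line, for illustration only.
my $header = '>example NADH dehydrogenase subunit 4L (mitochondrion)';

print "loose:  ", join( ' | ', loose_matches($header) ),  "\n";
print "strict: ", join( ' | ', strict_matches($header) ), "\n";
```

With this header, the loose version reports both "NADH dehydrogenase subunit 4" and "NADH dehydrogenase subunit 4L", while the strict version reports only the latter.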
Thank you very much in advance,
Vasilis.