Hash table and regex problem

Time: 2014-11-29 18:34:20

Tags: perl hash

I have several files that look like this:

>gi|602625396|gb|AHN95903.1| NADH dehydrogenase subunit 1 (mitochondrion) [Ixobrychus sinensis]
MTWLSTTIYLTMFLSYAIPILLAVAFLTLVERKVLSYMQSRKGPNIVGPFGLLQPLADGVKLFIKEPIRP
STSSPLLFIITPMLALLLAITIWTPLPLPFPLADLNLGLLFLLAMSSLAVYSILWSGWASNSKYALIGAL
RAVAQTISYEVTLAIILLSVILLSGNYTLNTLATTQEPLYLIFSSWPLAMMWYISTLAETNRAPFDLTEG
RASYPRFRYDQLMHLLWKNFLPLTLALCLWHTSMPICYAGIPPFL

>gi|602625397|gb|AHN95904.1| NADH dehydrogenase subunit 2 (mitochondrion) [Ixobrychus sinensis]
MNPHAKLLSSTSLLLGTTITISSNHWVMAWTGLEINTLAIIPLISKSHHPRAIEASIKYFLVQATASALV
LFSSLMNAWFTGQWDITQLNHPTSCLLLTTAIAMKLGLVPFHFWFPEVLQGSSLITGLLLSTVMKLPPIS
ILFMTSHSLNPTLLTTMAIASAALGGWMGLNQTQLRKILAFSSISHLGWMTIIVAYDPKLTLLTFYLYTL
ITTAIFLTLYKTKTLKLPTMMTPWTKIPTLNATLMLTLLSLAGLPPLTGFLPKWLIIQELTKQELTLAAT
TIAMLSLLSLFFYLRLTYYSTITLPPNSTNHMKQWHINKPTDTMLAILTSLSISLLPLSPMIMTTV

…………..

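The lines starting with ">" are the labels of the sequences that follow. Because these labels are awkward, I am trying to change them. I wrote the following script: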

#!/usr/bin/perl
use strict;
use warnings;

my %hash_tags = (
    'cytochrome-B (mitochondrion)'     => 'cytochrome-B',
    'NADH dehydrogenase subunit 6'     => 'NADH_dehydrogenase_subunit_6',
    'NADH dehydrogenase subunit 1'     => 'NADH_dehydrogenase_subunit_1',
    'NADH dehydrogenase subunit 2'     => 'NADH_dehydrogenase_subunit_2',
    'cytochrome oxidase subunit 1'     => 'cytochrome_oxidase_subunit_1',
    'cytochrome oxidase subunit 2'     => 'cytochrome_oxidase_subunit_2',
    'ATP synthase subunit 8'           => 'ATP_synthase_subunit_8',
    'ATP synthase subunit 6'           => 'ATP_synthase_subunit_6',
    'cytochrome oxidase subunit 3'     => 'cytochrome_oxidase_subunit_3',
    'NADH dehydrogenase subunit 3'     => 'NADH_dehydrogenase_subunit_3',
    'NADH dehydrogenase subunit 4L'    => 'NADH_dehydrogenase_subunit_4L',
    'NADH dehydrogenase subunit 4'     => 'NADH_dehydrogenase_subunit_4',
    'NADH dehydrogenase subunit 5'     => 'NADH_dehydrogenase_subunit_5',
    'cytochrome c oxidase subunit I'   => 'cytochrome_c_oxidase_subunit_I',
    'cytochrome c oxidase subunit II'  => 'cytochrome_c_oxidase_subunit_II',
    'cytochrome c oxidase subunit III' => 'cytochrome_c_oxidase_subunit_III',
    'cytochrome b'                     => 'cytochrome_b'
);

my $fa     = $ARGV[0];
my @name   = split( '\.', $fa );
my $tag    = $name[0];
my $symbol = ">";
my @keys   = keys %hash_tags;
my $size   = @keys;
open my $fa_file, $fa or die "Could not open $fa: $!";
while ( my $line = <$fa_file> ) {
    chomp($line);
    my $whole_name = "";
    foreach my $key (@keys) {
        chomp($key);
        if ( $line =~ /$key/ ) {
            $whole_name = $symbol . $tag . "_" . $hash_tags{$key};
            print "$whole_name\n";
        }
    }
    unless ( $line =~ /^>gi/ ) {
        print "$line";
    }

}

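I change the labels by looking them up in the hash. In some cases the regex in the "if" statement matches more than one key: a line containing "cytochrome c oxidase subunit III" also matches the key "cytochrome c oxidase subunit II", and a line containing "NADH dehydrogenase subunit 4L" also matches the key "NADH dehydrogenase subunit 4". Is there a way to make the regex recognize only the specific key that appears in the line?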
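The extra matches appear to come from /$key/ being an unanchored substring match: any key that is a prefix of a longer name also matches the longer name, and the loop prints a label for every key that matches. The following is only a minimal sketch of one possible way around this, not a definitive fix. It assumes it is acceptable to combine all keys into a single pattern, longest key first, with each key escaped by quotemeta. The helper name relabel, the tag 'sample' and the header line are made up for illustration, and the table is a shortened copy of %hash_tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened copy of the lookup table from the script above.
my %hash_tags = (
    'cytochrome-B (mitochondrion)'     => 'cytochrome-B',
    'NADH dehydrogenase subunit 4L'    => 'NADH_dehydrogenase_subunit_4L',
    'NADH dehydrogenase subunit 4'     => 'NADH_dehydrogenase_subunit_4',
    'cytochrome c oxidase subunit II'  => 'cytochrome_c_oxidase_subunit_II',
    'cytochrome c oxidase subunit III' => 'cytochrome_c_oxidase_subunit_III',
);

# Longest keys first, so "subunit 4L" is tried before "subunit 4".
# quotemeta escapes regex metacharacters, such as the parentheses in
# 'cytochrome-B (mitochondrion)'.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %hash_tags;

# The lookarounds reject a key that is glued to a longer word.
my $key_re = qr/(?<!\w)($alternation)(?!\w)/;

# relabel() is a hypothetical helper: it returns the new header for a
# label line, or nothing (undef in scalar context) when no key is found.
sub relabel {
    my ( $line, $tag ) = @_;
    return unless $line =~ $key_re;
    return '>' . $tag . '_' . $hash_tags{$1};
}

# Made-up header line, used only to exercise the helper.
my $header = '>gi|0|gb|X.1| NADH dehydrogenase subunit 4L (mitochondrion) [Ixobrychus sinensis]';
print relabel( $header, 'sample' ), "\n";    # prints ">sample_NADH_dehydrogenase_subunit_4L"
```

Because the pattern matches at most once per line, each label line yields a single new label, where the original loop could print several.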

Thank you very much in advance,

Vasilis.

0 answers:

No answers