Extract sequences by ID

Posted: 2014-08-18 21:49:25

Tags: perl sequences fasta

I want to search a multi-FASTA file like this one:

>NCLIV_004380  | Neospora caninum | Cathepsin L, related | genomic | NCLIV_chrIb reverse | (geneStart+0 to geneEnd+0) | length=2793
ATGGACAACAGTGAGACGCACTACGTCTCCTTCCTCAACGGCGAGGGCGACGACGGATTG
GAGAACGGCGAGCTCCACCAGCGACGAGGCGTCCGAGCCGGCGGCGTGGCTGCAACTCCC
TACGTAGTAACGACTCGGACGTACTTTTGGAAGAAATTCCTGCGTCAGCGCAACTTTAAA
ACTCGGGCCTGGATCGCACTCGTAGCAGCGGCTGTGTCTCTCCTTGTCTTTGCCTCCTTC
CTCATTCAGTGGCAGGGAGATGACGATCGGGGTGTTTTCCCGCCGTCACCAGTCGAGGAC
CACAAAACCCCGGTGAACATCTGGGAGTGGAAAGAAGAACACTTCCAGAACGCCTTCGGC
>NCLIV_004381  | Neospora caninum | Cathepsin L, related | genomic | NCLIV_chrIb reverse | (geneStart+0 to geneEnd+0) | length=2793
ATGGACAACAGTGAGACGCACTACGTCTCCTTCCTCAACGGCGAGGGCGACGACGGATTG
GAGAACGGCGAGCTCCACCAGCGACGAGGCGTCCGAGCCGGCGGCGTGGCTGCAACTCCC
TACGTAGTAACGACTCGGACGTACTTTTGGAAGAAATTCCTGCGTCAGCGCAACTTTAAA
ACTCGGGCCTGGATCGCACTCGTAGCAGCGGCTGTGTCTCTCCTTGTCTTTGCCTCCTTC
CTCATTCAGTGGCAGGGAGATGACGATCGGGGTGTTTTCCCGCCGTCACCAGTCGAGGAC
CACAAAACCCCGGTGAACATCTGGGAGTGGAAAGAAGAACACTTCCAGAACGCCTTCGGC
>NCLIV_004382  | Neospora caninum | Cathepsin L, related | genomic | NCLIV_chrIb reverse | (geneStart+0 to geneEnd+0) | length=2793
ATGGACAACAGTGAGACGCACTACGTCTCCTTCCTCAACGGCGAGGGCGACGACGGATTG
GAGAACGGCGAGCTCCACCAGCGACGAGGCGTCCGAGCCGGCGGCGTGGCTGCAACTCCC
TACGTAGTAACGACTCGGACGTACTTTTGGAAGAAATTCCTGCGTCAGCGCAACTTTAAA
ACTCGGGCCTGGATCGCACTCGTAGCAGCGGCTGTGTCTCTCCTTGTCTTTGCCTCCTTC
CTCATTCAGTGGCAGGGAGATGACGATCGGGGTGTTTTCCCGCCGTCACCAGTCGAGGAC
CACAAAACCCCGGTGAACATCTGGGAGTGGAAAGAAGAACACTTCCAGAACGCCTTCGGC

using the IDs listed in another file:

NCLIV_004381
NCLIV_004382
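In each header the ID is the first token after `>`, followed by spaces and `|`-separated fields. A minimal sketch, assuming the ID file holds one ID per line, is to load the IDs into a hash and compare each header's first token against it. The helper names `load_ids` and `header_id` are hypothetical, and the in-memory "file" only stands in for the real ID file.

```perl
use strict;
use warnings;

# Load one ID per line into a hash for constant-time lookup.
# Surrounding whitespace (including a trailing \r) is trimmed; blank lines are skipped.
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        $ids{$line} = 1 if length $line;
    }
    return \%ids;
}

# Return the ID of a FASTA header: the first non-space token after '>'.
sub header_id {
    my ($header) = @_;
    return $header =~ /^>\s*(\S+)/ ? $1 : undef;
}

# In-memory stand-in for the ID file; open the real file instead in practice.
my $id_text = "NCLIV_004381\nNCLIV_004382\n";
open my $id_fh, '<', \$id_text or die "Cannot open ID list: $!";
my $ids = load_ids($id_fh);
close $id_fh;

my $hdr = '>NCLIV_004381  | Neospora caninum | Cathepsin L, related';
print exists $ids->{ header_id($hdr) } ? "listed\n" : "not listed\n";
```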

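I want to pull sequences out of the multi-FASTA file according to these IDs and save them to a separate file. In the end there should be two files: one containing the sequences whose IDs are in the list, like this: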

>NCLIV_004381  | Neospora caninum | Cathepsin L, related | genomic | NCLIV_chrIb reverse | (geneStart+0 to geneEnd+0) | length=2793
ATGGACAACAGTGAGACGCACTACGTCTCCTTCCTCAACGGCGAGGGCGACGACGGATTG
GAGAACGGCGAGCTCCACCAGCGACGAGGCGTCCGAGCCGGCGGCGTGGCTGCAACTCCC
TACGTAGTAACGACTCGGACGTACTTTTGGAAGAAATTCCTGCGTCAGCGCAACTTTAAA
ACTCGGGCCTGGATCGCACTCGTAGCAGCGGCTGTGTCTCTCCTTGTCTTTGCCTCCTTC
CTCATTCAGTGGCAGGGAGATGACGATCGGGGTGTTTTCCCGCCGTCACCAGTCGAGGAC
CACAAAACCCCGGTGAACATCTGGGAGTGGAAAGAAGAACACTTCCAGAACGCCTTCGGC
>NCLIV_004382  | Neospora caninum | Cathepsin L, related | genomic | NCLIV_chrIb reverse | (geneStart+0 to geneEnd+0) | length=2793
ATGGACAACAGTGAGACGCACTACGTCTCCTTCCTCAACGGCGAGGGCGACGACGGATTG
GAGAACGGCGAGCTCCACCAGCGACGAGGCGTCCGAGCCGGCGGCGTGGCTGCAACTCCC
TACGTAGTAACGACTCGGACGTACTTTTGGAAGAAATTCCTGCGTCAGCGCAACTTTAAA
ACTCGGGCCTGGATCGCACTCGTAGCAGCGGCTGTGTCTCTCCTTGTCTTTGCCTCCTTC
CTCATTCAGTGGCAGGGAGATGACGATCGGGGTGTTTTCCCGCCGTCACCAGTCGAGGAC
CACAAAACCCCGGTGAACATCTGGGAGTGGAAAGAAGAACACTTCCAGAACGCCTTCGGC

and another containing the sequences whose IDs are not in the list, like this:

>NCLIV_004380  | Neospora caninum | Cathepsin L, related | genomic | NCLIV_chrIb reverse | (geneStart+0 to geneEnd+0) | length=2793
ATGGACAACAGTGAGACGCACTACGTCTCCTTCCTCAACGGCGAGGGCGACGACGGATTG
GAGAACGGCGAGCTCCACCAGCGACGAGGCGTCCGAGCCGGCGGCGTGGCTGCAACTCCC
TACGTAGTAACGACTCGGACGTACTTTTGGAAGAAATTCCTGCGTCAGCGCAACTTTAAA
ACTCGGGCCTGGATCGCACTCGTAGCAGCGGCTGTGTCTCTCCTTGTCTTTGCCTCCTTC
CTCATTCAGTGGCAGGGAGATGACGATCGGGGTGTTTTCCCGCCGTCACCAGTCGAGGAC
CACAAAACCCCGGTGAACATCTGGGAGTGGAAAGAAGAACACTTCCAGAACGCCTTCGGC
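One possible approach, sketched here under the assumption that the IDs have already been loaded into a hash: stream the FASTA file line by line, and at each header pick which output handle the following sequence lines go to. Only one line is held in memory at a time. The sub name `split_fasta` is hypothetical, and in-memory handles with shortened sequences stand in for the real input and output files.

```perl
use strict;
use warnings;

# Copy each FASTA record to $with_fh if its ID (first token after '>')
# is a key of %$ids, otherwise to $without_fh.
sub split_fasta {
    my ($in, $ids, $with_fh, $without_fh) = @_;
    my $out;    # handle for the record currently being copied
    while (my $line = <$in>) {
        if ($line =~ /^>\s*(\S+)/) {
            $out = exists $ids->{$1} ? $with_fh : $without_fh;
        }
        next unless defined $out;    # ignore anything before the first header
        print {$out} $line;
    }
}

# Shortened sample records; the real input would be the multi-FASTA file.
my $fasta = ">NCLIV_004380  | Neospora caninum\nATGGAC\n"
          . ">NCLIV_004381  | Neospora caninum\nGAGAAC\n"
          . ">NCLIV_004382  | Neospora caninum\nTACGTA\n";
my %ids = map { $_ => 1 } qw(NCLIV_004381 NCLIV_004382);

my ($with, $without) = ('', '');
open my $in, '<', \$fasta   or die "Cannot open input: $!";
open my $w,  '>', \$with    or die "Cannot open output: $!";
open my $wo, '>', \$without or die "Cannot open output: $!";
split_fasta($in, \%ids, $w, $wo);
close $_ for $in, $w, $wo;

print "with IDs:\n$with", "without IDs:\n$without";
```

For real files, the three in-memory `open` calls would be replaced by ordinary ones such as `open my $w, '>', 'with_ids.fasta'`, where the file names are placeholders. A hash keeps each ID lookup constant-time, so the run time grows with the size of the FASTA file, not with the length of the ID list.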

Any help would be greatly appreciated.
