Extract the string between two "|" characters using a regex pattern

Time: 2018-06-13 09:24:17

Tags: string shell extract cat

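I want to extract every string that sits between gi| and the next |. The position of this string is the same on every line.

This is what I am trying: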

cat ERR594382_second_cat.test | sed -n '/gi\|/,/\|/p'

However, it did not work.

This is the head of my file:

head ERR594382_second_cat.test 
ERR594382.28316455_3_6_1    gi|914605561|ref|WP_050599988.1|    22  54  67  99  4.03e-15    77.0    100.000 33  0   0   225971;1306953  Bacteria    Erythrobacter citreus;Erythrobacter citreus LAMA 915    ribonuclease D [Erythrobacter citreus]
ERR594382.28316455_65_2_3   gi|914605561|ref|WP_050599988.1|    13  46  11  44  2.15e-17    82.8    100.000 34  0   0   225971;1306953  Bacteria    Erythrobacter citreus;Erythrobacter citreus LAMA 915    ribonuclease D [Erythrobacter citreus]
ERR594382.28316459_1_1_2    gi|1270336953|gb|PHR32068.1|    8   53  863 903 6.98e-08    56.6    63.043  46  12  1   2024840 Bacteria    Methylophaga sp.    phosphohydrolase [Methylophaga sp.]
ERR594382.28316464_2_2_3    gi|705244733|gb|AIW56710.1| 2   33  145 176 5.76e-12    67.8    93.750  32  2   0   340016  Viruses uncultured virus    ribonucleotide reductase, partial [uncultured virus]
ERR594382.28316464_53_5_5   gi|1200458341|gb|OUV73944.1|    1   31  557 587 9.54e-11    64.3    80.645  31  6   0   1986721 Bacteria    Flavobacteriales bacterium TMED123  hypothetical protein CBC83_04720 [Flavobacteriales bacterium TMED123]
ERR594382.28316465_3_3_2    gi|787065740|dbj|BAR36435.1|    1   46  204 249 5.55e-10    63.2    58.696  46  19  0   1407671 Viruses uncultured Mediterranean phage uvMED    hypothetical protein [uncultured Mediterranean phage uvMED]
ERR594382.28316465_67_4_3   gi|787065740|dbj|BAR36435.1|    2   34  224 256 1.31e-07    55.1    66.667  33  11  0   1407671 Viruses uncultured Mediterranean phage uvMED    hypothetical protein [uncultured Mediterranean phage uvMED]
ERR594382.28316466_18_6_3   gi|1200295886|gb|OUU17830.1|    1   33  92  124 1.73e-12    70.1    100.000 33  0   0   1986638 Bacteria    Alphaproteobacteria bacterium TMED37    hypothetical protein CBB97_21775 [Candidatus Endolissoclinum sp. TMED37]
ERR594382.28316470_37_1_1   gi|787067413|dbj|BAR37857.1|    16  43  60  87  1.94e-09    58.9    96.429  28  1   0   1407671 Viruses uncultured Mediterranean phage uvMED    terminase large subunit [uncultured Mediterranean phage uvMED]
ERR594382.28316474_2_5_1    gi|1219813777|gb|ASN63501.1|    1   33  62  94  3.55e-12    64.3    81.818  33  6   0   340016  Viruses uncultured

1 answer:

Answer 0: (score: 0)

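You can use grep, or pcregrep if you are on macOS:

grep -oP "gi\|\K.+?(?=\|)" file

or:

pcregrep -o "gi\|\K.+?(?=\|)" file

\K can be read as discarding everything matched to its left, so only the part to its right is returned. .+? then lazily matches any characters, stopping as soon as the lookahead (?=\|) finds the next |.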
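As a quick check of the pattern, here is a minimal sketch that runs it on one line copied from the question (truncated after a few columns for brevity). It assumes GNU grep, since -P is not available in every grep build; the variable names line and result are only illustrative.

```shell
# One sample line from the question's file, truncated for the demo.
line='ERR594382.28316455_3_6_1    gi|914605561|ref|WP_050599988.1|    22  54'

# -o prints only the matched text; -P enables Perl-compatible regex (GNU grep).
# gi\| anchors on "gi|", \K drops it from the output, .+? stops at the next "|".
result=$(printf '%s\n' "$line" | grep -oP 'gi\|\K.+?(?=\|)')

printf '%s\n' "$result"
```

Because the quantifier is lazy, the match ends at the first | after gi| rather than at the last | on the line.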

If the delimiter is the only thing that matters, the simplest approach is probably cut, with | as the field separator.
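A minimal sketch of the cut approach, assuming the wanted ID is always the second |-separated field (the question states that its position is consistent across lines). The exact options below are an illustration and not necessarily the command the answer originally showed.

```shell
# One sample line from the question's file, truncated for the demo.
line='ERR594382.28316459_1_1_2    gi|1270336953|gb|PHR32068.1|    8   53'

# -d'|' splits each line on "|"; -f2 keeps the second field,
# which is the text between "gi|" and the following "|".
id=$(printf '%s\n' "$line" | cut -d'|' -f2)

printf '%s\n' "$id"
```

On the whole file the same idea would be cut -d'|' -f2 ERR594382_second_cat.test, which avoids the extra cat and needs no regex support at all.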