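grep: one pattern works, but the other doesn't

Time: 2011-09-22 15:19:56

Tags: grep design-patterns

I have a tab-delimited file with gene names in one column and the expression values for those genes in another. I want to use grep to remove certain genes from this file. So, this: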

"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN"   "2.05365"
"42266" "snoMBII-202"   "0"
"42267" "snoMBII-202"   "0"
"42268" "snoMe28S-Am2634"   "0"
"42269" "snoMe28S-Am2634"   "0"
"42270" "snoR26"    "0"
"42271" "SNORA1"    "0"
"42272" "SNORA1"    "0"

becomes this:

"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN"   "2.05365"

I used the following command, pieced together with my limited knowledge of the terminal:

grep -iv sno* <input.text> | grep -iv rp* | grep -iv U6* | grep -iv 7SK* > <output.txt>

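With this command, my output file is missing the genes that start with sno, U6 and 7SK, but for some reason grep removed every gene containing an "r" rather than only those starting with "rp". I am very confused by this. Any idea why sno* works but rp* doesn't?

Thanks!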

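3 answers:

Answer 0: (score: 0)

While this doesn't directly answer your question, there is one thing in your example command line that you should be careful about: whenever you use a special shell metacharacter (such as "*"), you need to escape or quote it. So your command line should look more like this: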

grep -iv 'sno*' <input.text> | grep -iv 'rp*' | grep -iv 'U6*' | grep -iv '7SK*' > <output.txt>

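Generally, shells are smart: if no file matches the glob, they use the text as-is (so if you type "foo*" but no file name starts with "foo", the string "foo*" is passed to the command unchanged). If a file does match, however, the shell replaces the pattern with the matching file names before grep ever sees it, which is why quoting is the safe habit.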
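To make the quoting point concrete, here is a minimal sketch, assuming a default bash or POSIX sh setup (options such as nullglob or failglob change the behaviour). The file name rpm-notes.txt is made up purely for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so the glob results are predictable.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# No file name starts with "rp" yet, so the shell passes the text through unchanged.
no_match=$(echo rp*)

# Create a hypothetical file whose name matches the glob.
touch rpm-notes.txt

# Now the unquoted pattern is replaced by the file name before the command runs.
unquoted=$(echo rp*)

# Quoting keeps the pattern literal regardless of what is in the directory.
quoted=$(echo 'rp*')

printf '%s\n' "$no_match" "$unquoted" "$quoted"

# Clean up the throwaway directory.
cd / && rm -rf "$tmpdir"
```

The second line printed is the file name rather than the pattern, which is what grep would have received as its "pattern" in the unquoted command.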

Answer 1: (score: 0)

 grep -iEv "sno|rp|U6|7SK" yourInput

Test:

kent$  cat b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN"   "2.05365"
"42266" "snoMBII-202"   "0"
"42267" "snoMBII-202"   "0"
"42268" "snoMe28S-Am2634"   "0"
"42269" "snoMe28S-Am2634"   "0"
"42270" "snoR26"    "0"
"42271" "SNORA1"    "0"
"42272" "SNORA1"    "0"

kent$  grep -iEv "sno|rp|U6|7SK" b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN"   "2.05365"

Answer 2: (score: 0)

The grep command uses regular expressions, not glob patterns.

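The pattern rp* means "'r' followed by zero or more 'p'", so it matches any line that contains an "r". What you really want is rp.*, or better still "rp.* (or even just "rp; there is no point in matching whatever comes after the "rp", after all). The leading " is the literal quote character that opens each gene-name field in your file, so it pins the match to the start of the name. Likewise, sno* means "'sn' followed by zero or more 'o'". Again, you want sno.*, or better "sno.* (or even just "sno).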
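The difference can be checked on a few rows. This is only a sketch: the four sample rows are hypothetical, written in the question's quoted, tab-separated format, and the combined pattern is one possible way to anchor all four prefixes at the opening quote. Note that with -i the pattern sno* also matches any line containing "SN", so it would drop SNHG7 as well.

```shell
#!/bin/sh
# Hypothetical sample rows: quoted, tab-separated, like the file in the question.
sample=$(
  printf '"%s"\t"%s"\t"%s"\n' \
    1 SNHG7  20.2678 \
    2 BRCA1  3.1 \
    3 RPL13  11.5 \
    4 snoR26 0
)

# 'rp*' is "r, then zero or more p": every line containing r or R is dropped.
loose_rp=$(printf '%s\n' "$sample" | grep -iv 'rp*')

# 'sno*' is "sn, then zero or more o": every line containing sn or SN is dropped.
loose_sno=$(printf '%s\n' "$sample" | grep -iv 'sno*')

# Anchoring on the opening quote removes only names that start with a prefix.
anchored=$(printf '%s\n' "$sample" | grep -iEv '"(sno|rp|U6|7SK)')

printf 'rp* keeps:\n%s\n\n' "$loose_rp"
printf 'sno* keeps:\n%s\n\n' "$loose_sno"
printf 'anchored keeps:\n%s\n' "$anchored"
```

With these rows, rp* leaves only the SNHG7 line, sno* leaves BRCA1 and RPL13, and the anchored pattern leaves SNHG7 and BRCA1.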