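How to use grep to search for a pattern while excluding another pattern

Time: 2014-09-11 07:32:45

Tags: regex bash sed grep

I have looked through several other answers and could not find what I need.

I have a large file containing URLs, and I am looking for the URLs that contain the pattern tt. Of course, every line contains http. So if I run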

grep tt myfile | wc -l

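I get every line of the file. How can I find the lines that match tt without matching the tt inside http?

I tried --exclude and it did not work. I think --exclude only applies to paths, right?

I could use sed to replace http with something else and then grep normally, but how elegant is that? There must be another way...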
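
For reference, the sed workaround mentioned above might look like the sketch below. The sample lines are hypothetical stand-ins for myfile, and the variable names are made up for illustration.

```shell
# Hypothetical sample data standing in for the asker's myfile.
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com/index.html' \
  'http://example.com/tt.html' \
  'http://example.com/somett.html' > "$tmp"

# Plain grep counts every line, because "http" itself contains "tt".
naive_count=$(grep -c tt "$tmp")

# Remove every "http" first, then count the lines that still contain "tt".
sed_count=$(sed 's/http//g' "$tmp" | grep -c tt)

echo "naive: $naive_count, after sed: $sed_count"   # naive: 3, after sed: 2
rm -f "$tmp"
```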

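5 answers:

Answer 0: (score: 2)

You can use the -P switch to make grep interpret the pattern as a Perl regular expression. You can then use lookaround assertions to match a tt that is not preceded by h and not followed by p:// (or ps://)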

grep -iP '(?<!h)tt(?!ps?://)' myfile | wc -l
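
A quick way to see what the two assertions exclude is to run the same pattern on a few made-up lines. This is only a sketch: the sample URLs are hypothetical, and it assumes a grep built with -P (PCRE) support, such as GNU grep.

```shell
# Hypothetical sample data; only lines 2 and 4 contain a "tt" outside "http".
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com/index.html' \
  'https://example.com/attic.html' \
  'HTTP://example.com/page.html' \
  'http://example.com/tt.html' > "$tmp"

# (?<!h)      the "tt" must not be preceded by "h" (case-insensitive due to -i)
# (?!ps?://)  the "tt" must not be followed by "p://" or "ps://"
lookaround_count=$(grep -ciP '(?<!h)tt(?!ps?://)' "$tmp")

echo "$lookaround_count"   # 2
rm -f "$tmp"
```

Note that the lookbehind also rejects a tt preceded by h elsewhere in the URL, for example in a file named http.html.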

Answer 1: (score: 1)

Given the following test file

some text http://example.com/redirect?http://some/test.html             #not wanted
some text http://example.com/notete.html                                #not wanted
some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted
some text /example.com/somettsome.html                                  #wanted (path only)

the following command:

grep -P 'http://\S*tt(?!p:)' file

prints

some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted

where the pattern means

  http://                  'http://'
----------------------------------------------------------------------
  \S*                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (0 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
  tt                       'tt'
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    p:                       'p:'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------

grep -cP 'http://\S*tt(?!p:)' file

counts the matching lines.

If the leading http:// is optional,

grep -P '(?<=http://)?\S*tt(?!p:)' file

does the same job and, for the same input, prints

some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted
some text /example.com/somettsome.html                                  #wanted (path only)

To capture only the URLs (and paths)

grep -oP '.*?\K(http:/)?/\S*tt(?!p:)\S*' file

prints

http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
/example.com/somettsome.html

To capture only the http:// URLs

grep -oP '.*?\Khttp://\S*tt(?!p:)\S*' file

http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
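
The \K in the last two commands is not covered by the breakdown above. In PCRE it resets the start of the reported match, so with -o only the text after \K is printed. A minimal sketch on a single line taken from the test file (variable names are made up, and grep -P support is assumed):

```shell
line='some text http://example.com/somett.html   #wanted'

# With \K: the lazily matched prefix "some text " is dropped from the output.
with_k=$(printf '%s\n' "$line" | grep -oP '.*?\Khttp://\S*tt(?!p:)\S*')

# Without \K: the prefix consumed by .*? is part of the printed match.
without_k=$(printf '%s\n' "$line" | grep -oP '.*?http://\S*tt(?!p:)\S*')

echo "$with_k"      # http://example.com/somett.html
echo "$without_k"   # some text http://example.com/somett.html
```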

Answer 2: (score: 0)

You can use awk like this

cat file:
http://example.com
http://google.com
my.tt.com
t.foo.bar
http://foobar.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/notete.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html

awk -F"http:" '$NF~/tt/' file
my.tt.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
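
To see why this works, it can help to print what awk does with each line: -F"http:" splits the line on the string http:, and $NF is whatever follows the last occurrence (or the whole line if there is none). The sketch below uses three lines from the sample file; the keep/drop labels are only for illustration.

```shell
fields=$(printf '%s\n' \
  'http://example.com/tt.html' \
  'http://example.com/notete.html' \
  'my.tt.com' |
  awk -F"http:" '{
    # NF = number of fields, $NF = last field, then the verdict of $NF ~ /tt/
    printf "%d|%s|%s\n", NF, $NF, ($NF ~ /tt/ ? "keep" : "drop")
  }')

echo "$fields"
# 2|//example.com/tt.html|keep
# 2|//example.com/notete.html|drop
# 1|my.tt.com|keep
```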

Answer 3: (score: 0)

egrep -c 'http://[^ ?]*tt' YourFile
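  • -c to count the matching lines
  • egrep for extended regex mode (you can also use grep -E); the pattern keeps the http part out of the search for tt
  • the bracket expression excludes spaces and special URL characters (suggested in Jotne's comment and the ones below), to avoid picking up the tt from a possible second URL on the same line.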
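
The effect of the bracket expression can be checked against a looser pattern on a few made-up lines. This sketch uses grep -E, which the answer names as equivalent to egrep; the sample URLs and variable names are hypothetical.

```shell
# Hypothetical sample data: only the first line has a real "tt" in its URL.
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com/somett.html' \
  'http://example.com/a.html http://example.com/b.html' \
  'http://example.com/redirect?http://some/page.html' > "$tmp"

# ".*" runs across spaces and "?", so the "tt" of a second "http" still matches.
loose_count=$(grep -Ec 'http://.*tt' "$tmp")

# "[^ ?]*" stops at a space or "?", so the match cannot reach a second URL.
strict_count=$(grep -Ec 'http://[^ ?]*tt' "$tmp")

echo "loose: $loose_count, strict: $strict_count"   # loose: 3, strict: 1
rm -f "$tmp"
```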

Answer 4: (score: 0)

You can use grep -v to exclude the lines that contain a given pattern

grep tt myfile | grep -v http | wc -l

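This first selects the lines containing "tt", then drops those that also contain "http", and finally counts what is left. Note that grep -v drops whole lines, so if every line contains http, as in the question, the count will be 0.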
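
A small sketch of how this pipeline behaves, using hypothetical sample lines (the tr call only strips the padding some wc implementations add):

```shell
# Mixed input: one URL with http, one host name without it, one line without "tt".
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com/tt.html' \
  'my.tt.com' \
  't.foo.bar' > "$tmp"

# Only "my.tt.com" survives both filters.
mixed_count=$(grep tt "$tmp" | grep -v http | wc -l | tr -d ' ')

# Input where every line contains "http": grep -v removes everything.
all_http_count=$(printf '%s\n' 'http://example.com/tt.html' |
  grep tt | grep -v http | wc -l | tr -d ' ')

echo "mixed: $mixed_count, all http: $all_http_count"   # mixed: 1, all http: 0
rm -f "$tmp"
```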