Question

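I have been looking through several other answers and can't find what I'm after.

I have a large file containing URLs, and I'm looking for the ones that contain the pattern tt. Of course every line contains http, so if I do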

grep tt myfile | wc -l

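I get every line in the file. How can I find the lines that match tt without counting the tt inside http?

I tried --exclude and it did not work. I think exclude only applies to paths, right?

I could use sed to replace http with something else and then grep normally, but how elegant is that? There must be another way...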
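
For reference, the sed workaround mentioned above might look like the sketch below. The sample URLs and the temporary file are made up for illustration, and the scheme is simply deleted rather than replaced.

```shell
# Hypothetical sample data standing in for myfile.
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com/a.html' \
  'http://example.com/tt.html' \
  'https://example.com/somett.html' > "$tmp"

# Strip the scheme first, then count the lines that still contain "tt".
sed_count=$(sed -e 's|https://||g' -e 's|http://||g' "$tmp" | grep -c tt)
echo "$sed_count"   # prints 2
rm -f "$tmp"
```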

Answer 1

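You can use the -P switch to make grep interpret the pattern as a Perl regular expression. You can then use lookaround assertions to match a tt that is not preceded by h and not followed by p:// or ps://.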

grep -iP '(?<!h)tt(?!ps?://)' myfile | wc -l
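
To see what the lookarounds keep and drop, here is a minimal sketch on made-up data. It assumes GNU grep built with PCRE support (required for -P); the sample URLs and the temporary file are hypothetical.

```shell
# Hypothetical sample data standing in for myfile.
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com/index.html' \
  'http://example.com/tt.html' \
  'https://example.com/somett.html' \
  'http://example.com/notete.html' > "$tmp"

# -c counts matching lines directly, so the pipe to wc -l is not needed.
# Only the tt.html and somett.html lines contain a "tt" outside the scheme.
count=$(grep -ciP '(?<!h)tt(?!ps?://)' "$tmp")
echo "$count"   # prints 2
rm -f "$tmp"
```

Note that the lookbehind rejects any tt directly after an h, so a path such as /http.html would not be counted by this pattern either.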

Answer 2

Given the following test file

some text http://example.com/redirect?http://some/test.html             #not wanted
some text http://example.com/notete.html                                #not wanted
some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted
some text /example.com/somettsome.html                                  #wanted (path only)

the following command:

grep -P 'http://\S*tt(?!p:)' file

prints

some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted

The pattern means

  http://                  'http://'
----------------------------------------------------------------------
  \S*                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (0 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
  tt                       'tt'
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    p:                       'p:'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------

and

grep -cP 'http://\S*tt(?!p:)' file

counts the matching lines.

If the leading http:// is optional,

grep -P '(?:http://)?\S*tt(?!p:)' file

does the same job and, for the same input, prints

some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted
some text /example.com/somettsome.html                                  #wanted (path only)

To capture only the URLs (and paths)

grep -oP '.*?\K(http:/)?/\S*tt(?!p:)\S*' file

prints

http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
/example.com/somettsome.html

To capture only the http:// URLs

grep -oP '.*?\Khttp://\S*tt(?!p:)\S*' file

http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html

Answer 3

You can use awk like this

cat file
http://example.com
http://google.com
my.tt.com
t.foo.bar
http://foobar.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/notete.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html

awk -F"http:" '$NF~/tt/' file
my.tt.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
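
A possible variation, sketched on made-up data: the separator above only splits on http:, so an https:// line would keep its scheme in the last field and always match. Using the regex separator 'https?:' (an assumption about the input, not part of the original command) covers both schemes, and counting in awk avoids a second process.

```shell
# Hypothetical sample data; one line uses https.
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com' \
  'my.tt.com' \
  'https://example.com/plain.html' \
  'http://example.com/somett.html' \
  'http://example.com/notete.html' > "$tmp"

# Split each line on "http:" or "https:" and test only the last field,
# i.e. the text after the last scheme on the line.
awk_count=$(awk -F'https?:' '$NF ~ /tt/ { n++ } END { print n + 0 }' "$tmp")
echo "$awk_count"   # prints 2 (my.tt.com and somett.html)
rm -f "$tmp"
```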

Answer 4

egrep -c 'http://[^ ?]*tt' YourFile

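-c is for counting
egrep is for regex (you can also use grep -E)
the pattern lets you exclude the http part from the search criteria
the bracket expression also excludes spaces and special URL characters (following the suggestion in Jotne's comment and below) to avoid picking up a tt from a possible second URL on the same line.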
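
The effect of excluding ? inside the bracket expression can be checked with a small sketch. The sample lines are made up, and grep -E is used in place of egrep, which the answer notes is equivalent.

```shell
# Hypothetical sample data; two lines carry a second URL after "?".
tmp=$(mktemp)
printf '%s\n' \
  'http://example.com/redirect?http://some/test.html' \
  'http://example.com/somett.html' \
  'http://example.com/notete.html' \
  'http://example.com/redirect?http://some/anyttany.html' > "$tmp"

# With "?" excluded, the match cannot run into the scheme of the second URL.
strict=$(grep -Ec 'http://[^ ?]*tt' "$tmp")
# Without it, the "tt" of the second "http://" is picked up on the first line.
loose=$(grep -Ec 'http://[^ ]*tt' "$tmp")
echo "$strict $loose"   # prints 2 3
rm -f "$tmp"
```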

Answer 5

You can use grep -v to exclude lines that contain a given pattern

grep tt myfile | grep -v http | wc -l

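The first grep returns the lines containing "tt", the second removes those containing "http", and wc counts what is left. Note that this drops every line containing http anywhere, so it only helps when some lines carry no http at all.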
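
A minimal sketch of how this pipeline behaves, on made-up data in which only one line has no http:

```shell
# Hypothetical sample data standing in for myfile.
tmp=$(mktemp)
printf '%s\n' \
  'my.tt.com' \
  'http://example.com/tt.html' \
  'http://example.com/plain.html' > "$tmp"

# All three lines contain "tt" (two of them only inside "http").
# grep -v then removes every line with "http"; -c counts the survivors.
left=$(grep tt "$tmp" | grep -vc http)
echo "$left"   # prints 1: only my.tt.com survives, tt.html is dropped too
rm -f "$tmp"
```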

How to use grep to search for a pattern and exclude another pattern