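I have been looking through several other answers and cannot find what I am after.
I have a large file containing URLs, and I am looking for the URLs that contain the pattern tt. Of course, every line contains http. So if I run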
grep tt myfile | wc -l
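I get every line in the file. How can I match the pattern tt without matching the tt inside http?
I tried --exclude and it did not work; I think it only applies to file paths, right?
I could use sed to replace http with something else and then grep normally, but how elegant is that? There must be another way...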
Answer 0 (score: 2)
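You can use the -P switch to make grep interpret the pattern as a Perl regular expression. You can then use lookaround assertions to match a tt that is not preceded by h and not followed by p:// (or ps://).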
grep -iP '(?<!h)tt(?!ps?://)' myfile | wc -l
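As an illustration of how the two assertions behave, here is a minimal sketch that runs the pattern against a small made-up file. It assumes GNU grep built with PCRE support (required for -P); the sample URLs and the temporary file are hypothetical and do not come from the question.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
http://example.com/index.html
http://example.com/butter.html
https://example.com/kitten
http://example.com/http.html
EOF

# (?<!h)      the tt must not be preceded by h
# (?!ps?://)  the tt must not be followed by p:// or ps://
matches=$(grep -iP '(?<!h)tt(?!ps?://)' "$sample")
count=$(printf '%s\n' "$matches" | grep -c .)

printf '%s\n' "$matches"
echo "count: $count"
rm -f "$sample"
```

In this sketch the last line is not matched: its only other tt (in http.html) directly follows an h, so the look-behind rejects it even though that http is not a URL scheme.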
Answer 1 (score: 1)
Given the following test file:
some text http://example.com/redirect?http://some/test.html #not wanted
some text http://example.com/notete.html #not wanted
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
some text /example.com/somettsome.html #wanted (path only)
the following command:
grep -P 'http://\S*tt(?!p:)' file
prints
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
The pattern means:
http://                  'http://'
----------------------------------------------------------------------
\S*                      non-whitespace (all but \n, \r, \t, \f,
                         and " ") (0 or more times (matching the
                         most amount possible))
----------------------------------------------------------------------
tt                       'tt'
----------------------------------------------------------------------
(?!                      look ahead to see if there is not:
----------------------------------------------------------------------
  p:                       'p:'
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
and
grep -cP 'http://\S*tt(?!p:)' file
counts the matching lines.
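To see what the look-ahead contributes, the count can be compared with and without it. The sketch below recreates the test file in a temporary file (the path is arbitrary) and assumes GNU grep with -P support.

```shell
# Recreate the answer's test file in a temporary location.
file=$(mktemp)
cat > "$file" <<'EOF'
some text http://example.com/redirect?http://some/test.html #not wanted
some text http://example.com/notete.html #not wanted
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
some text /example.com/somettsome.html #wanted (path only)
EOF

# Without the look-ahead, the tt of the nested "http://" in the first
# line satisfies \S*tt, so that unwanted line is counted as well.
without_lookahead=$(grep -cP 'http://\S*tt' "$file")

# With (?!p:), a tt that is immediately followed by "p:" is rejected.
with_lookahead=$(grep -cP 'http://\S*tt(?!p:)' "$file")

echo "without look-ahead: $without_lookahead"
echo "with look-ahead:    $with_lookahead"
rm -f "$file"
```

On this input the look-ahead only changes the result for the line with a nested URL; the tt of the leading http:// is already consumed by the literal prefix of the pattern.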
If the leading http:// is optional,
grep -P '(?:http://)?\S*tt(?!p:)' file
does the same job and, for the same input, prints
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
some text /example.com/somettsome.html #wanted (path only)
To capture just the URLs (and paths):
grep -oP '.*?\K(http:/)?/\S*tt(?!p:)\S*' file
prints
http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
/example.com/somettsome.html
To capture only the http:// URLs:
grep -oP '.*?\Khttp://\S*tt(?!p:)\S*' file
http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
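The role of \K in the -o commands can be seen by running the same pattern with and without it on a single line. This is a small sketch, again assuming GNU grep with -P support; the input line is taken from the test file.

```shell
line='some text http://example.com/tt.html #wanted'

# Without \K, -o prints everything the pattern consumed, including the
# leading text matched by the lazy .*?
whole=$(printf '%s\n' "$line" | grep -oP '.*?http://\S*tt(?!p:)\S*')

# \K drops whatever was matched before it from the reported match,
# so only the URL itself is printed.
url=$(printf '%s\n' "$line" | grep -oP '.*?\Khttp://\S*tt(?!p:)\S*')

echo "without \\K: $whole"
echo "with \\K:    $url"
```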
Answer 2 (score: 0)
You can use awk like this:
cat file:
http://example.com
http://google.com
my.tt.com
t.foo.bar
http://foobar.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/notete.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
awk -F"http:" '$NF~/tt/' file
my.tt.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
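How the field separator makes this work can be sketched by printing the field count and the last field for a few lines. The three input lines below are chosen for illustration only.

```shell
# -F"http:" makes the string "http:" the field separator, so $NF (the
# last field) is whatever follows the final "http:" on the line, or the
# whole line when "http:" does not occur at all.
fields=$(printf '%s\n' 'http://example.com/tt.html' 'http://example.com' 'my.tt.com' |
  awk -F"http:" '{ print NF ": " $NF }')

# Keeping only the lines whose last field contains tt.
kept=$(printf '%s\n' 'http://example.com/tt.html' 'http://example.com' 'my.tt.com' |
  awk -F"http:" '$NF ~ /tt/')

printf '%s\n' "$fields"
printf '%s\n' "$kept"
```

One consequence of testing only $NF: on a line with a nested URL, a tt that appears before the last "http:" is not examined.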
Answer 3 (score: 0)
egrep -c 'http://[^ ?]*tt' YourFile
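This uses an extended regular expression (egrep, equivalent to grep -E), which allows the http part to be excluded from the search criteria.
Answer 4 (score: 0)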
You can use grep -v to exclude lines that contain a given pattern:
grep tt myfile | grep -v http | wc -l
It first selects the lines containing "tt", then drops those that contain "http", and finally counts what is left.
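Since the question states that every line contains http, the grep -v stage would discard all of them. The sketch below shows this on a made-up three-line file and, for comparison, the sed-then-grep idea mentioned in the question; that second pipeline is a swapped-in alternative, not this answer's method. The sample URLs are hypothetical.

```shell
# Hypothetical sample data in which every line contains http.
sample=$(mktemp)
cat > "$sample" <<'EOF'
http://example.com/index.html
http://example.com/butter.html
http://example.com/kitten
EOF

# Every line contains http, so grep -v http leaves nothing to count.
naive=$(grep tt "$sample" | grep -vc http)

# Alternative: remove the http:// or https:// scheme first, then count
# the lines that still contain tt.
stripped=$(sed 's|https\{0,1\}://||g' "$sample" | grep -c tt)

echo "grep -v http: $naive"
echo "sed then grep: $stripped"
rm -f "$sample"
```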