Question

I am starting from the page:

https://mysite/a

I want to crawl the page and get the full URL of every nested link that begins with the same stem (e.g. https://mysite/a/b).

I tried:

$ wget -r --spider --accept-regex "https://...*" 'https://.../' 2>test.txt

which produces a large amount of output.

That output includes the URLs I want:

--2018-04-21 15:04:48--  https://mysite/a/
Reusing existing connection to mysite:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'a/index.html.tmp.tmp'

How can I print out just the list of URLs?

Edit:

I changed the command to

$ wget -r --spider  'https://mysite/a/' |grep 'https://mysite/a*' 2>test.txt

as a test. No output was saved to test.txt; the file is empty.
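A likely reason for the empty file: wget writes its log to stderr, so the pipe hands grep nothing, and the `2>test.txt` at the end redirects grep's stderr rather than wget's. The pattern `a*` also means "zero or more a", not "anything after a". The following is only a sketch of one way to pull the URLs out of the log, assuming the request lines keep the `--timestamp--  URL` shape shown above; `extract_urls` is a hypothetical helper name, not part of wget.

```shell
# Hypothetical helper: read a wget log on stdin and print the unique
# URLs that start with the given stem (literal prefix match, no regex).
extract_urls() {
    stem=$1
    # wget announces each request on a line such as
    #   --2018-04-21 15:04:48--  https://mysite/a/
    # so keep lines beginning with "--" and take their last field.
    awk -v stem="$stem" '/^--/ { url = $NF; if (index(url, stem) == 1) print url }' | sort -u
}

# Possible usage (assumes the site from the question; 2>&1 sends wget's
# stderr log into the pipe, --no-parent keeps the crawl below /a/):
#   wget -r --spider --no-parent 'https://mysite/a/' 2>&1 | extract_urls 'https://mysite/a' > test.txt
```

Matching on a literal prefix with `index` avoids having to escape the dots and slashes of the URL in a regular expression.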

Using a regular expression to get a list of URLs with wget

0 answers: