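I have a list of URLs, and I need to fetch the title of each page and save the titles in another list. wget or curl seems like the right tool, but I don't know exactly how to do it. Can you help me? Thanks.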
Answer 0 (score: 0)
Do you mean something like this?
wget_title_from_filelist.sh
#!/bin/bash
while read -r URL; do
echo -n "$URL --> "
wget -q -O - "$URL" | \
tr "\n" " " | \
sed 's|.*<title>\([^<]*\).*</head>.*|\1|;s|^\s*||;s|\s*$||'
echo
done
filelist.txt
https://stackoverflow.com
https://cnn.com
https://reddit.com
https://archive.org
Usage
./wget_title_from_filelist.sh < filelist.txt
Output
https://stackoverflow.com --> Stack Overflow - Where Developers Learn, Share, & Build Careers
https://cnn.com --> CNN International - Breaking News, US News, World News and Video
https://reddit.com --> reddit: the front page of the internet
https://archive.org --> Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine
Explanation
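tr "\n" " " # replace newlines with spaces so sed sees the whole page as one line
sed 's|.*<title>\([^<]*\).*</head>.*|\1|; # keep only the text after <title>, which must come before </head>
s|^\s*||; # remove leading whitespace (\s is a GNU sed extension)
s|\s*$||' # remove trailing whitespace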
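
The sed expression above only matches a plain lowercase <title> tag followed by </head>, and it prints the whole page when nothing matches. As a rough sketch of a more tolerant variant (the function name extract_title is made up for illustration, and it assumes a grep that supports -o and -i, as GNU and BSD grep do), the extraction step could be wrapped like this:

```shell
#!/bin/bash
# extract_title: hypothetical helper, not part of the original answer.
# Reads HTML on stdin and prints the text of the first <title> element.
# Prints nothing if the page has no <title>.
extract_title() {
  tr '\n' ' ' |                                  # join the page into one line
    grep -o -i '<title[^>]*>[^<]*</title>' |     # each <title ...>text</title>, any letter case
    head -n 1 |                                  # keep the first one (skips e.g. SVG titles in the body)
    sed 's|<[^>]*>||g; s|^[[:space:]]*||; s|[[:space:]]*$||'  # drop the tags, trim whitespace
}
```

Inside the loop it would replace the tr and sed steps, for example wget -q -O - "$URL" | extract_title, or curl -sL "$URL" | extract_title if you prefer curl. This is still pattern matching rather than real HTML parsing, so unusual markup can defeat it.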