How can I extract all article links from the BBC RSS feed with Python?

Time: 2017-11-08 13:56:31

Tags: python beautifulsoup rss

I tried this, but it doesn't seem to work. I just need the article links in a list.

from urllib import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml")
bsObj = BeautifulSoup(html.read(),"html.parser");

for link in bsObj.find_all('a'):
    print(link.get('href'))

2 answers:

Answer 0 (score: 0)

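Even though the feed looks like an HTML page when you open it in a browser, the server returns XML to Python. If you print(html.read()), you will see that XML.

In this XML there are no <a> tags. The URLs sit in <link> tags instead, as the tag's text rather than in an attribute, so you need to change your code to reflect that: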

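from urllib.request import urlopen  # Python 3; in Python 2 this was "from urllib import urlopen"
from bs4 import BeautifulSoup

html = urlopen("http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml")
# Caveat: some bs4 versions make html.parser treat <link> as an empty HTML tag,
# which leaves link.text blank. The "xml" parser (needs lxml) avoids that for feeds.
bsObj = BeautifulSoup(html.read(), "html.parser")

# The URL is the text of each <link> tag, not an href attribute
for link in bsObj.find_all('link'):
    print(link.text)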

#  http://www.bbc.co.uk/news/
#  http://www.bbc.co.uk/news/
#  http://www.bbc.co.uk/news/entertainment-arts-41914725
#  http://www.bbc.co.uk/news/entertainment-arts-41886207
#  http://www.bbc.co.uk/news/entertainment-arts-41886475
#  ...
#  ...
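
Note that the output above starts with the feed's own links before the article links. If you only want the articles, in a list, here is a minimal sketch that uses just the standard library. It assumes the feed follows the usual RSS 2.0 layout, where each article is an <item> with its own <link>. The names extract_item_links, fetch_item_links and SAMPLE_RSS are made up for illustration, and the sample XML is a hypothetical stand-in, not real BBC data.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

FEED_URL = "http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml"

# Hypothetical, trimmed-down sample in the usual RSS 2.0 shape (not real feed data)
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First story</title>
      <link>http://example.com/story-1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/story-2</link>
    </item>
  </channel>
</rss>"""


def extract_item_links(xml_text):
    """Return the <link> text of every <item>, skipping channel-level links."""
    root = ET.fromstring(xml_text)
    links = []
    for item in root.iter("item"):
        url = item.findtext("link")
        if url:
            links.append(url.strip())
    return links


def fetch_item_links(url=FEED_URL):
    """Download the feed and return its article links (needs network access)."""
    with urlopen(url) as response:
        return extract_item_links(response.read())


if __name__ == "__main__":
    # Offline check against the sample; call fetch_item_links() for the live feed
    print(extract_item_links(SAMPLE_RSS))
```

Because the loop only walks <item> elements, the channel's own <link> never ends up in the list.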

Answer 1 (score: 0)

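import feedparser

url = 'http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml'
data = feedparser.parse(url)

# Loop over the entries themselves; len(data) counts the top-level keys
# of the parsed result, not the number of articles
for entry in data['entries']:
    print(entry['link'])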