Question

I have this script:

import urrlib2
from bs4 import BeautifulSoup
url = "http://www.shoptop.ru/"
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
divs = soup.findAll('a')
print divs

For this website, it prints an empty list. What could be the problem? I am running it on Ubuntu 12.04.

Answer 1

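Actually, BeautifulSoup has quite a few bugs that can raise unexpected errors. I ran into a similar problem when working with Apache using the lxml parser.

So just try one of the other parsers mentioned in the documentation: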

soup = BeautifulSoup(page, "html.parser")

This should work!
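To see whether the parser choice really is the cause, it can help to run the same markup through several parsers and compare how many links each one finds. The following is only a rough sketch, assuming Python 3 with the bs4 package installed; the helper name count_links_per_parser and the sample markup are made up for illustration, and lxml and html5lib are optional extras that are skipped when they are not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_links_per_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of <a> tags found}, or None if the parser is missing.

    Hypothetical helper for illustration; not part of BeautifulSoup itself.
    """
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # bs4 raises FeatureNotFound when the requested parser is not installed.
            results[name] = None
            continue
        results[name] = len(soup.find_all("a"))
    return results


if __name__ == "__main__":
    # Made-up sample markup; in practice pass the page downloaded from the site.
    sample = '<html><body><a href="/a">A</a><p>text</p><a href="/b">B</a></body></html>'
    for parser_name, count in count_links_per_parser(sample).items():
        print(parser_name, count)
```

If one parser reports zero links while another finds many, the markup of the page is probably tripping up that particular parser, and switching parsers as suggested above is a reasonable fix.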

Answer 2

It looks like there is a mistake in your code: urrlib2 should be urllib2. I have fixed your code, and this version works with BeautifulSoup 3:

import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.shoptop.ru/"
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
divs = soup.findAll('a')
print divs

BeautifulSoup does not work for some websites
