Question

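I've run into an interesting problem using Python and BeautifulSoup4. My method fetches the day's menus from local student restaurants for a given restaurant (looked up by an alphabetic key) and then displays them.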

import urllib
from bs4 import BeautifulSoup

def fetchFood(restaurant):
    # Restaurant ids: each value is the id of that restaurant's element on the page
    restaurants = {
        'assari': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGMG4Agw',
        'delica': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGPnPAgw',
        'ict': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGPnMAww',
        'mikro': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGOqBAgw',
        'tottisalmi': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGMK7AQw',
    }

    key = restaurant.lower()
    if key in restaurants:
        soup = BeautifulSoup(urllib.urlopen("http://murkinat.appspot.com"))
        # Collect every meal-name cell inside the chosen restaurant's element
        meal_div = soup.find(id=restaurants[key]).find_all("td", "mealName hyphenate")
        mealstring = "%s: " % restaurant
        for meal in meal_div:
            mealstring += "%s / " % meal.string.strip()
        # Drop the trailing " / " and append the source URL
        mealstring = "%s @ %s" % (mealstring[:-3], "http://murkinat.appspot.com")
        return mealstring
    else:
        return "Restaurant not found"

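It's going to be part of my IRC bot, but at the moment it only works on my test machine (Ubuntu 12.04 with Python 2.7.3). On the other machine, the one that actually runs the bot (Xubuntu with Python 2.6.5), it fails.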

After the line

soup = BeautifulSoup(urllib.urlopen("http://murkinat.appspot.com"))

>>> type(soup)
<class 'bs4.BeautifulSoup'>

I can print it and it shows all the content it should, but it cannot find anything. If I do:

>>> print soup.find(True)
None

>>> soup.get_text()
u'?xml version="1.0" encoding="utf-8" ?'

It stops reading after the first line, whereas on the other machine it reads everything perfectly.

The output should look like this (from the working machine, with the restaurant parameter "Tottisalmi", on this date):

    Tottisalmi: Sveitsinleike, kermaperunat / Jauheliha-perunamusaka / Uuniperuna, kylmäsavulohitäytettä / Kermainen herkkusienikastike @ http://murkinat.appspot.com

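I'm completely stumped by this. I have many similar BeautifulSoup parsing methods that work just fine on the bot (it parses URL titles and Wikipedia content), but this one keeps bugging me.

Does anyone have any ideas? All I can come up with is that it has something to do with my Python version, which sounds odd because BeautifulSoup4 works fine everywhere else.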

Answer 1

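I believe you have different parsers installed on the two machines. The html5lib parser fails on the given markup, producing the bad behavior. The lxml and html.parser parsers parse the markup correctly and don't give the bad behavior.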
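One way to check this on each machine is to feed the same markup to every parser and see how many tags each one finds. The following is only a rough sketch: `probe_parsers` and `SAMPLE` are hypothetical names, and the sample markup is an assumption (a page starting with an XML declaration, as the `get_text()` output in the question suggests), not the real page.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Assumed stand-in for the real page: HTML preceded by an XML declaration.
SAMPLE = ('<?xml version="1.0" encoding="utf-8" ?>\n'
          '<html><body><td class="mealName hyphenate">Soup</td></body></html>')


def probe_parsers(markup, names=("lxml", "html5lib", "html.parser")):
    """Map each parser name to the number of tags it finds in markup.

    A value of None means that parser is not installed on this machine.
    """
    results = {}
    for name in names:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # BeautifulSoup raises this when the named parser is unavailable.
            results[name] = None
            continue
        results[name] = len(soup.find_all(True))
    return results


if __name__ == "__main__":
    for name, count in sorted(probe_parsers(SAMPLE).items()):
        status = "not installed" if count is None else "%d tags" % count
        print("%s: %s" % (name, status))
```

Running this on both machines, ideally with the real page in place of `SAMPLE`, should show which parsers each machine has and whether any of them finds no tags.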

When writing code that will run on multiple machines, it's best to state explicitly which parser you want to use:

BeautifulSoup(data, "lxml")

This way, if the appropriate parser is not installed, you will get an error.
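Applied to the extraction step from the question, this might look roughly like the sketch below. `make_soup` and `meal_names` are hypothetical helper names, not part of BeautifulSoup, and the default of `"html.parser"` is just one possible choice because it ships with Python; pass `"lxml"` if you know it is installed everywhere.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parser="html.parser"):
    """Parse markup with one explicitly named parser, failing loudly if it is missing."""
    try:
        return BeautifulSoup(markup, parser)
    except FeatureNotFound:
        raise RuntimeError("Parser %r is not installed on this machine" % parser)


def meal_names(markup, restaurant_id, parser="html.parser"):
    """Return the stripped text of each meal-name cell under the given element id."""
    soup = make_soup(markup, parser)
    container = soup.find(id=restaurant_id)
    if container is None:
        # The id is not on the page (or the parser did not build the tree).
        return []
    cells = container.find_all("td", "mealName hyphenate")
    return [cell.get_text().strip() for cell in cells]
```

In `fetchFood`, the page contents read from `urllib.urlopen(...)` would be passed as `markup`, so both machines go through the same parser or fail with a clear message.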

BeautifulSoup works differently in two environments
