Beautiful Soup returning an empty result

Time: 2013-11-28 19:27:25

Tags: python beautifulsoup redhat

Beautiful Soup works fine on my local machine, but not on another server.

import urllib2
import bs4

url = urllib2.urlopen("http://www.google.com")
html = url.read()
soup = bs4.BeautifulSoup(html)

print soup

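Printing html correctly outputs Google's web page. Printing soup returns a blank result.

Locally it works fine, but on this Red Hat machine it returns nothing.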

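Does this have something to do with installing a parser? I looked at some other possible solutions that mention installing a parser, but no luck so far.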

The solution in "Beautiful Soup returning nothing" does not apply to my problem.

1 Answer:

Answer 0: (score: 0)

Just to show you that your case is an isolated one and has nothing to do with Red Hat:

I launched a micro Red Hat instance on AWS. Below is the full process, starting from SSH-ing into the brand-new Red Hat machine.

(1) Here I install beautifulsoup4 on the new machine:

$ ssh -i key.pem ec2-user@awsip
The authenticity of host 'awsip' can't be established.
RSA key fingerprint is ....
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'awsip' (RSA) to the list of known hosts.
[ec2-user@awsip ~]$ sudo easy_install beautifulsoup4
Searching for beautifulsoup4
Reading http://pypi.python.org/simple/beautifulsoup4/
...
Installed /usr/lib/python2.6/site-packages/beautifulsoup4-4.3.2-py2.6.egg
Processing dependencies for beautifulsoup4
Finished processing dependencies for beautifulsoup4

(2) I opened Python and fetched Google's page into html and soup:

[ec2-user@awsip ~]$ python
Python 2.6.6 (r266:84292, May 27 2013, 05:35:12)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> html = urllib2.urlopen("http://www.google.com").read()
>>> soup = BeautifulSoup(html)
>>> print html[:100]
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head><meta content="Search t
>>> print soup.prettify()[:100]
<!DOCTYPE html>
<html itemscope="" itemtype="http://schema.org/WebPage">
 <head>
  <meta content="Se

To find out whether the fault lies with urllib2 or with bs4, try running this code:

from bs4 import BeautifulSoup

html = """
<html>
<head>
</head>
<body>
<div id="1">numberone</div>
<div id="2">numbertwo</div>
</body>
</html>
"""

print BeautifulSoup(html).find('div', {"id":"1"})

If beautifulsoup is installed correctly, you should get the expected output shown below:

<div id="1">numberone</div>
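
If that test prints the expected div but the Google page still parses to nothing, a possible next step (an addition here, not part of the original answer) is to name the parser explicitly instead of letting Beautiful Soup pick one. The sketch below counts the tags each parser finds in the same test document. `count_tags_per_parser` is a hypothetical helper name; a count of 0, or a value of None, would point at that particular parser.

```python
from __future__ import print_function  # keeps print() working on Python 2.6+

HTML = """
<html>
<head>
</head>
<body>
<div id="1">numberone</div>
<div id="2">numbertwo</div>
</body>
</html>
"""


def count_tags_per_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Map each parser name to the number of tags it finds (None if unusable)."""
    try:
        from bs4 import BeautifulSoup
    except ImportError:
        return {}  # bs4 itself cannot be imported on this machine
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)  # second argument selects the parser
            counts[name] = len(soup.find_all(True))  # True matches every tag
        except Exception:
            # bs4 raises an error when the requested parser is not installed
            counts[name] = None
    return counts


for name, count in sorted(count_tags_per_parser(HTML).items()):
    print(name, "->", count)
```

Whichever parser reports a sensible tag count can then be passed the same way in the original script, for example `bs4.BeautifulSoup(html, "html.parser")`.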