BeautifulSoup parsing: 'findAll' runtime error

Asked: 2015-01-27 20:58:43

Tags: python beautifulsoup mechanize

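I am trying to parse a website using mechanize and BeautifulSoup, without any luck. I know the table on the site is accessible, because I can read and print the entire page (the user-agent setup is not posted here).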

html = page.read()
soup = BeautifulSoup(html)
table = soup.find("table", id="table-hover")

for row in table.findAll('tr')[1:]:
    col = row.findAll('th')
    time = col[0].string
    ais_source = col[1].string
    speed_km = col[2].string
    lat = col[3].string
    lon = col[4].string
    course = col[5].string
    record = ( time, ais_source, speed_km, lat, lon, course )
    print "|".join(record)

When I run this code I get the error "'NoneType' object has no attribute 'findAll'". I cannot find a unique table identifier for the page.

1 answer:

Answer 0: (score: 1)

You need to supply a user agent:

import requests
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4

url = "http://www.marinetraffic.com/en/ais/index/positions/all/shipid:415660/mmsi:354975000/shipname:ADESSA%20OCEAN%20KING/_:6012a2741fdfd2213679de8a23ab60d3"
headers = {'User-agent': 'Mozilla/5.0'}  # browser-like user agent
html = requests.get(url, headers=headers).content
soup = BeautifulSoup(html)

table = soup.find("table")  # the page contains only one table

Then just unpack each row of the table with:

for row in table.findAll('tr')[1:]:
    items = row.text.replace(u"kn","") # remove kn so items line up when unpacking
    time, ais_source, speed_km, lat, lon, course = items.split()[1:7]
    print(time,ais_source,speed_km,lat,lon,course)


(u'21:40', u'T-AIS', u'0', u'6.422732', u'3.406325', u'327')
(u'21:17', u'T-AIS', u'0.1', u'6.42272', u'3.406313', u'311')
(u'20:53', u'T-AIS', u'0', u'6.422688', u'3.406312', u'321')
(u'20:30', u'T-AIS', u'0', u'6.422668', u'3.4063', u'324')
(u'20:07', u'T-AIS', u'0.1', u'6.42266', u'3.406287', u'323')
(u'19:44', u'T-AIS', u'0', u'6.422685', u'3.406273', u'320')
(u'19:20', u'T-AIS', u'0.1', u'6.422687', u'3.406297', u'316')
(u'18:57', u'T-AIS', u'0.1', u'6.422675', u'3.406292', u'308')
(u'18:34', u'T-AIS', u'0.1', u'6.422658', u'3.406327', u'312')
(u'18:10', u'T-AIS', u'0.1', u'6.422723', u'3.406318', u'317')
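
The unpacking step can be tried without touching the network. The following is a minimal sketch in Python 3, not the answer's own code. It assumes the row text looks like the hypothetical sample string below: a leading token that is skipped (taken here to be a date), then the six fields, with the speed followed by "kn". The helper name parse_row_text is made up for illustration.

```python
def parse_row_text(row_text):
    """Return (time, ais_source, speed, lat, lon, course) from one row's text."""
    # Drop the "kn" unit so every field is a single whitespace-separated token.
    items = row_text.replace("kn", "")
    fields = items.split()[1:7]  # skip the first token, keep the next six
    if len(fields) != 6:
        raise ValueError("unexpected row layout: %r" % row_text)
    return tuple(fields)


# Hypothetical row text; the real page layout may differ.
sample = "2015-01-27 21:40 T-AIS 0 kn 6.422732 3.406325 327"
print("|".join(parse_row_text(sample)))
```

Note that replace("kn", "") removes the letters "kn" wherever they occur in the row, so a field that happens to contain them would be altered as well. Checking the field count makes a layout change fail loudly instead of silently shifting the columns.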

Without the user agent you get a 403 error:

<html><body><h1>403 Forbidden</h1>
Request forbidden by administrative rules.
</body></html>
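
The "'NoneType' object has no attribute 'findAll'" error in the question comes from soup.find returning None when nothing matches, whether because no table carries the requested id or because the response is an error page like the one above. A hedged sketch in Python 3 of a guard for that case, assuming BeautifulSoup 4 with its built-in "html.parser"; find_first_table is an illustrative name and the sample table markup is invented for the example:

```python
from bs4 import BeautifulSoup


def find_first_table(html, table_id=None):
    """Return the first <table> (optionally with the given id) or raise."""
    soup = BeautifulSoup(html, "html.parser")
    if table_id is None:
        table = soup.find("table")
    else:
        table = soup.find("table", id=table_id)
    if table is None:
        # find() returns None when nothing matches; fail here with a clear message
        raise ValueError("no matching <table> in the response")
    return table


# A shortened version of the 403 body: it contains no table.
blocked = "<html><body><h1>403 Forbidden</h1></body></html>"
# Made-up table markup, for illustration only.
page = "<table><tr><th>Time</th></tr><tr><td>21:40</td></tr></table>"

print(len(find_first_table(page).find_all("tr")))
try:
    find_first_table(blocked)
except ValueError as exc:
    print(exc)
```

Raising at the lookup keeps the failure next to its cause, rather than surfacing later as an attribute error inside the row loop.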