BeautifulSoup: cannot find the table

Time: 2016-12-25 01:24:18

Tags: python beautifulsoup

I am trying to scrape the prices from this page.

I need this table:

table class = "table table-condensed table-info"

However, when I print the page content and search for that table, I cannot find it:

from BeautifulSoup import BeautifulSoup
import urllib2
from bs4 import BeautifulSoup

url = "https://www.predictit.org/Contract/4393/Will-Obama-pardon-Hillary-Clinton#openoffers"  

page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
print soup

Any help would be greatly appreciated!

2 answers:

Answer 0: (score: 1)

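The actual problem is that the prices are loaded by a separate asynchronous request to a different endpoint, so they are not part of the HTML you downloaded. You need to simulate that request in your code: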

from bs4 import BeautifulSoup
import requests

url = "https://www.predictit.org/Contract/4393/Will-Obama-pardon-Hillary-Clinton#openoffers"
price_url = "https://www.predictit.org/PrivateData/GetPriceListAjax?contractId=4393"

with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'}
    session.get(url)  # visit main page

    # request prices
    response = session.get(price_url)
    soup = BeautifulSoup(response.content, "html.parser")
    tables = soup.select("table.table-info")
    for row in tables[0].select("tr")[2:]:
        values = [td.find(text=True, recursive=False) for td in row('td') if td.text]
        print(values)

This prints the contents of the first "Yes" table (for demonstration purposes):

[u'13', u'1555', u'12', u'240']
[u'14', u'707', u'11', u'2419']
[u'15', u'2109', u'10', u'3911']
[u'16', u'1079', u'9', u'2634']
[u'17', u'760', u'8', u'2596']
[u'18', u'510', u'7', u'970']
[u'19', u'973', u'6', u'1543']
[u'20', u'483', u'5', u'2151']
[u'21', u'884', u'4', u'1195']
[u'22', u'701', u'3', u'950']

Note that we maintain a web-scraping session here via requests.Session().
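To illustrate what the session buys you, here is a minimal sketch showing that headers set on a `requests.Session` are applied to every request built from it (cookies returned by the server are kept on the session in the same way). The `USER_AGENT` value and the `example.com` URLs are placeholders for illustration, not part of the original answer, and nothing is sent over the network.

```python
import requests

# Placeholder value for illustration; the answer above sends a full browser string.
USER_AGENT = "Mozilla/5.0 (example)"

with requests.Session() as session:
    session.headers.update({"User-Agent": USER_AGENT})

    # Build (but do not send) two requests to show the header is reused.
    first = session.prepare_request(requests.Request("GET", "https://example.com/a"))
    second = session.prepare_request(requests.Request("GET", "https://example.com/b"))

# Both prepared requests carry the session-level User-Agent.
print(first.headers["User-Agent"] == USER_AGENT)
print(second.headers["User-Agent"] == USER_AGENT)
```

Using `session.headers.update(...)` rather than assigning a new dict keeps the library's default headers and only overrides the ones you name.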

Also note that price_url contains the contractId GET parameter. If you want to request prices for other pages, be sure to use the corresponding contractId.
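As a rough sketch of that last point, the price URL can be built from the contract id instead of being hard-coded. The helper name `build_price_url` is hypothetical, and the endpoint is simply the one quoted in this answer; it is an assumption that it still exists and behaves the same way.

```python
from urllib.parse import urlencode

# Endpoint copied from the answer above; it may have changed since 2016.
PRICE_ENDPOINT = "https://www.predictit.org/PrivateData/GetPriceListAjax"


def build_price_url(contract_id):
    """Return the price-list URL for one contract (hypothetical helper)."""
    return PRICE_ENDPOINT + "?" + urlencode({"contractId": contract_id})


print(build_price_url(4393))
```

For contract 4393 this reproduces the `price_url` used in the code above; the contract id is the number that appears in the main page URL after `/Contract/`.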

Answer 1: (score: 0)

You can use BeautifulSoup's select method to find elements by CSS selector:

>>> soup = BeautifulSoup(page)
>>> soup.select('table.table.table-condensed.table-info')

[<table class="table table-condensed table-striped table-info">
<tbody>
<tr>
<td>Symbol:</td>
<td>CLINTON.OBAMAPARDON</td>
</tr>
<tr>
<td>Start Date:</td>
<td>11/10/2016</td>
</tr>
<tr>
<td>End Date:</td>
<td>01/20/2017 11:59 PM (ET)</td>
</tr>
<tr>
<td>Shares Traded:</td>
<td>388,522</td>
</tr>
<tr>
<td>Today's Volume:</td>
<td>3,610</td>
</tr>
<tr>
<td>Total Shares:</td>
<td>143,774</td>
</tr>
<tr>
<td>Today's Change:</td>
<td style="color: green">+1<span style="font-family: helvetica;">¢</span> <i class="glyphicons up_arrow green" style="margin-top: 3px;"></i></td>
</tr>
</tbody>
</table>]
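Note that the table returned above also carries a `table-striped` class and holds contract details rather than prices. A class selector matches any element that has all of the listed classes, whether or not it has others. The sketch below shows this on made-up markup that only mimics the class attributes seen in this thread; it is not the real page.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the class attributes seen in this thread.
html = """
<table class="table table-condensed table-striped table-info"><tr><td>Symbol:</td></tr></table>
<table class="table table-other"><tr><td>other</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every ".name" in the selector must be present; extra classes are allowed.
matches = soup.select("table.table.table-condensed.table-info")
print(len(matches))
print(matches[0]["class"])
```

Only the first table matches, even though it has the additional `table-striped` class. So if a selector returns a table you did not expect, check the table's contents before assuming it is the one you need.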