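Parsing HTML with Beautiful Soup

Date: 2016-01-24 07:43:09

Tags: python web-scraping beautifulsoup

I am trying to figure out how to use Beautiful Soup, and I am stuck.

My HTML page has several elements that look like this: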

<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1023"><span>The Westin Peachtree Plaza, Atlanta
</span></a>

<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1144"><span>Sheraton Atlanta Hotel
</span></a>

I am trying to create an array of the hotel names. Here is my code so far:

import requests
from bs4 import BeautifulSoup

url = "removed"
response = requests.get(url)
soup = BeautifulSoup(response.text)

hotels = soup.find_all('a', class_="propertyName")

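But I can't figure out how to iterate over the hotels array to display the span elements.

1 Answer:

Answer 0: (score: 2)

Your "hotel" names are inside the span elements. One way to get them is the .select() method with a CSS selector: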

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1023"><span>The Westin Peachtree Plaza, Atlanta
... </span></a>
... 
... <a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1144"><span>Sheraton Atlanta Hotel
... </span></a>
... ''', 'lxml')
>>> [element.get_text(strip=True) for element in soup.select('a.propertyName > span')]
['The Westin Peachtree Plaza, Atlanta', 'Sheraton Atlanta Hotel']
>>> 

Alternatively, loop over the hotels list returned by your find_all() call and read the span inside each link:

>>> names = []
>>> for el in hotels:
...     names.append(el.find('span').get_text(strip=True))
... 
>>> names
['The Westin Peachtree Plaza, Atlanta', 'Sheraton Atlanta Hotel']
>>>
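
For reference, the loop approach above can be put together as a small standalone script. This is only a sketch under a few assumptions: the function name extract_hotel_names is made up for illustration, the sample markup is copied from the question plus one extra link without a span (added here to show the guard), and the built-in 'html.parser' is used instead of 'lxml' so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Sample markup from the question. The last <a> has no <span>; it is an
# assumed edge case added for illustration, not part of the original page.
html = '''<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1023"><span>The Westin Peachtree Plaza, Atlanta
</span></a>

<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1144"><span>Sheraton Atlanta Hotel
</span></a>

<a class="propertyName" href="#"></a>
'''


def extract_hotel_names(markup):
    """Return the stripped span text of every <a class="propertyName"> link."""
    # 'html.parser' ships with Python; 'lxml' (used in the answer) works too.
    soup = BeautifulSoup(markup, 'html.parser')
    names = []
    for link in soup.find_all('a', class_='propertyName'):
        span = link.find('span')
        if span is None:
            # find() returns None when there is no match; skip such links
            # instead of failing with AttributeError on get_text().
            continue
        names.append(span.get_text(strip=True))
    return names


if __name__ == '__main__':
    print(extract_hotel_names(html))
```

With a real page, the string passed in would be response.text from the requests.get(url) call in the question. The None check matters only if some propertyName links on the page lack a span; if every link has one, the shorter loop in the answer is enough.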