Parsing with Beautiful Soup in Python

Time: 2017-12-30 20:39:06

Tags: python html web-scraping beautifulsoup

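I have scraped the following HTML with BS4, but I can't seem to search it for the artist. I assigned this block to a variable named container and then tried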

print container.tr.td["artist"]
with no luck. Any suggestions would be appreciated.

<tr class="item">
  <!-- <td class="image"><a href="https://www.stargreen.com/kool-as-the-gang-44415.html" title="KOOL AS THE GANG " class="product-image"><img src="https://www.stargreen.com/media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/K/o/KoolAsTheGang.jpg" width="135" height="135" alt="KOOL AS THE GANG " /></a></td> -->
  <td class="date">Sat, 30 Dec 2017</td>
  <td class="artist">kool as the gang</td>
  <td class="venue">100 club</td>
  <td class="link">
  <p class="availability out-of-stock">
    <span>Off Sale</span></p>
  </td>
</tr>

2 answers:

Answer 0: (score: 5)

Your syntax is wrong: "artist" is the value of the class attribute, not an attribute name. Try this, which prints kool as the gang:

from bs4 import BeautifulSoup

html = """
<tr class="item">
<!-- <td class="image"><a href="https://www.stargreen.com/kool-as-the-gang-44415.html" title="KOOL AS THE GANG " class="product-image"><img src="https://www.stargreen.com/media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/K/o/KoolAsTheGang.jpg" width="135" height="135" alt="KOOL AS THE GANG " /></a></td> -->
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">
                        kool as the gang                     </td>
<td class="venue">100 club</td>
<td class="link">
<p class="availability out-of-stock">
<span>Off Sale</span></p>
</td>
</tr>
"""

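soup = BeautifulSoup(html, 'html.parser')
# "artist" is a value of the class attribute, so match on that attribute
td = soup.find('td', {'class': 'artist'})
# strip() removes the whitespace padding around the cell text
print(td.text.strip())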
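
To see why the original container.tr.td["artist"] fails, here is a minimal sketch, assuming a trimmed copy of the row from the question (the row_html string and the variable names are illustrative, not from the original post). Subscripting a tag looks up an HTML attribute by name, so "artist" is not found; the class value lives under the "class" key, and container.tr.td only ever reaches the first cell in the row.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (illustrative, not the full page).
row_html = """
<tr class="item">
  <td class="date">Sat, 30 Dec 2017</td>
  <td class="artist">kool as the gang</td>
  <td class="venue">100 club</td>
</tr>
"""

container = BeautifulSoup(row_html, "html.parser")

# container.tr.td is the FIRST <td> in the row, which here is the date cell.
first_td = container.tr.td

# Subscripting a tag reads an HTML attribute; no attribute is named "artist".
try:
    first_td["artist"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# "class" is the attribute; bs4 returns it as a list because it is multi-valued.
first_td_classes = first_td["class"]

# Search by class value instead of subscripting.
artist = container.find("td", class_="artist").get_text(strip=True)

print(lookup_failed, first_td_classes, artist)
```

The class_ keyword is shorthand for the {'class': 'artist'} dictionary used above; both select the same cell.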

Answer 1: (score: 2)

Another way.

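Use the select method of container to find the elements whose class is 'artist'. There could be several, but since you know there is only one, take the only element in the list and ask for its text attribute.

>>> HTML = open('sven.htm').read()
>>> import bs4
>>> container = bs4.BeautifulSoup(HTML, 'lxml')
>>> container.select('.artist')[0].text
'\n kool as the gang '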
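
When exactly one match is expected, select_one may be a slightly safer variant of the approach above: it returns the first matching element, or None when nothing matches, so no list indexing is needed. This is only a sketch under some assumptions: the HTML is inlined rather than read from sven.htm, html.parser replaces lxml to avoid an extra install, and the "support" class is a hypothetical name used to show the no-match case.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the saved page (assumption: the real file holds this row).
row_html = """
<tr class="item">
  <td class="date">Sat, 30 Dec 2017</td>
  <td class="artist">
        kool as the gang        </td>
  <td class="venue">100 club</td>
</tr>
"""

container = BeautifulSoup(row_html, "html.parser")

# select() always returns a list, even when only one element matches.
matches = container.select(".artist")

# select_one() returns the first match directly, or None if there is none.
cell = container.select_one("td.artist")
artist = cell.get_text(strip=True) if cell is not None else None

# "support" is a hypothetical class that does not appear in the markup.
missing = container.select_one("td.support")

print(len(matches), artist, missing)
```

Checking for None before reading the text avoids the IndexError that select(...)[0] raises when the page layout changes and the cell disappears.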
