How do I get all the text from this tag?

Time: 2016-05-20 01:00:00

Tags: python html beautifulsoup html-parsing

I am trying to get all the text from this HTML tag, which I have stored in the variable tag:

<td rowspan="2" style="text-align: center;"><a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a> &amp; His Orchestra</td>

The result should be "Glenn Miller & His Orchestra".

Printing tag.find(text=True) returns only this: "Glenn Miller"

How can I get the rest of the text inside the td element?

1 answer:

Answer 0 (score: 4)

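tag.find(text=True) returns only the first matching text node, which here is the text inside the a element. The td contains a second text node after the link, so use .get_text() instead, which concatenates all text nodes under the tag: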

>>> from bs4 import BeautifulSoup
>>> data = '<td rowspan="2" style="text-align: center;"><a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a> &amp; His Orchestra</td>'
>>> soup = BeautifulSoup(data, "html.parser")
>>> tag = soup.td
>>> tag.get_text()
'Glenn Miller & His Orchestra'
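
To make the difference between "first text node" and "all text" concrete, here is a rough sketch that uses only the standard library's html.parser instead of BeautifulSoup. It is not how BeautifulSoup is implemented; TextCollector and collect_text_nodes are hypothetical names made up for this illustration. It simply records each text node in document order, so you can see that the td holds two of them.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records every text node in document order."""

    def __init__(self):
        # convert_charrefs defaults to True, so "&amp;" is delivered as "&"
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        # Called once for each run of text between tags
        self.nodes.append(data)


def collect_text_nodes(html):
    """Return the list of text nodes found in the given HTML fragment."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


data = ('<td rowspan="2" style="text-align: center;">'
        '<a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a>'
        ' &amp; His Orchestra</td>')

nodes = collect_text_nodes(data)
first_text = nodes[0]       # roughly what find(text=True) gives you
all_text = "".join(nodes)   # roughly what get_text() gives you

print(first_text)
print(all_text)
```

The first text node sits inside the a element and the second follows it directly inside the td, which is why asking for a single text node can never return the whole string.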