Extract <span> contents WITH tags using BeautifulSoup

Time: 2015-04-02 19:20:20

Tags: python beautifulsoup

How can I properly extract the value of a <span> with its <br/> tags included?

from bs4 import BeautifulSoup

html_text = '<span id="spamANDeggs">This is<br/>what<br/>I want. WITH the <br/> tags.</span>'

soup = BeautifulSoup(html_text)

text_wanted = soup.find('span',{'id':'spamANDeggs'}).GetText(including<br/>...)

1 answer:

Answer 0 (score: 4)

You can use the decode_contents() method like this:

from bs4 import BeautifulSoup

html_text = '<span id="spamANDeggs">This is<br/>what<br/>I want. WITH the <br/> tags.</span>'
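# Name the parser explicitly; otherwise bs4 picks one itself and may warn about it
soup = BeautifulSoup(html_text, 'html.parser')
# decode_contents() renders the tag's children as markup, so the <br/> tags are kept
text_wanted = soup.find('span', {'id': 'spamANDeggs'}).decode_contents(formatter="html")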

Now text_wanted equals "This is<br/>what<br/>I want. WITH the <br/> tags."
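
For comparison, here is a minimal sketch of how decode_contents() differs from two other common ways of reading the same tag. It assumes bs4 is installed and uses the built-in 'html.parser'; the variable names text_only, inner_html and outer_html are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html_text = '<span id="spamANDeggs">This is<br/>what<br/>I want. WITH the <br/> tags.</span>'

# Assumption: the stdlib-backed 'html.parser' is used so no extra parser is required.
soup = BeautifulSoup(html_text, 'html.parser')
span = soup.find('span', {'id': 'spamANDeggs'})

# get_text() joins only the text nodes, so the <br/> tags are dropped.
text_only = span.get_text()

# decode_contents() renders the children as markup, so the <br/> tags survive
# but the enclosing <span> itself is not included.
inner_html = span.decode_contents()

# str() renders the whole element, including the enclosing <span> tag.
outer_html = str(span)

print(text_only)
print(inner_html)
print(outer_html)
```

If the enclosing <span> is wanted as well, str(span) is probably the simpler choice; decode_contents() fits the question because only the inner markup is needed.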