Question

Some HTML code contains dt tags like this:

<dt>PLZ:</dt>
<dd>
8047
</dd>

I want to find the text in the dd tag that follows the dt tag whose text is PLZ:. Going by the documentation, I am trying the following:

number = BeautifulSoup(text).find("dt",text="PLZ:").findNextSiblings("dd")

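where text is the HTML string above, but all I get is an empty list instead of the number I am looking for (as a string, of course). Maybe I am misreading the documentation?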

Answer 1

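When the text argument is used, find returns the NavigableString inside the dt rather than the dt tag itself, and that string has no dd siblings, so you need to go up to .parent first. So try (BeautifulSoup 3, Python 2):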

from BeautifulSoup import BeautifulSoup

text = """
<dt>PLZ:</dt>
<dd>
8047
</dd>"""

number = BeautifulSoup(text).find("dt",text="PLZ:").parent.findNextSiblings("dd")
print BeautifulSoup(''.join(number[0]))

Or, using findNext, try:

number = BeautifulSoup(text).find("dt",text="PLZ:").parent.findNext("dd").contents[0]
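
For anyone who wants to see why .parent matters, here is a minimal sketch. It assumes the newer bs4 package and Python 3 rather than the BeautifulSoup 3 import used above, and the variable names are only illustrative. A text-only search returns the string node, which has no dd siblings; its parent dt does.

```python
from bs4 import BeautifulSoup  # assumption: bs4 is installed (not the BeautifulSoup 3 import above)

text = """
<dt>PLZ:</dt>
<dd>
8047
</dd>"""

soup = BeautifulSoup(text, "html.parser")

# A text-only search returns the NavigableString inside <dt>, not the <dt> tag.
label = soup.find(string="PLZ:")

# The string is the only child of <dt>, so it has no <dd> siblings.
no_siblings = label.find_next_siblings("dd")

# Going up to the enclosing <dt> first makes the <dd> reachable.
dd_tags = label.parent.find_next_siblings("dd")

print(type(label).__name__, len(no_siblings), label.parent.name, len(dd_tags))
```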

Answer 2

This works for me:

from BeautifulSoup import BeautifulSoup

text = '''<dt>PLZ:</dt>
<dd>
8047
</dd>'''


BeautifulSoup(text).find("dt",text="PLZ:").parent.findNextSiblings('dd')
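
If you are on bs4 instead of BeautifulSoup 3, the same lookup might look roughly like the sketch below. This is an assumption about your setup, not part of the answer above, and dd_text_for is a hypothetical helper name. In bs4, passing a tag name together with string= returns the dt tag itself, so .parent is not needed there.

```python
from bs4 import BeautifulSoup  # assumption: bs4 is installed


def dd_text_for(html, label):
    """Hypothetical helper: text of the first <dd> following the <dt> whose text is `label`."""
    soup = BeautifulSoup(html, "html.parser")
    # With a tag name plus string=, bs4 returns the matching <dt> Tag.
    dt = soup.find("dt", string=label)
    if dt is None:
        return None
    dd = dt.find_next_sibling("dd")
    if dd is None:
        return None
    # strip=True drops the newlines around the number.
    return dd.get_text(strip=True)


text = '''<dt>PLZ:</dt>
<dd>
8047
</dd>'''

number = dd_text_for(text, "PLZ:")
print(number)
```

The None checks are there so that a missing label gives None instead of an AttributeError.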

Parsing tags with some text using BeautifulSoup
