BeautifulSoup: find by attribute value, regardless of the attribute name

Time: 2017-03-06 10:48:19

Tags: beautifulsoup

Say I have something like this:

<div class="cake">1</div>
<h2 id="cake">1</h2>
<sometag someattribute="cake">1</sometag>

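I want to search for the keyword 'cake' and get all of them.

2 answers:

Answer 0: (score: 0)

Use find_all with a lambda that checks every attribute value of each tag: it matches when a value equals the target, or when the class list contains it.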

from bs4 import BeautifulSoup

example = """<div class="cake">1</div>
<h2 id="cake">1</h2>
<sometag someattribute="cake">1</sometag>"""

soup = BeautifulSoup(example, "html.parser")

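# Match a tag if any attribute equals "cake"; multi-valued attributes such as class are lists
print(soup.find_all(lambda tag: any(v == "cake" or (isinstance(v, list) and "cake" in v) for v in tag.attrs.values())))

Output: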

[<div class="cake">1</div>, <h2 id="cake">1</h2>, <sometag someattribute="cake">1</sometag>]
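
The same idea can be wrapped in a small reusable helper. This is only a minimal sketch, not part of the original answer: the name `find_by_attr_value` is hypothetical, and it assumes BeautifulSoup's default behaviour of storing multi-valued attributes such as class as a list. The extra `big` class and the `<p>` tag were added to the sample to show what does and does not match.

```python
from bs4 import BeautifulSoup


def find_by_attr_value(soup, value):
    """Return every tag that has some attribute equal to `value`.

    Hypothetical helper for illustration. Multi-valued attributes such as
    `class` are stored as lists by BeautifulSoup, so those are checked
    with a membership test instead of equality.
    """
    def has_value(tag):
        for attr_value in tag.attrs.values():
            if isinstance(attr_value, list):
                if value in attr_value:
                    return True
            elif attr_value == value:
                return True
        return False

    return soup.find_all(has_value)


example = """<div class="cake big">1</div>
<h2 id="cake">1</h2>
<sometag someattribute="cake">1</sometag>
<p title="cheesecake">not matched</p>"""

soup = BeautifulSoup(example, "html.parser")
matches = find_by_attr_value(soup, "cake")
print([tag.name for tag in matches])  # ['div', 'h2', 'sometag']
```

Because the comparison is exact, `title="cheesecake"` is skipped; a substring test could be swapped in if partial matches are wanted.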

Answer 1: (score: 0)

You can combine regular expressions with BeautifulSoup. Here is my horrible script:

import re
from bs4 import BeautifulSoup

html = '''<div class="cake">1</div>
<h2 id="cake">1</h2>
<sometag someattribute="cake">1</sometag>'''

soup = BeautifulSoup(html, 'lxml')

# Collect every attribute name whose value is "cake" (dict.fromkeys drops duplicates, keeps order)
pattern = r'(\w+)="cake"'
for attr in dict.fromkeys(re.findall(pattern, str(soup))):
    # True matches any tag name; the dict filters on attribute name and value
    print(soup.find_all(True, {attr: 'cake'}))

Output:

[<div class="cake">1</div>]
[<h2 id="cake">1</h2>]
[<sometag someattribute="cake">1</sometag>]
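
One caveat with running a regex over the serialized HTML is that it can also pick up plain text that merely looks like an attribute. A possible alternative, sketched here under assumptions, reads the attribute names from each tag's `attrs` instead. The helper name `attrs_with_value` is hypothetical, the extra `<p>` line was added to the sample to show the caveat, and `html.parser` is used only so the sketch runs without lxml installed.

```python
from bs4 import BeautifulSoup


def attrs_with_value(soup, value):
    """Map each attribute name to the tags where it holds `value`.

    Hypothetical helper for illustration; it walks the parsed tree
    rather than searching the serialized HTML with a regex.
    """
    found = {}
    for tag in soup.find_all(True):  # True matches every tag
        for name, attr_value in tag.attrs.items():
            # Normalize single values and multi-valued lists to one shape
            values = attr_value if isinstance(attr_value, list) else [attr_value]
            if value in values:
                found.setdefault(name, []).append(tag)
    return found


html = '''<div class="cake">1</div>
<h2 id="cake">1</h2>
<sometag someattribute="cake">1</sometag>
<p>id="cake" appears here only as text</p>'''

soup = BeautifulSoup(html, "html.parser")
result = attrs_with_value(soup, "cake")
for name, tags in result.items():
    print(name, [t.name for t in tags])
```

The `<p>` element is not reported, since its `id="cake"` is text content rather than an attribute.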