Get the <a> tag that is inside an <li>

Time: 2017-01-19 15:24:04

Tags: python web-scraping beautifulsoup html-parsing

How can I get the href of every <a> tag that sits under the class "subforum" in the given code?

`<li class="subforum">
 <a href="Link1">Link1 Text</a>
 </li>
 <li class="subforum">
<a href="Link2">Link2 Text</a>
</li>
<li class="subforum">
<a href="Link3">Link3 Text</a>
</li>`

I have tried this code but obviously it didn't work.

`Bs = BeautifulSoup(requests.get(url).text,"lxml")
Class = Bs.findAll('li', {'class': 'subforum"'})
for Sub in Class:
print(Link.get('href'))`

2 answers:

Answer 0 (score: 9)

The href attribute belongs to the a tag, not the li tag. Use li.a to get the a tag.


import bs4

html = '''<li class="subforum">
 <a href="Link1">Link1 Text</a>
 </li>
 <li class="subforum">
<a href="Link2">Link2 Text</a>
</li>
<li class="subforum">
<a href="Link3">Link3 Text</a>
</li>'''

soup = bs4.BeautifulSoup(html, 'lxml')
for li in soup.find_all(class_="subforum"):
    print(li.a.get('href'))

Output:

Link1
Link2
Link3

Why use class_?

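Searching for tags that have a particular CSS class is very useful, but the name of the CSS attribute, class, is a reserved word in Python. Using class as a keyword argument gives you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class with the keyword argument class_.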
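As a rough illustration of the point above, the sketch below compares the class_ keyword with the equivalent attrs dictionary. The one-line HTML sample and the variable names are made up for the example, and it uses the built-in "html.parser" instead of "lxml" only so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical one-item sample, shortened from the question's HTML.
html = '<li class="subforum"><a href="Link1">Link1 Text</a></li>'
soup = BeautifulSoup(html, "html.parser")

# `class` is reserved in Python, so Beautiful Soup accepts `class_` instead.
by_keyword = soup.find_all("li", class_="subforum")

# The same search written with an attrs dictionary.
by_attrs = soup.find_all("li", attrs={"class": "subforum"})

hrefs = [li.a.get("href") for li in by_keyword]
print(hrefs)
```

Both calls should find the same li element, so either spelling works; class_ is just shorter.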

Answer 1 (score: 1)

You were almost there. You just need to find the a element inside each li you located:

Class = Bs.findAll('li', {'class': 'subforum'})  # note: no stray " inside the class name
for Sub in Class:
    print(Sub.find("a").get('href'))  # or Sub.a.get('href')

However, there is a simpler way: a CSS selector

for a in Bs.select("li.subforum a"):
    print(a.get('href'))

Here, li.subforum a matches all a elements under li elements that have the subforum class.
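To make the selector's behaviour concrete, here is a small self-contained sketch. The extra li with class "other" is an assumption added for the example, to show that only links under li.subforum are matched; "html.parser" is used so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Sample markup: two subforum items plus one hypothetical unrelated item.
html = """
<ul>
  <li class="subforum"><a href="Link1">Link1 Text</a></li>
  <li class="other"><a href="Skipped">Not a subforum</a></li>
  <li class="subforum"><a href="Link2">Link2 Text</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# "li.subforum a" selects every <a> that is a descendant of <li class="subforum">.
subforum_links = [a.get("href") for a in soup.select("li.subforum a")]
print(subforum_links)
```

The link inside the "other" item is left out, because the selector requires the subforum class on the enclosing li.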

As a side note, in BeautifulSoup 4, findAll() has been renamed to find_all(). Also, you should follow the Python general variable naming guidelines.
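One possible way to apply both remarks is sketched below: the loop from the question rewritten with find_all() and lowercase snake_case names. The helper name subforum_hrefs and the sample string are hypothetical, and the check for a missing link is an extra precaution that the original code does not have.

```python
from bs4 import BeautifulSoup


def subforum_hrefs(html):
    """Return the href of the first <a> inside each <li class="subforum">."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for subforum in soup.find_all("li", class_="subforum"):
        link = subforum.find("a")
        # Skip items with no <a> tag, or with an <a> that has no href.
        if link is not None and link.get("href") is not None:
            hrefs.append(link["href"])
    return hrefs


# Hypothetical sample: one item with a link, one without.
sample = (
    '<li class="subforum"><a href="Link1">Link1 Text</a></li>'
    '<li class="subforum">no link here</li>'
)
result = subforum_hrefs(sample)
print(result)
```

With a live page, the string passed in would be requests.get(url).text, as in the question.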