Beautiful Soup (Python): tags inside tags

Date: 2017-12-22 10:14:05

Tags: python beautifulsoup

A small problem with BeautifulSoup:

from bs4 import BeautifulSoup
import requests

link = "http://www.cnnvd.org.cn/web/vulnerability/querylist.tag"

req = requests.get(link)
web = req.text
soup = BeautifulSoup(web, "lxml")

cve_name = []
cve_link = []

# Walk every div.fl on the page, then each <p> inside it, then each <a> inside that.
for par_ in soup.find_all('div', attrs={'class': 'fl'}):
    for link_ in par_.find_all('p'):
        for text_ in link_.find_all('a'):
            print(text_.string)
            print(text_['href'])
            print("==========")
            #cve_name.append(text_.string)
            #cve_link.append(text_['href'])

It gives me every record twice :V This is probably easy to fix :V

2 answers:

Answer 0 (score: 1)

The same elements appear in two places on the page, so you have to use find() / find_all() to select only one of those places first, for example:

find(class_='list_list')

Full code (this replaces the outer for loop in the question):

for par_ in soup.find(class_='list_list').find_all('div', attrs={'class':'fl'}):
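
To see why scoping the search helps, here is a small offline sketch that does not hit the real site. The HTML is invented for illustration: only the class names `list_list` and `fl` come from this thread, while the `sidebar` container, the exact nesting, and the `collect` helper are assumptions. It also uses the built-in `html.parser` instead of `lxml` so it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same link shows up in two containers,
# which is what the answer above says happens on the real page.
HTML = """
<div class="sidebar">
  <div class="fl"><p><a href="/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-811">CNNVD-201712-811</a></p></div>
</div>
<div class="list_list">
  <div class="fl"><p><a href="/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-811">CNNVD-201712-811</a></p></div>
  <div class="fl"><p><a href="/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-810">CNNVD-201712-810</a></p></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def collect(root):
    """Hypothetical helper: same nested loops as the question, run under `root`."""
    names = []
    for par_ in root.find_all('div', attrs={'class': 'fl'}):
        for link_ in par_.find_all('p'):
            for text_ in link_.find_all('a'):
                names.append(text_.string)
    return names


# Searching the whole document picks up both containers.
unscoped = collect(soup)

# Searching only inside the list_list container picks up each entry once.
# find() returns None when nothing matches, so guard against that.
container = soup.find(class_='list_list')
scoped = collect(container) if container is not None else []

print(unscoped)
print(scoped)
```

With this made-up markup the unscoped search lists CNNVD-201712-811 twice, and the scoped search lists each entry once. On the real page the container may be named or nested differently, so check the page source before relying on `list_list`.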

Answer 1 (score: 0)

How about this? I used a CSS selector to do the same thing.

from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests

link = "http://www.cnnvd.org.cn/web/vulnerability/querylist.tag"
res = requests.get(link)
soup = BeautifulSoup(res.text, "lxml")

for item in soup.select('.fl p a'):
    print("Item: {}\nItem_link: {}".format(item.text,urljoin(link,item['href'])))

Partial output:

Item: CNNVD-201712-811
Item_link: http://www.cnnvd.org.cn/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-811
Item: CNNVD-201712-810
Item_link: http://www.cnnvd.org.cn/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-810
Item: CNNVD-201712-809
Item_link: http://www.cnnvd.org.cn/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-809
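
A selector like `.fl p a` matches every such link anywhere in the document, so if the entries really do appear in two places on the page, it could return duplicates as well. Below is a rough sketch of two ways to handle that with CSS selectors: narrowing the selector to one container, or dropping repeated pairs afterwards. The HTML is invented, the `sidebar` class is hypothetical, the root-relative `href` form is an assumption, and `html.parser` stands in for `lxml` to avoid an extra dependency.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = "http://www.cnnvd.org.cn/web/vulnerability/querylist.tag"

# Hypothetical markup with the same entry repeated in two containers.
HTML = """
<div class="sidebar">
  <div class="fl"><p><a href="/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-811">CNNVD-201712-811</a></p></div>
</div>
<div class="list_list">
  <div class="fl"><p><a href="/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-811">CNNVD-201712-811</a></p></div>
  <div class="fl"><p><a href="/web/xxk/ldxqById.tag?CNNVD=CNNVD-201712-810">CNNVD-201712-810</a></p></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Option 1: prefix the selector with the container class so only one
# copy of the list is matched.
scoped = [(a.text, urljoin(BASE, a['href']))
          for a in soup.select('.list_list .fl p a')]

# Option 2: match everything, then drop repeats while keeping first-seen
# order (dict keys preserve insertion order in Python 3.7+).
all_pairs = [(a.text, urljoin(BASE, a['href']))
             for a in soup.select('.fl p a')]
unique = list(dict.fromkeys(all_pairs))

for name, url in unique:
    print("Item: {}\nItem_link: {}".format(name, url))
```

Option 1 is the CSS-selector equivalent of the `find(class_='list_list')` approach in the other answer. Option 2 does not depend on knowing the container name, but it would also merge entries that are legitimately listed twice.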