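BeautifulSoup: how to get all div ID values from a CSS selector

Time: 2020-01-22 16:36:48

Tags: html python-3.x selenium beautifulsoup

I am trying to get all the match IDs from a website. First I download the table, and printing Tags shows everything I just downloaded, but when I try to get the div id value I get None. I mean the ID contained here: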

    <div class="event__match event__match--last event__match--oneLine" id="g_1_ARFva552" title="Click for match detail!">

Could someone help me: how can I get all of the match IDs?

Here is my code:

browser.get("https://www.flashscore.com/football/")
sleep(3)
source = browser.page_source # Get the entire page source from the browser
if browser is not None :browser.close() # No need for the browser so close it 
soup = BeautifulSoup(source,'html.parser')
try:
    Tags = soup.select('div.leagues--live') # get the elements using css selectors
    print(Tags)
    for tag in Tags: # loop through them 
        matchId = tag.find('div').get('id')
        print (matchId)


except Exception as e:
    print(e)

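Thanks in advance for your help.

1 Answer:

Answer 0: (score: 1)

If you are using Selenium together with bs4, use WebDriverWait and wait on visibility_of_element_located() instead of calling sleep().

Use the following CSS selector to return all div elements that have an id attribute.

Code: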

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

browser = webdriver.Chrome()
browser.get("https://www.flashscore.com/football/")
# Wait up to 20 seconds for the live-leagues container to become visible
WebDriverWait(browser, 20).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, "div.leagues--live"))
)
source = browser.page_source  # Get the entire page source from the browser
browser.close()  # The browser is no longer needed, so close it
soup = BeautifulSoup(source, 'html.parser')
try:
    # Select only the match divs that carry an id attribute
    tags = soup.select("div.leagues--live div[title='Click for match detail!'][id]")
    for tag in tags:  # Loop through them
        print(tag['id'])
except Exception as e:
    print(e)

Output:

g_1_tlPhaQm9
g_1_Cx3yi2ek
g_1_G6H5dOXR
g_1_dh16mtAI
g_1_8WUO5NPn
g_1_tlkj9gx4
g_1_fH8eMl74
g_1_l4weOAxh
g_1_2sC3KSyH
g_1_MVOy2KLk
g_1_K4aSodm5
g_1_MDNDnZxN
g_1_ptl2EDRi
g_1_v3aeymC2
g_1_t6GdSgqn
g_1_bsB1RDbh
g_1_xY95QXDb
g_1_Wf99PiT4
... and so on
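
To see how the selector behaves without launching a browser, here is a minimal offline sketch. The `SAMPLE_HTML` markup is a hypothetical, simplified stand-in for the real page (its structure is an assumption, not taken from the site), so treat this as an illustration of the selector logic only. It also shows one plausible reason the code in the question printed None: `find('div')` returns only the first descendant div, and that div may have no id.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE_HTML = """
<div class="leagues--live">
  <div class="event__header">Premier League</div>
  <div class="event__match event__match--oneLine" id="g_1_ARFva552"
       title="Click for match detail!">Match A</div>
  <div class="event__match event__match--last event__match--oneLine"
       id="g_1_tlPhaQm9" title="Click for match detail!">Match B</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.select_one("div.leagues--live")

# find('div') returns only the first descendant div. In this sample it is
# the header div, which has no id, so .get('id') gives None.
first_id = container.find("div").get("id")

# The [id] attribute selector keeps only the divs that actually carry an id.
match_ids = [
    tag["id"]
    for tag in soup.select(
        "div.leagues--live div[title='Click for match detail!'][id]"
    )
]

print(first_id)
print(match_ids)
```

With this sample markup, `first_id` is None while `match_ids` holds both match IDs, which is why selecting the match divs directly tends to be more reliable than taking the first child of the container.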