Python: BeautifulSoup, extracting all span classes from a div section

Time: 2017-12-20 18:20:20

Tags: python web-scraping beautifulsoup

from requests import get
from bs4 import BeautifulSoup

url = 'https://www.ceda.com.au/Events/Upcoming-events'

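response = get(url)
# Parse the response; html_soup is used below but was never defined
html_soup = BeautifulSoup(response.text, 'html.parser')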

events_container = html_soup.find_all('div', class_ = 'list-bx')

event1name = events_container[0]

print(event1name.a.text)

Eventdate = html_soup.find('div', class_ = ' col-md-4 col-sm-4 side-box well side-boxTop')

x = Eventdate.div.text
print(x)

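I am trying to print the second span inside the div with class " col-md-4 col-sm-4 side-box well side-boxTop". However, the spans do not have unique class names, so I cannot select the second one (the second p tag, which holds the event date).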

2 answers:

Answer 0 (score: 1)

Try this. It will give you the dates:

from requests import get
from bs4 import BeautifulSoup

res = get('https://www.ceda.com.au/Events/Upcoming-events')
soup = BeautifulSoup(res.text,"lxml")
# For each calendar icon, take the first <span> in its parent and collapse whitespace
item_date = '\n'.join(
    ' '.join(item.find_parent().select("span")[0].text.split())
    for item in soup.select(".side-list .icon-calendar")
)
print(item_date)

Partial output:

24/01/2018
30/01/2018
31/01/2018
31/01/2018

Answer 1 (score: 1)

from requests import get
from bs4 import BeautifulSoup
url = 'https://www.ceda.com.au/Events/Upcoming-events'
response = get(url)
html_soup=BeautifulSoup(response.content,"lxml")

events_container = html_soup.find_all('div', class_ = 'list-bx')

event1name = events_container[0]

print(event1name.a.text)

Eventdate = html_soup.find('div', class_ = ' col-md-4 col-sm-4 side-box well side-boxTop')
date=Eventdate.find_all("p")[1].text
print(date)

You can also call find_all on a parent element, which lets you navigate from that parent to any node you want.
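As a rough illustration of calling find_all on a parent, here is a minimal sketch that runs against a small inline HTML snippet instead of the live site. The markup in SAMPLE_HTML is hypothetical and only loosely modeled on the question; the real page structure may differ. It also matches on the single class side-boxTop rather than the full class string, which is an assumption about what identifies the box.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<div class="col-md-4 col-sm-4 side-box well side-boxTop">
  <p><span class="icon-user"></span><span>Sydney</span></p>
  <p><span class="icon-calendar"></span><span> 24/01/2018 </span></p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matching on one class avoids depending on class order or stray whitespace.
side_box = soup.find("div", class_="side-boxTop")

# find_all on the parent returns its descendant <p> tags in document order.
paragraphs = side_box.find_all("p")

# The second <p> holds the date in this sample; strip=True trims surrounding whitespace.
event_date = paragraphs[1].get_text(strip=True)
print(event_date)
```

Indexing the result of find_all works when the order of the p tags is stable. If the order can vary, selecting by a nearby marker such as the calendar icon, as in the other answer, is probably more robust.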

Now you only need to clean up the date with some text manipulation.
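One possible way to do that cleanup is sketched below. The raw_date value is a made-up example of scraped text with extra whitespace, and clean_date is a hypothetical helper name. The day/month/year format is inferred from the sample output shown earlier (24/01/2018).

```python
from datetime import datetime


def clean_date(raw: str) -> str:
    """Collapse all runs of whitespace and trim the ends (hypothetical helper)."""
    return " ".join(raw.split())


# Made-up example of what .text might return for the date paragraph.
raw_date = "\n     24/01/2018   \n"

cleaned = clean_date(raw_date)

# Assumes a DD/MM/YYYY format, as suggested by the sample output above.
parsed = datetime.strptime(cleaned, "%d/%m/%Y").date()

print(cleaned)
print(parsed.isoformat())
```

Parsing into a date object is optional, but it makes sorting or filtering events by date easier than working with the raw string.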