This is the layout of the webpage:
<h2>Featured Ads</h2>
<a href=""></a>
<h2>Ads</h2>
<a href=""></a>
There is nothing in the class of the regular Ads that I can use to differentiate them. What would be an efficient way to return only the <a href> elements that appear after <h2>Ads</h2>?
Update:
Here's the final code:
h2 = soup.find("h2", text="Ads")
articles = h2.find_next_siblings("article")
for article in articles:
    for div in article.find_all('div', {'class': 'address'}):
        for link in div.find_all('a', href=True):
            print(link['href'])
Update 2: had to refactor...
articles = soup.find("h2", text="Ads").find_next_siblings("article")
for article in articles:
    ad_url = article.find('a', href=True)['href']
Answer 0 (score: 2)
Find the h2 element and find the next a sibling:
h2 = soup.find("h2", text="Ads")
a = h2.find_next_sibling("a")
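Note that find_next_sibling returns only the first matching sibling. If several links follow the heading, find_next_siblings (plural) collects all of them. The following is a minimal, self-contained sketch of that variant. The sample markup and the href values are made up for illustration, and it assumes Beautiful Soup 4.4.0 or later, where string= is the newer name for the text= argument used above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page mirroring the layout in the question;
# the href values are invented for this example.
html = """
<h2>Featured Ads</h2>
<a href="/featured/1"></a>
<h2>Ads</h2>
<a href="/ads/1"></a>
<a href="/ads/2"></a>
"""

soup = BeautifulSoup(html, "html.parser")

# string="Ads" is an exact match, so "Featured Ads" is not selected.
h2 = soup.find("h2", string="Ads")

# Collect every following <a> sibling that has an href attribute.
links = [a["href"] for a in h2.find_next_siblings("a", href=True)]
print(links)
```

Only siblings that come after the matched heading are searched, so the link under "Featured Ads" is skipped.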