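BeautifulSoup only returns the first result

Time: 2016-06-19 10:53:48

Tags: python-3.x beautifulsoup

I am retrieving data from a website and writing it to a TSV file. However, my code returns only the first set of values, repeated, instead of the whole set. Please help.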

import csv
import urllib.request

from bs4 import BeautifulSoup

BASE_URL = "http://www.parliament.go.ke/index.php/the-national-assembly/house-business/hansard"

#Read base_url into Beautiful soup Object
html = urllib.request.urlopen(BASE_URL).read()
soup = BeautifulSoup(html, "html.parser")

#grab <div class="itemList"> that hold links and dates to all hansard pdfs
hansards = soup.find_all("div","itemList")


#Get all hansards 
#write to a tsv file
with open("hansards.tsv","wt") as f:
    fieldnames = ("date","hansard_url")
    output = csv.writer(f, delimiter="\t")



    for div in hansards:
        hansard_link = [BASE_URL + div.a["href"]]
        hansard_date = soup.find("h3", "catItemTitle").string

        output.writerow([hansard_date,hansard_link])
        print(hansard_date)
        print(hansard_link)

print("Done Writing File")

1 answer:

Answer 0 (score: 0)

The wrong div was being selected. It should be:

#grab every <div class="itemContainer">; each one holds the link and date for a single hansard pdf
hansards = soup.find_all("div","itemContainer")

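The for loop should call find on each div rather than on soup, because soup.find always returns the first match in the whole document: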

for div in hansards:
    hansard_link = [BASE_URL + div.a["href"]]
    hansard_date = div.find("h3", "catItemTitle").string
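
The corrected loop can be tried offline with a small self-contained sketch. Everything below is illustrative rather than taken from the real page: SAMPLE_HTML is made-up markup that only mimics the itemList / itemContainer / catItemTitle structure named above, SITE_URL and extract_hansards are hypothetical names, and the hrefs are assumed to be site-relative paths (which is why urljoin is used instead of string concatenation). The rows are written to an in-memory buffer instead of hansards.tsv.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up markup that mimics the structure discussed above:
# one "itemList" wrapper holding several "itemContainer" blocks.
SAMPLE_HTML = """
<div class="itemList">
  <div class="itemContainer">
    <h3 class="catItemTitle">Tuesday, 14th June 2016</h3>
    <a href="/docs/hansard-14-06-2016.pdf">Download</a>
  </div>
  <div class="itemContainer">
    <h3 class="catItemTitle">Wednesday, 15th June 2016</h3>
    <a href="/docs/hansard-15-06-2016.pdf">Download</a>
  </div>
</div>
"""

# Hypothetical base URL, used only to resolve the relative hrefs.
SITE_URL = "http://example.org/hansard/"


def extract_hansards(html, base_url):
    """Return one (date, url) tuple per itemContainer div."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for div in soup.find_all("div", "itemContainer"):
        # div.find searches only inside this container, so each
        # iteration reads its own title, not the first one on the page.
        date = div.find("h3", "catItemTitle").get_text(strip=True)
        # Assumption: hrefs are relative, so resolve them against the base.
        url = urljoin(base_url, div.a["href"])
        rows.append((date, url))
    return rows


rows = extract_hansards(SAMPLE_HTML, SITE_URL)

# In-memory stand-in for the output file.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t")
writer.writerow(("date", "hansard_url"))
writer.writerows(rows)
print(buffer.getvalue())
```

Selecting the outer itemList div yields a single element, so the loop body runs once; selecting itemContainer yields one element per hansard. Passing plain strings to writerow, rather than a one-element list, also keeps brackets out of the TSV.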

Thanks!
