Need help looping through URLs with Beautiful Soup

Time: 2018-07-13 04:03:14

Tags: python-3.x loops web-scraping beautifulsoup

I am trying to scrape the names of all of the companies listed on this site. Each page (there are 14 in total) shows the names of 80 companies. Every URL ends with start=241&count=80&first=2009&last=2018, where start is the first row shown on that page. I am trying to step through the pages 80 companies at a time and scrape the company names from each one, but every time I try, I get the following error on the second pass through the loop:

File "beautiful_soup_2.py", line 10, in <module>
name_table = (soup.findAll('table')[4])
File "C:\Users\adamm\Downloads\Python\lib\site-packages\bs4\element.py", line 1807, in __getattr__
"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'findAll'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

However, if I take the loop out and enter the URLs by hand with start=81, 161, 241 and so on, the result is the list of companies on that page, as expected.
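
From what I can tell, that AttributeError is what Beautiful Soup raises whenever find_all()/findAll() is called on a ResultSet (the list-like object that find_all() itself returns) rather than on a single Tag. A tiny stand-alone illustration with made-up HTML behaves the same way:

from bs4 import BeautifulSoup

# Two throwaway tables, just to show Tag vs. ResultSet behaviour.
html = '<table><tr><td>a</td></tr></table><table><tr><td>b</td></tr></table>'
doc = BeautifulSoup(html, 'lxml')

tables = doc.findAll('table')     # ResultSet: a list of Tag objects
rows = tables[0].findAll('tr')    # fine: index into the ResultSet first
# tables.findAll('tr')            # AttributeError, same message as above
print(rows)

What I don't understand is why my loop only hits this on the second pass.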

My code so far:

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

for x in range(1,1042,80):
    sauce = ('https://www.sec.gov/cgi-bin/srch-edgar?text=form-type%20%3D%2010-12b%20OR%20form-type%3D10-12b%2Fa&start={}&count=80&first=2009&last=2018'.format(x))

    source_link = urlopen(sauce).read()
    soup = soup(source_link, 'lxml')

    name_table = (soup.findAll('table')[4])
    table_rows = name_table.findAll('tr')

    for row in table_rows:
        cols = row.findAll('td')
        cols = [x.text.strip() for x in cols]
        print(cols)

This is driving me crazy, so any help would be greatly appreciated.
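
Update: while re-reading the code, one thing I suspect is the line soup = soup(source_link, 'lxml'). It rebinds the name soup, so after the first pass soup is no longer the BeautifulSoup class but the parsed page, and calling a parsed page is shorthand for find_all(), which returns a ResultSet, which would explain the error above. Here is a sketch of the loop with that name collision removed (BeautifulSoup is kept under its own name, and page_soup/url are just names I picked; everything else is unchanged from my code):

from urllib.request import urlopen
from bs4 import BeautifulSoup   # keep the class under its own name

base_url = 'https://www.sec.gov/cgi-bin/srch-edgar?text=form-type%20%3D%2010-12b%20OR%20form-type%3D10-12b%2Fa&start={}&count=80&first=2009&last=2018'

for start in range(1, 1042, 80):
    url = base_url.format(start)                             # one page of 80 results
    page_soup = BeautifulSoup(urlopen(url).read(), 'lxml')   # fresh parse each pass

    name_table = page_soup.findAll('table')[4]               # same table index as above
    for row in name_table.findAll('tr'):
        cols = [td.text.strip() for td in row.findAll('td')]
        print(cols)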

0 Answers:

There are no answers yet.