Need help looping through URLs with Beautiful Soup

Time: 2018-07-13 04:03:14

Tags: python-3.x loops web-scraping beautifulsoup

I am trying to scrape the names of all of the companies listed on this site. Each page (there are 14 in total) shows the names of 80 companies. Every URL ends with start=241&count=80&first=2009&last=2018, where start is the first row shown on that page. I am trying to step through the pages 80 companies at a time and scrape the company names from each one, but every time I try, I get the following error on the second pass through the loop:

File "beautiful_soup_2.py", line 10, in <module>
name_table = (soup.findAll('table')[4])
File "C:\Users\adamm\Downloads\Python\lib\site-packages\bs4\element.py", line 1807, in __getattr__
"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'findAll'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

However, if I take the loop out and enter the URLs by hand with start=81, 161, 241 and so on, the result is the list of companies on that page, as expected.
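
From what I can tell, that AttributeError is what Beautiful Soup raises whenever find_all()/findAll() is called on a ResultSet (the list-like object that find_all() itself returns) rather than on a single Tag. A tiny stand-alone illustration with made-up HTML behaves the same way:

from bs4 import BeautifulSoup

# Two throwaway tables, just to show Tag vs. ResultSet behaviour.
html = '<table><tr><td>a</td></tr></table><table><tr><td>b</td></tr></table>'
doc = BeautifulSoup(html, 'lxml')

tables = doc.findAll('table')     # ResultSet: a list of Tag objects
rows = tables[0].findAll('tr')    # fine: index into the ResultSet first
# tables.findAll('tr')            # AttributeError, same message as above
print(rows)

What I don't understand is why my loop only hits this on the second pass.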

My code so far:

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

for x in range(1,1042,80):
    sauce = ('https://www.sec.gov/cgi-bin/srch-edgar?text=form-type%20%3D%2010-12b%20OR%20form-type%3D10-12b%2Fa&start={}&count=80&first=2009&last=2018'.format(x))

    source_link = urlopen(sauce).read()
    soup = soup(source_link, 'lxml')

    name_table = (soup.findAll('table')[4])
    table_rows = name_table.findAll('tr')

    for row in table_rows:
        cols = row.findAll('td')
        cols = [x.text.strip() for x in cols]
        print(cols)

This is driving me crazy, so any help would be greatly appreciated.
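
Update: while re-reading the code, one thing I suspect is the line soup = soup(source_link, 'lxml'). It rebinds the name soup, so after the first pass soup is no longer the BeautifulSoup class but the parsed page, and calling a parsed page is shorthand for find_all(), which returns a ResultSet, which would explain the error above. Here is a sketch of the loop with that name collision removed (BeautifulSoup is kept under its own name, and page_soup/url are just names I picked; everything else is unchanged from my code):

from urllib.request import urlopen
from bs4 import BeautifulSoup   # keep the class under its own name

base_url = 'https://www.sec.gov/cgi-bin/srch-edgar?text=form-type%20%3D%2010-12b%20OR%20form-type%3D10-12b%2Fa&start={}&count=80&first=2009&last=2018'

for start in range(1, 1042, 80):
    url = base_url.format(start)                             # one page of 80 results
    page_soup = BeautifulSoup(urlopen(url).read(), 'lxml')   # fresh parse each pass

    name_table = page_soup.findAll('table')[4]               # same table index as above
    for row in name_table.findAll('tr'):
        cols = [td.text.strip() for td in row.findAll('td')]
        print(cols)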

0 Answers:

There are no answers yet.