Question

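Although I don't seem to be the first person to run into this problem, I haven't been able to find an answer.

I'm scraping an HTML table, but although I try to loop over it, I only get the first row of the table.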

import requests
from bs4 import BeautifulSoup



# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r=requests.get(html)
c=r.content
soup=BeautifulSoup(c,"html.parser")
# Grab every <tbody> element and store the list in wegoList

wegoList = soup.find_all("tbody")

try:
    for items in wegoList:
        material = items.find("td", {"class": "click_whole_cell",}).get_text().strip()

        cas = items.find("td", {"class": "text-center",}).get_text().strip()

        category = items.find("div", {"class": "text-content short-text",}).get_text().strip()

    print(material,cas,category)
except:
    pass

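The result for the first row is correct (1,2-Dimethylimidazole 1739-84-0 Organic Intermediates, Plastics, Resins and Rubber, Coatings), but the for loop does not iterate through the rest of the table.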

Thanks for your help.

Answer 1

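for items in wegoList: iterates over the list of tbody elements, and inside it you then try to extract the fields from the whole table at once. Instead, you should iterate over each tr row: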

wegoList = soup.find_all("tbody")

for tbody in wegoList:
    trs = tbody.find_all("tr")  # list of rows in this table body

    for tr in trs:
        material = tr.find("td", {"class": "click_whole_cell"})
        cas = tr.find("td", {"class": "text-center"})
        category = tr.find("div", {"class": "text-content short-text"})
        if material is None or cas is None or category is None:
            continue  # skip rows without the expected cells (e.g. header rows)

        # print inside the inner loop so every row is printed, not just the last one
        print(material.get_text().strip(), cas.get_text().strip(), category.get_text().strip())
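To see why the original loop printed only one row, here is a minimal, self-contained sketch that runs on a small inline table instead of the live page. The markup is hypothetical: it only mimics the class names used in the question, and the real wegochem.com page may be structured differently. find_all("tbody") returns a single element, so the outer loop runs only once, and find() returns only the first matching cell; looping over the tr rows collects every row.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names from the question;
# the real page's structure may differ.
HTML = """
<table>
  <tbody>
    <tr>
      <td class="click_whole_cell">Chemical A</td>
      <td class="text-center">111-11-1</td>
      <td><div class="text-content short-text">Category A</div></td>
    </tr>
    <tr>
      <td class="click_whole_cell">Chemical B</td>
      <td class="text-center">222-22-2</td>
      <td><div class="text-content short-text">Category B</div></td>
    </tr>
  </tbody>
</table>
"""


def scrape_rows(html):
    """Return one (material, cas, category) tuple per table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tbody in soup.find_all("tbody"):
        for tr in tbody.find_all("tr"):
            material = tr.find("td", class_="click_whole_cell")
            cas = tr.find("td", class_="text-center")
            # A class string containing a space matches the exact attribute value
            category = tr.find("div", class_="text-content short-text")
            # Skip rows that lack any of the expected cells (e.g. header rows)
            if material is None or cas is None or category is None:
                continue
            rows.append((material.get_text(strip=True),
                         cas.get_text(strip=True),
                         category.get_text(strip=True)))
    return rows


soup = BeautifulSoup(HTML, "html.parser")
print(len(soup.find_all("tbody")))  # prints 1: the outer loop runs only once
print(soup.find("td", class_="click_whole_cell").get_text(strip=True))  # find() stops at the first match
for row in scrape_rows(HTML):
    print(*row)  # one line per table row
```

Against the real page, the same scrape_rows function could be called with r.content from the question's requests.get call.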

Answer 2

Try this code:

import requests
from bs4 import BeautifulSoup

# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r = requests.get(html)
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Grab every <tbody> element and store the list in wegoList
wegoList = soup.find_all("tbody")

for items in wegoList:
    # find_all returns every matching cell, not just the first one like find
    material = items.find_all("td", {"class": "click_whole_cell"})
    for i in material:
        print(i.get_text().strip())

    cas = items.find_all("td", {"class": "text-center"})
    for i in cas:
        print(i.get_text().strip())

    category = items.find_all("div", {"class": "text-content short-text"})
    for i in category:
        print(i.get_text().strip())

Updated code, which pairs the three columns row by row with zip:

import requests
from bs4 import BeautifulSoup

# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r = requests.get(html)
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Grab every <tbody> element and store the list in wegoList
wegoList = soup.find_all("tbody")

for items in wegoList:
    material = items.find_all("td", {"class": "click_whole_cell"})
    cas = items.find_all("td", {"class": "text-center"})
    category = items.find_all("div", {"class": "text-content short-text"})
    # zip pairs the n-th material, CAS number and category, one row at a time
    for m, cs, cat in zip(material, cas, category):
        print(m.get_text().strip(), cs.get_text().strip(), cat.get_text().strip())
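One caveat with the zip approach: zip stops at the shortest list and pairs cells purely by position, so if a row is missing one of the cells (or contains more than one text-center cell), the columns silently drift out of alignment. The sketch below uses a small hypothetical table, not the real page, in which the second row has no category div. It compares the column-wise zip with a row-wise lookup that keeps each value inside its own tr; the helper names zip_columns, per_row and text_or_none are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second row is missing its category <div>.
HTML = """
<table><tbody>
  <tr><td class="click_whole_cell">Chemical A</td><td class="text-center">111-11-1</td>
      <td><div class="text-content short-text">Category A</div></td></tr>
  <tr><td class="click_whole_cell">Chemical B</td><td class="text-center">222-22-2</td>
      <td></td></tr>
  <tr><td class="click_whole_cell">Chemical C</td><td class="text-center">333-33-3</td>
      <td><div class="text-content short-text">Category C</div></td></tr>
</tbody></table>
"""


def text_or_none(tag):
    """Stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def zip_columns(tbody):
    """Column-wise approach: collect each column separately, then zip them."""
    material = tbody.find_all("td", class_="click_whole_cell")
    cas = tbody.find_all("td", class_="text-center")
    category = tbody.find_all("div", class_="text-content short-text")
    return [(m.get_text(strip=True), c.get_text(strip=True), g.get_text(strip=True))
            for m, c, g in zip(material, cas, category)]


def per_row(tbody):
    """Row-wise approach: look up each cell inside its own <tr>."""
    return [(text_or_none(tr.find("td", class_="click_whole_cell")),
             text_or_none(tr.find("td", class_="text-center")),
             text_or_none(tr.find("div", class_="text-content short-text")))
            for tr in tbody.find_all("tr")]


tbody = BeautifulSoup(HTML, "html.parser").find("tbody")
for row in zip_columns(tbody):
    print(row)  # Chemical B gets paired with Category C, and Chemical C is dropped
for row in per_row(tbody):
    print(row)  # Chemical B keeps None as its category
```

On Python 3.10 and later, zip(..., strict=True) raises a ValueError when the lists have different lengths, which at least makes such a mismatch visible instead of silently misaligning rows.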

BeautifulSoup only returns the first item
