Beautiful Soup 4, findAll

Time: 2018-12-06 12:05:43

Tags: web-scraping beautifulsoup

This is my code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url='https://www.chembid.com/results/?q=124-07-2&sort=price'

# opening up connection, grabbing the page
uClient=uReq(my_url)
page_html=uClient.read()
uClient.close()

#html parser
page_soup=soup(page_html,"html.parser")


for Container in Containers:
        name=Container.div.div.span

        title_container=Container.findAll("a",{"class":"supplier"})
        supplier=title_container[0].text

What I want to do now is use bs4's findAll:

>>> cas_no=Container.findAll("span",{"class":"regular-small-regular-small-font block"})

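From this code I get text like this:

    Factory supply high quality 99% min caprylic acid/octanoic acid CAS 124-07-2, for manufacturing dyes, drugs, fragrances
    Verifizierter Anbieter ->
    ->
    Shandong Baowei Energy Technology Co., Ltd.
    China
    CAS No.: 124-07-2
    Quality/Grade: Agriculture Grade, Electronic Grade, Food Grade, Industrial Grade, Medicine Grade, Reagent Grade
    www.alibaba.com
    $ 0.25-3.68
    per Kilogram, FOB
    Show offer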

What I am looking for is the name, supplier, CAS no., quality and price.
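If all you have is the flattened text of a container, like the dump above, one option is to split it into fields and pick out the labelled ones. This is a minimal sketch, not the answer's method: it assumes fields are separated by newlines or runs of two or more whitespace characters, and that the labels read "CAS No." and "Quality/Grade". The helper name `parse_listing` and the sample string are hypothetical, modeled on the text shown above, so both will likely need adjusting for the real page.

```python
import re

def parse_listing(text):
    """Pull CAS number, quality/grade and price out of flattened listing text.

    Hypothetical helper: assumes fields are separated by newlines or runs of
    2+ whitespace characters and that the labels read "CAS No." and
    "Quality/Grade"; adjust both to whatever the real page returns.
    """
    # Split on newlines or runs of layout whitespace; single spaces stay inside fields.
    fields = [f.strip() for f in re.split(r"\n|\s{2,}", text) if f.strip()]

    def value_after(label):
        # Text after the first ':' of the first field that starts with label.
        for field in fields:
            if field.startswith(label) and ":" in field:
                return field.split(":", 1)[1].strip()
        return None

    # Assumes the price is the first field starting with a dollar sign.
    price = next((f for f in fields if f.startswith("$")), None)
    return {"cas_no": value_after("CAS No."),
            "quality": value_after("Quality/Grade"),
            "price": price}

# Made-up sample in the same shape as the text above.
sample = (
    "   Factory supply 99% min octanoic acid CAS 124-07-2      "
    "Shandong Baowei Energy Technology Co., Ltd.    China    "
    "CAS No.: 124-07-2    Quality/Grade: Food Grade, Industrial Grade    "
    "www.alibaba.com    $ 0.25-3.68    per Kilogram, FOB    "
)
print(parse_listing(sample))
# {'cas_no': '124-07-2', 'quality': 'Food Grade, Industrial Grade', 'price': '$ 0.25-3.68'}
```

Matching on `startswith` rather than `in` keeps the product title, which also mentions "CAS 124-07-2", from being mistaken for the CAS field.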

Thanks

1 answer:

Answer 0 (score: 0)

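So the first thing I see is that you try to iterate over the Containers object, but you never assigned anything to it. You need to store the results of the search first, and then iterate over them.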
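A minimal sketch of that fix, run on a small made-up HTML snippet instead of the live page so it works offline. Only the class names result-horizontal-wrapper and supplier are taken from the answer's code; the markup and supplier names are hypothetical, and the 'n/a' fallback mirrors what the answer does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the results page.
html = """
<div class="result-horizontal-wrapper"><a class="supplier">Supplier A</a></div>
<div class="result-horizontal-wrapper"><a class="supplier">Supplier B</a></div>
<div class="result-horizontal-wrapper"></div>
"""
page_soup = BeautifulSoup(html, "html.parser")

# Store the result of find_all first; the loop can only iterate over a name
# that has actually been assigned.
containers = page_soup.find_all("div", {"class": "result-horizontal-wrapper"})

suppliers = []
for container in containers:
    link = container.find("a", {"class": "supplier"})
    # find() returns None when the element is missing, so guard before .text
    suppliers.append(link.text if link else "n/a")

print(suppliers)  # ['Supplier A', 'Supplier B', 'n/a']
```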

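Hopefully someone will post a more robust solution, but judging by the output and what you want to get out of it, this pulls the fields from that particular page. Some of the fields are missing for some listings, so I had to account for those and fill them with 'n/a' or None when they are not there. Still, this should help you: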

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import pandas as pd

results = pd.DataFrame()

my_url='https://www.chembid.com/results/?q=124-07-2&sort=price'

# opening up connection, grabbing the page
uClient=uReq(my_url)
page_html=uClient.read()
uClient.close()

#html parser
page_soup=soup(page_html,"html.parser")

containers = page_soup.find_all('div', {'class':"result-horizontal-wrapper"})


for container in containers:

    name = container.div.div.span.text
    if container.find('a' , {'class':'supplier'}):
        supplier = container.find('a' , {'class':'supplier'}).text
    else:
        supplier = 'n/a'

    # spans holding the "CAS ..." and "Quality/Grade ..." lines
    span_cas_quality = container.find_all('span', {'class':'regular-small-font block'})

    cas_no = [x.text for x in span_cas_quality if 'CAS' in x.text]
    quality = [x.text for x in span_cas_quality if 'Quality/Grade' in x.text]

    if cas_no != []:
        cas_no = cas_no[0]
    else:
        cas_no = None

    if quality != []:
        quality = quality[0]
    else:
        quality = None

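    # select_one returns None instead of raising IndexError when a listing has no price
    price_tag = container.select_one('span.black-bold-font-big')
    span_price = price_tag.text if price_tag else None
    rate_tag = container.select_one('span.block.regular-small-font.price')
    span_rate = rate_tag.text if rate_tag else None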

    temp_df = pd.DataFrame([[name, supplier, cas_no, quality, span_price, span_rate]], columns = ['name','supplier','cas_no','quality','price','rate'])

    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    results = pd.concat([results, temp_df], ignore_index=True)
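On the class filters: the question's findAll passes one string holding several class names, while the answer switches to CSS selectors via select() for the price spans. The sketch below shows how Beautiful Soup treats the two, assuming the bs4 4.x behaviour described in its documentation; the HTML snippet is made up, and only the class names come from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class names taken from the answer's selectors.
html = '<span class="block regular-small-font price">per Kilogram, FOB</span>'
soup = BeautifulSoup(html, "html.parser")

# A string containing spaces is compared with the whole class attribute value,
# so the same classes in a different order find nothing.
exact = soup.find_all("span", {"class": "block regular-small-font price"})
reordered = soup.find_all("span", {"class": "regular-small-font block price"})

# A single class name matches any tag that carries that class.
single = soup.find_all("span", {"class": "price"})

# A CSS selector requires all listed classes, in any order.
css = soup.select("span.regular-small-font.block.price")

print(len(exact), len(reordered), len(single), len(css))  # 1 0 1 1
```

This is one likely reason a multi-class string can come back empty: unless it equals the tag's class attribute exactly, it matches nothing, whereas a CSS selector does not care about order or extra classes.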