Question

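I'm trying to run a loop in a web-scraping script that uses Beautiful Soup to extract data from this page. The loop goes through each div tag and extracts four pieces of information: it searches for one h3, one div, and two span tags. But when I add ".text", I get an error for 'date', 'soldprice', and 'shippingprice'. The error says: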

AttributeError: 'NoneType' object has no attribute 'text'

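I can get the text value for 'title', but not for any of the others, whether I put ".text" at the end of the line or inside the print call. Without ".text" the whole script runs and pulls the right information, but I don't want the HTML tags.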

results = soup.find_all("div", {"class": "s-item__info clearfix"}) #to separate the section of text for each item on the page
for item in results:
    product = {
        'title': item.find("h3", attrs={"class": "s-item__title s-item__title--has-tags"}).text,
        'date': item.find("div", attrs={"class": "s-item__title--tag"}), #.find("span", attrs={"class": "POSITIVE"}),
        'soldprice': item.find("span", attrs={"class": "s-item__price"}),
        'shippingprice': item.find("span", attrs={"class": "s-item__shipping s-item__logisticsCost"}),
    }
    print(product)

Answer 1

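The problem is that, ahead of the offers, the page contains other divs with class="s-item__info clearfix" that have no date, soldprice, or shippingprice. For those divs find() returns None, and calling .text on None raises the AttributeError.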

You have to add a find() first so that you search only inside the offers:

results = soup.find('div', class_='srp-river-results clearfix').find_all("div", {"class": "s-item__info clearfix"})
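
As a complement to narrowing the search, it can also help to guard each lookup so that a missing tag yields None instead of raising. The following is only a minimal sketch: SAMPLE_HTML is made-up markup that imitates the class names from the question (it is not the real page), and text_or_none and extract_products are hypothetical helper names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that mimics the class names in the question.
# The first div sits outside the results container and has no date or prices.
SAMPLE_HTML = """
<div class="s-item__info clearfix">
  <h3 class="s-item__title s-item__title--has-tags">Placeholder</h3>
</div>
<div class="srp-river-results clearfix">
  <div class="s-item__info clearfix">
    <h3 class="s-item__title s-item__title--has-tags">Item A</h3>
    <div class="s-item__title--tag">Sold Jan 1</div>
    <span class="s-item__price">$10.00</span>
    <span class="s-item__shipping s-item__logisticsCost">+$3.00 shipping</span>
  </div>
  <div class="s-item__info clearfix">
    <h3 class="s-item__title s-item__title--has-tags">Item B</h3>
    <div class="s-item__title--tag">Sold Jan 2</div>
    <span class="s-item__price">$20.00</span>
  </div>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if find() matched nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_products(html):
    soup = BeautifulSoup(html, "html.parser")
    # Restrict the search to the results container, as in the answer above.
    container = soup.find("div", class_="srp-river-results clearfix")
    if container is None:
        return []
    products = []
    for item in container.find_all("div", {"class": "s-item__info clearfix"}):
        products.append({
            "title": text_or_none(item.find("h3", attrs={"class": "s-item__title s-item__title--has-tags"})),
            "date": text_or_none(item.find("div", attrs={"class": "s-item__title--tag"})),
            "soldprice": text_or_none(item.find("span", attrs={"class": "s-item__price"})),
            "shippingprice": text_or_none(item.find("span", attrs={"class": "s-item__shipping s-item__logisticsCost"})),
        })
    return products


if __name__ == "__main__":
    for product in extract_products(SAMPLE_HTML):
        print(product)
```

With this guard, an item that lacks one of the tags (Item B has no shipping span in the sample) produces None for that field rather than stopping the loop.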
