Unable to scrape the main book's title and the titles of the books customers viewed from a webpage

Time: 2021-04-29 20:06:18

Tags: python python-3.x selenium selenium-webdriver web-scraping

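I've been trying to scrape the title of the book shown on the landing page, together with the titles of the books listed as customers' choice. To get the titles of all of those books, you have to keep clicking the right-arrow button, as shown in the image above.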

I've tried:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

links = [
    "https://www.amazon.com/Keto-Meal-Prep-Cookbook-Beginners/dp/1673455980/",
    "https://www.amazon.com/Keto-Diet-Cookbook-Beginners-Recipes/dp/1792145454/"
]

def fetch_content(link):
    driver.get(link)
    title = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,'h1#title > span#productTitle'))).text
    page_count = wait.until(EC.presence_of_element_located((By.XPATH,'//*[contains(@class,"a-carousel-header-row")][.//h2[contains(@class,"a-carousel-heading")][contains(.,"Customers who")]]//span[@class="a-carousel-page-max"]'))).text

    title_list = []
    for i in range(int(page_count)+1):
        wait.until(EC.presence_of_element_located((By.XPATH,'//*[contains(@class,"a-carousel-header-row")][.//h2[contains(@class,"a-carousel-heading")][contains(.,"Customers who")]]/following-sibling::*[contains(@class,"a-carousel-row")]//a[contains(@class,"a-carousel-goto-nextpage")]'))).click()
        for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"li.a-carousel-card > a.a-link-normal > div[data-rows]"))):
            title_list.append(item.text)
    return title,title_list

if __name__ == '__main__':
    with webdriver.Chrome() as driver:
        wait = WebDriverWait(driver,15)
        for link in links:
            print(fetch_content(link))

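When I run the script above, I can see (if I manually scroll down a little while it is running) that it grabs the first two titles from the "Customers who viewed" container and then throws a stale element reference error pointing at title_list.append(item.text).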
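A stale element reference usually means the element was located and then detached from the DOM before it was used. The likely cause here is that the carousel re-renders its cards after the next-page click, though that is an assumption about the page and has not been verified against it. One way to sidestep the problem is to read the text inside the browser in a single driver.execute_script call, so Python only ever receives plain strings. The sketch below is a swapped-in technique rather than a fix to the original loop. The names READ_TEXTS_JS, CARD_SELECTOR and read_card_titles are hypothetical; the selector is the one from the script above.

```python
# JavaScript executed in the page. It returns plain strings, so Python never
# holds WebElement references that could go stale if the carousel re-renders.
READ_TEXTS_JS = """
return Array.from(document.querySelectorAll(arguments[0]))
    .map(el => el.textContent.trim())
    .filter(text => text.length > 0);
"""

# Card selector taken from the question's script (not re-verified here).
CARD_SELECTOR = "li.a-carousel-card > a.a-link-normal > div[data-rows]"


def read_card_titles(driver, selector=CARD_SELECTOR):
    """Return the text of every element matching `selector` as a list of str.

    `driver` is any object with Selenium's execute_script(script, *args).
    """
    texts = driver.execute_script(READ_TEXTS_JS, selector)
    # Guard against a None result so callers always get a list.
    return [text.strip() for text in (texts or [])]
```

Note that textContent also includes text hidden by CSS, whereas WebElement.text returns only the rendered text, so the two can differ slightly.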

How can I scrape the main book's title and the titles of the books customers viewed from the webpage?
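Separately from the stale element error, the loop in the script clicks the next-page arrow before it reads anything and runs int(page_count)+1 times, so cards are read while the carousel may still be updating and the arrow is clicked more often than there are pages. A minimal sketch of a different ordering (read the current page, then advance, and stop after the last page) might look like the following. collect_all_pages, read_page and go_next are hypothetical names: read_page stands for whatever returns the titles currently displayed, and go_next for whatever clicks the arrow and waits until the carousel has finished updating.

```python
def collect_all_pages(read_page, go_next, page_count):
    """Gather unique titles from a paged carousel, in first-seen order.

    read_page:  callable returning the titles of the page currently shown.
    go_next:    callable that advances to the next page and waits for it.
    page_count: total number of pages, e.g. int() of the page-max text.
    """
    titles = []
    for page in range(page_count):
        # Read the page that is currently displayed before navigating away.
        for title in read_page():
            # Skip blanks and anything already collected from an earlier page.
            if title and title not in titles:
                titles.append(title)
        # Advance only between pages, never after the last one.
        if page < page_count - 1:
            go_next()
    return titles
```

With Selenium, read_page and go_next would wrap the locating and clicking calls already used in the script; keeping them as parameters lets the loop logic be checked without a browser.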

0 answers:

No answers yet.