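I've been trying to scrape the title of the book on the landing page, along with the titles of the books in the customers' choice section, from a webpage. To get the titles of all the books, you have to click the right-arrow button, as shown in the image above.
I've tried with: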
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

links = [
    "https://www.amazon.com/Keto-Meal-Prep-Cookbook-Beginners/dp/1673455980/",
    "https://www.amazon.com/Keto-Diet-Cookbook-Beginners-Recipes/dp/1792145454/"
]

def fetch_content(link):
    driver.get(link)
    title = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,'h1#title > span#productTitle'))).text
    page_count = wait.until(EC.presence_of_element_located((By.XPATH,'//*[contains(@class,"a-carousel-header-row")][.//h2[contains(@class,"a-carousel-heading")][contains(.,"Customers who")]]//span[@class="a-carousel-page-max"]'))).text
    title_list = []
    for i in range(int(page_count)+1):
        wait.until(EC.presence_of_element_located((By.XPATH,'//*[contains(@class,"a-carousel-header-row")][.//h2[contains(@class,"a-carousel-heading")][contains(.,"Customers who")]]/following-sibling::*[contains(@class,"a-carousel-row")]//a[contains(@class,"a-carousel-goto-nextpage")]'))).click()
        for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"li.a-carousel-card > a.a-link-normal > div[data-rows]"))):
            title_list.append(item.text)
    return title,title_list

if __name__ == '__main__':
    with webdriver.Chrome() as driver:
        wait = WebDriverWait(driver,15)
        for link in links:
            print(fetch_content(link))
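When I run the script above, I can see (if I manually scroll down a little while it is running) that it grabs the first two titles from the Customers who viewed container and then throws a stale element reference error pointing at title_list.append(item.text).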
How can I scrape the title of the main book as well as the titles of the books customers viewed from the webpage?
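A stale element reference usually means the page replaced the element after it was located. A plausible cause here (an assumption, not verified against the live page) is that the carousel re-renders its cards after each click on the next-page arrow, so elements found just before the re-render are detached by the time item.text is read. A minimal sketch of one workaround is to re-locate the cards and read all their text in a single attempt, retrying the whole attempt if any element goes stale. The helper name read_texts and its parameters are hypothetical; the exception type is passed in so the helper itself does not depend on Selenium.

```python
def read_texts(locate, stale_exc, attempts=3):
    """Return the .text of every element produced by locate().

    locate:    zero-argument callable that finds the elements afresh
    stale_exc: exception type raised when an element has gone stale
    attempts:  how many times to re-locate before giving up
    """
    for attempt in range(attempts):
        try:
            # Re-locate on every attempt so old references are never reused.
            return [element.text for element in locate()]
        except stale_exc:
            if attempt == attempts - 1:
                raise  # still stale after the final attempt
    return []  # only reached when attempts <= 0


# Hypothetical usage with the driver and selector from the question:
#
#   from selenium.common.exceptions import StaleElementReferenceException
#
#   texts = read_texts(
#       lambda: driver.find_elements(
#           By.CSS_SELECTOR,
#           "li.a-carousel-card > a.a-link-normal > div[data-rows]",
#       ),
#       StaleElementReferenceException,
#   )
```

Retrying the whole list rather than a single element keeps the result consistent: a partial list from a half-replaced carousel is discarded instead of being mixed with freshly located cards.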