Question

I am trying to scrape the tables from the web link below using Selenium. However, pandas seems to return only the first table rather than all of them.

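import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

weblink = 'http://sgx.com/wps/portal/sgxweb/home/company_disclosure/stockfacts?page=2&code=A68U&lang=en-us'
path_to_chromedriver = r'C:\chromedriver.exe'
# Selenium 4+ takes a Service object here instead of executable_path
driver = webdriver.Chrome(executable_path=path_to_chromedriver)
driver.get(weblink)
wait = WebDriverWait(driver, 8)
# locate and switch to the iframe
iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)

wait.until(EC.visibility_of_element_located((By.ID, 'financials')))  # Should I be using this?

print(pandas.read_html(driver.page_source, flavor='bs4'))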

How can I get pandas to print all of the tables instead of just the first one?

Answer 1

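Have you looked at the contents of the TABLE that was returned? It actually contains all 5 "tables". What look like separate TABLE tags on the page are really a single TABLE whose TBODY elements are formatted to look like separate tables.

You should get familiar with your browser's developer console; I would recommend Chrome's. Right-click an element inside the table and choose "Inspect" from the context menu. Now hover over elements in the developer console and the browser will highlight the corresponding element on the web page. This is a good way to match elements in the HTML to what you see on the page. If you do this on this page, you will see that there is only one TABLE tag, and that each TBODY just looks like a separate table.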
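If you want each TBODY as its own data set, one option is to split the single TABLE by its TBODY sections yourself. The following is only a minimal sketch using the standard library: `SAMPLE_HTML` is a made-up stand-in for `driver.page_source`, and `TbodySplitter` and `split_by_tbody` are hypothetical helper names, not part of pandas or Selenium. It assumes a simple structure (one TABLE, no nested tables), which I have not checked against the real page.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for driver.page_source: one TABLE, several TBODY sections.
SAMPLE_HTML = """
<table id="financials">
  <tbody>
    <tr><th>Metric</th><th>Value</th></tr>
    <tr><td>Revenue</td><td>100</td></tr>
  </tbody>
  <tbody>
    <tr><th>Ratio</th><th>Value</th></tr>
    <tr><td>P/E</td><td>12.5</td></tr>
    <tr><td>Yield</td><td>4.2</td></tr>
  </tbody>
</table>
"""


class TbodySplitter(HTMLParser):
    """Collect cell text, grouped first by TBODY and then by row."""

    def __init__(self):
        super().__init__()
        self.sections = []  # one list of rows per TBODY
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.sections.append([])
        elif tag == "tr" and self.sections:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.sections[-1].append(self._row)
            self._row = None


def split_by_tbody(html):
    """Return a list of sections; each section is a list of rows of cell text."""
    parser = TbodySplitter()
    parser.feed(html)
    parser.close()
    return parser.sections


sections = split_by_tbody(SAMPLE_HTML)
for number, rows in enumerate(sections, start=1):
    print(f"section {number}: {rows}")
```

Each entry in `sections` can then be turned into its own DataFrame, for example with `pandas.DataFrame(rows[1:], columns=rows[0])` if the first row of a section holds the headers.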

Pandas read_html returns only 1 table when 5 tables are found
