The link in the code further down contains the HTML table.
This is the HTML text:
<!DOCTYPE html>
<html>
<head>
<body onkeyup="return key_up(event,'dwhswdf_org')" onload="onLoad()" style="padding: 0px;">
<script type="text/javascript">
<head>
<body>
<div id="dark"></div>
<div id="light"></div>
<div id="wrapper">
<div id="cattext"></div>
<div id="titletext"></div>
<div id="tabstext"></div>
<br>
<table width="1%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td width="1%" valign="top">
<b>Details</b>
<table width="1%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>Site no.</td>
<td>G0010005</td>
</tr>
<tr>
<td>Site commence</td>
<td>09/08/1965</td>
</tr>
<tr>
<td>Zero gauge</td>
<td>0</td>
</tr>
<tr>
<td>Datum</td>
<td>GD</td>
</tr>
<tr>
<tr>
<tr>
<tr>
<tr>
<tr>
<tr>
</tbody>
</table>
</td>
<td valign="top" align="left">
</tr>
<tr>
</tbody>
</table>
</div>
<style type="text/css">
</body>
<script>
</body>
</html>
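My question is how to extract the HTML table elements G0010005, 09/08/1965 and 0, which have the attribute names 'Site no.', 'Site commence' and 'Zero gauge' respectively, in Python using the selenium package. I tried extracting them with a few different arguments, but none of them worked for me. Below is the code I have written so far...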
>>> from selenium import webdriver
>>> driver = webdriver.Firefox()
>>> driver.get("https://water.nt.gov.au/cgi/webhyd.pl?dwhswdf_org=G0010005&cat=dwhsw&lvl=1&")
>>> tbl = driver.find_element_by_xpath("//html/body/body/div[3]/table/tbody/tr[1]/td[1]/table/tbody/tr[1]/td[2]")
>>> get_attribute(tb1)
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    get_attribute(tb1)
NameError: name 'get_attribute' is not defined
>>> tbl = driver.find_element_by_name("Site no.")
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    tbl = driver.find_element_by_name("Site no.")
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 365, in find_element_by_name
    return self.find_element(by=By.NAME, value=name)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 752, in find_element
    'value': value})['value']
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: Unable to locate element: [name="Site no."]
>>> tbl = driver.find_element_by_text('Site no.')
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    tbl = driver.find_element_by_text('Site no.')
AttributeError: 'WebDriver' object has no attribute 'find_element_by_text'
Any help with this is appreciated.
Answer 0 (score: 0)
Try the following code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get('http://water.nt.gov.au/cgi/webhyd.pl?dwhswdf_org=G0010005&cat=dwhsw&lvl=1&')

# Wait up to 10 seconds for the "Site no." label cell to appear in the DOM.
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.XPATH, "//td[text()='Site no.']")))

# Each value sits in the first <td> after the <td> that holds the label text.
# find_element(By.XPATH, ...) is the generic form of find_element_by_xpath.
siteNoEle = browser.find_element(By.XPATH, "//td[text()='Site no.']/following-sibling::td[1]")
siteNo = siteNoEle.text
print(siteNo)
siteCommenceEle = browser.find_element(By.XPATH, "//td[text()='Site commence']/following-sibling::td[1]")
siteCommence = siteCommenceEle.text
print(siteCommence)
zeroEle = browser.find_element(By.XPATH, "//td[text()='Zero gauge']/following-sibling::td[1]")
zero = zeroEle.text
print(zero)
browser.quit()
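The three lookups above differ only in the label text, so they could be folded into a small helper. This is only a sketch: `value_xpath` and `read_fields` are hypothetical names, not part of selenium, and the XPath uses `normalize-space()` (a slight variation on the expression in the answer) so that stray whitespace around a label does not break the match. It also assumes the labels contain no double quotes.

```python
def value_xpath(label):
    """XPath for the cell directly after the cell whose text is `label`."""
    # Assumption: `label` contains no double-quote character.
    return '//td[normalize-space(text())="%s"]/following-sibling::td[1]' % label


def read_fields(driver, labels):
    """Return {label: cell text} for each label, one lookup per label."""
    values = {}
    for label in labels:
        # "xpath" is the string value of selenium's By.XPATH constant,
        # so this is the same call as driver.find_element(By.XPATH, ...).
        cell = driver.find_element("xpath", value_xpath(label))
        values[label] = cell.text
    return values
```

With a live session this would be called as `read_fields(browser, ["Site no.", "Site commence", "Zero gauge"])`; a label that is not on the page would still raise NoSuchElementException, as in the question.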
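Suggestions:
tbl = driver.find_element_by_name("Site no.") : this method applies when the HTML specifies name="Site no." on an element. In the given HTML, Site no. is cell text, not the value of a `name` attribute, so you cannot use it here.
tbl = driver.find_element_by_text('Site no.') : WebDriver defines no such method. The method that does exist is find_element_by_link_text, which finds links (a tags) by their text and does not work for other HTML elements. None of the built-in find_element_by_* methods locates an arbitrary element by its text (links, i.e. a tag text, are the only exception); for a td you need an XPath text() test like the one in the code above.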
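Both points can be checked offline against the markup from the question, without a browser. The sketch below swaps selenium for Python 3's standard-library `html.parser`, purely as an illustration: `SNIPPET` is an abridged copy of the inner table and `CellCollector` is a hypothetical helper. It records every `name` attribute it meets (there are none, which is why the lookup by name fails) and pairs each label cell with the cell that follows it.

```python
from html.parser import HTMLParser

# Abridged copy of the inner "Details" table from the question.
SNIPPET = """
<table width="1%" cellspacing="0" cellpadding="0" border="0">
<tr><td>Site no.</td><td>G0010005</td></tr>
<tr><td>Site commence</td><td>09/08/1965</td></tr>
<tr><td>Zero gauge</td><td>0</td></tr>
<tr><td>Datum</td><td>GD</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collects <td> texts row by row, plus any name="..." attribute values."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of cell texts per <tr>
        self.names = []   # every name attribute value found in the markup
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == "name":
                self.names.append(value)
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_td = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only text inside a <td> is kept; whitespace between tags is ignored.
        if self._in_td:
            self.rows[-1][-1] += data.strip()


collector = CellCollector()
collector.feed(SNIPPET)

# Pair each label (first cell) with its value (second cell).
details = {row[0]: row[1] for row in collector.rows if len(row) == 2}
print(collector.names)
print(details)
```

The label and its value are sibling cells in the same row, which is the relationship the `following-sibling::td[1]` step in the answer's XPath relies on.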