Selenium: Extract attribute values from an HTML table

Time: 2017-01-18 06:04:03

Tags: python-2.7 selenium-webdriver html-parsing

Page that contains the HTML table: https://water.nt.gov.au/cgi/webhyd.pl?dwhswdf_org=G0010005&cat=dwhsw&lvl=1&

Here is the HTML:

<!DOCTYPE html>
    <html>
    <head>
    <body onkeyup="return key_up(event,'dwhswdf_org')" onload="onLoad()" style="padding: 0px;">
    <script type="text/javascript">
    <head>
    <body>
    <div id="dark"></div>
    <div id="light"></div>
    <div id="wrapper">
    <div id="cattext"></div>
    <div id="titletext"></div>
    <div id="tabstext"></div>
    <br>
    <table width="1%" cellspacing="0" cellpadding="0" border="0">
    <tbody>
    <tr>
    <td width="1%" valign="top">
    <b>Details</b>
    <table width="1%" cellspacing="0" cellpadding="0" border="0">
    <tbody>
    <tr>
    <td>Site no.</td>
    <td>G0010005</td>
    </tr>
    <tr>
    <td>Site commence</td>
    <td>09/08/1965</td>
    </tr>
    <tr>
    <td>Zero gauge</td>
    <td>0</td>
    </tr>
    <tr>
    <td>Datum</td>
    <td>GD</td>
    </tr>
    <tr>
    <tr>
    <tr>
    <tr>
    <tr>
    <tr>
    <tr>
    </tbody>
    </table>
    </td>
    <td valign="top" align="left">
    </tr>
    <tr>
    </tbody>
    </table>
    </div>
    <style type="text/css">
    </body>
    <script>
    </body>
    </html>

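My question is how to extract the HTML table values G0010005, 09/08/1965 and 0, which belong to the labels 'Site no.', 'Site commence' and 'Zero gauge' respectively, in Python using the selenium package. I tried extracting them with a few different locators, but none of them worked for me. Below is the code I have written so far...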

>>> from selenium import webdriver
>>> driver  = webdriver.Firefox()
>>> driver.get("https://water.nt.gov.au/cgi/webhyd.pl?dwhswdf_org=G0010005&cat=dwhsw&lvl=1&")
>>> tbl = driver.find_element_by_xpath("//html/body/body/div[3]/table/tbody/tr[1]/td[1]/table/tbody/tr[1]/td[2]")
>>> tb1.get_attribute()

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    get_attribute(tb1)
NameError: name 'get_attribute' is not defined

>>> tbl = driver.find_element_by_name("Site no.")

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    tbl = driver.find_element_by_name("Site no.")
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 365, in find_element_by_name
    return self.find_element(by=By.NAME, value=name)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 752, in find_element
    'value': value})['value']
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: Unable to locate element: [name="Site no."]

>>> tbl = driver.find_element_by_text('Site no.')

Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    tbl = driver.find_element_by_text('Site no.')
AttributeError: 'WebDriver' object has no attribute 'find_element_by_text'

Any help with this is appreciated.

1 answer:

Answer 0 (score: 0)

Try the following code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


browser = webdriver.Firefox()
try:
    browser.get('http://water.nt.gov.au/cgi/webhyd.pl?dwhswdf_org=G0010005&cat=dwhsw&lvl=1&')

    # Wait up to 10 seconds for the "Site no." label cell to appear.
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, "//td[text()='Site no.']")))

    # Find the <td> whose text is the label, then take the <td> right after it.
    siteNoEle = browser.find_element(By.XPATH, "//td[text()='Site no.']/following-sibling::td[1]")
    siteNo = siteNoEle.text
    print(siteNo)

    siteCommenceEle = browser.find_element(By.XPATH, "//td[text()='Site commence']/following-sibling::td[1]")
    siteCommence = siteCommenceEle.text
    print(siteCommence)

    zeroEle = browser.find_element(By.XPATH, "//td[text()='Zero gauge']/following-sibling::td[1]")
    zero = zeroEle.text
    print(zero)
finally:
    # Close the browser even if one of the lookups fails.
    browser.quit()
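The three lookups above differ only in the label text, so the XPath can be built from the label instead of being written out three times. Below is a minimal sketch; the helper names xpath_literal and value_xpath are hypothetical, not part of Selenium. XPath 1.0, which browsers evaluate, has no escape sequence for quotes inside a string literal, so a label containing ' is wrapped in double quotes, and a label containing both kinds of quote is assembled with concat().

```python
def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal.

    XPath 1.0 has no escape sequences, so use a quote character the text
    does not contain, or fall back to concat() when it contains both.
    """
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Split on single quotes and glue the pieces back with "'" between them.
    pieces = ["'" + part + "'" for part in text.split("'")]
    return "concat(" + ', "\'", '.join(pieces) + ")"


def value_xpath(label):
    """XPath for the <td> immediately to the right of the <td> whose text is label."""
    return "//td[text()=" + xpath_literal(label) + "]/following-sibling::td[1]"


for label in ["Site no.", "Site commence", "Zero gauge"]:
    print(value_xpath(label))
```

With this helper each lookup becomes browser.find_element(By.XPATH, value_xpath('Site commence')).text, and adding another label (for example 'Datum') is a one-word change.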

Notes:

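  1. tbl = driver.find_element_by_name("Site no."): this method is for elements whose HTML carries a name attribute, e.g. name="Site no.". In the given HTML, "Site no." is not the value of any name attribute; it is the text content of a <td>, so this lookup cannot find it.
  2. tbl = driver.find_element_by_text('Site no.'): WebDriver defines no such method. The closest one is find_element_by_link_text (find_element(By.LINK_TEXT, ...) in Selenium 4), which matches only the visible text of links, i.e. <a> elements, not arbitrary elements. You cannot locate other elements by their text with it; use an XPath predicate such as //td[text()='Site no.'] instead, as in the code above.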
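To check the "find the cell by its text, not by a name attribute" idea without a browser, here is a minimal offline sketch using only Python's standard library. It assumes the Details rows from the question have been copied by hand into a small well-formed XML fragment (FRAGMENT below is not the live page, whose markup is not valid XML), and the helper names value_for and all_pairs are hypothetical. ElementTree supports only a subset of XPath with no following-sibling axis, so the sketch selects the <tr> that has a <td> with the label text and then reads that row's second cell.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed copy of the "Details" rows from the question's HTML.
FRAGMENT = """
<table>
  <tbody>
    <tr><td>Site no.</td><td>G0010005</td></tr>
    <tr><td>Site commence</td><td>09/08/1965</td></tr>
    <tr><td>Zero gauge</td><td>0</td></tr>
    <tr><td>Datum</td><td>GD</td></tr>
  </tbody>
</table>
"""


def value_for(table, label):
    """Return the cell next to the <td> whose text is label, or None.

    Assumes label contains no single quote (it is pasted into the path).
    """
    # [td='...'] keeps rows that have a <td> child with exactly that text.
    row = table.find(".//tr[td='%s']" % label)
    if row is None:
        return None
    cells = row.findall("td")
    return cells[1].text if len(cells) > 1 else None


def all_pairs(table):
    """Map every two-cell row's first cell text to its second cell text."""
    pairs = {}
    for row in table.iter("tr"):
        cells = row.findall("td")
        if len(cells) == 2:
            pairs[cells[0].text] = cells[1].text
    return pairs


table = ET.fromstring(FRAGMENT.strip())
print(value_for(table, "Site no."))
print(all_pairs(table))
```

The all_pairs variant reads the whole Details table in one pass, which may be handier than one lookup per label when every row is wanted.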