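Scraper clicks the Next page button but doesn't fetch any content

Time: 2017-07-30 13:14:28

Tags: vba selenium xpath selenium-webdriver web-scraping

I've written some code in VBA, in combination with Selenium, to parse data from different tables spread across multiple pages. When I run my script, I can see it parse the data from the first page and then keep clicking the Next button until no more buttons are available. However, I only get the data from the first page: the browser visibly clicks the Next button, but the script fetches nothing from the other pages. I can't figure out what I'm doing wrong. Perhaps the loop I've created is to blame, or something else I'm not aware of. Thanks for taking a look. Here is the full code: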

Sub Table_data()
    Dim driver As New ChromeDriver
    Dim tabl As Object, rdata As Object, cdata As Object

    driver.Get "https://toolkit.financialexpress.net/santanderam"
    driver.Wait 1000

    For Each tabl In driver.FindElementsByXPath("//table[@class='fe-datatable']")
        For Each rdata In tabl.FindElementsByXPath(".//tr")
            For Each cdata In rdata.FindElementsByXPath(".//td")
                y = y + 1
                Cells(x + 1, y) = cdata.Text
            Next cdata
            x = x + 1
            y = 0
        Next rdata
        driver.FindElementByLinkText("Next").Click
        driver.Wait 1000
    Next tabl
End Sub

2 answers:

Answer 0 (score: 1)

Personally, I would change the way you iterate over the pages. In pseudocode it should be something like this:
function element getNextButton(){
    all_buttons = driver.findElementsByXpath("""//*[@id="Price_1_1"]/tfoot/tr/td/div/div/a""");
    next_button = all_buttons[all_buttons.Size()-1];
    return next_button;
}

main(){
    next_button = getNextButton();
    while true{
        do something with your current table;
        next_button.click();
        wait(2); // wait some time till the page loads
        next_button = getNextButton();
        if next_button.text does not contain 'Next'{
            break;
        }
    }
}

I've just tested it in Python:

from selenium import webdriver
import time

def get_next_button():
    buttons = driver.find_elements_by_xpath("""//*[@id="Price_1_1"]/tfoot/tr/td/div/div/a""")
    next_element_button = buttons[len(buttons)-1]
    return next_element_button

chrome_path = r"chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get("https://toolkit.financialexpress.net/santanderam")
time.sleep(5)

next_button = get_next_button()

while True:
    # Do something with the table
    next_button.click()
    time.sleep(2)
    # Look the button up again: the old reference belongs to the previous page
    next_button = get_next_button()
    if 'Next' not in next_button.text:
        break

print('End')

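I'm not familiar with VBA, but if you don't know Python, I can try to translate it into VBA.

Edit

An "approximation" of the VBA solution would be something like this (please check for syntax errors, I have never used VBA):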

Function GetNextElement(driver As ChromeDriver) As Object
    Dim all_buttons As Object
    ' Collect the paging links in the table footer; the last one is Next (when present)
    Set all_buttons = driver.FindElementsByXPath("//*[@id='Price_1_1']/tfoot/tr/td/div/div/a")
    ' Assumes the returned collection is 1-based, so Count is the last index
    Set GetNextElement = all_buttons.Item(all_buttons.Count)
End Function

Sub Table_data()
    Dim driver As New ChromeDriver
    Dim position As Integer
    Dim next_button As Object
    driver.Get "https://toolkit.financialexpress.net/santanderam"
    driver.Wait 1000
    Set next_button = GetNextElement(driver)

    Do While True
        ' Do something with the table
        next_button.Click
        driver.Wait 2000
        ' Look the button up again after the page has changed
        Set next_button = GetNextElement(driver)
        position = InStr(next_button.Text, "Next")
        If position = 0 Then
            Exit Do
        End If
    Loop
End Sub

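Answer 1 (score: 1)

Consider moving the click on the Next button out of the For Each loop. It belongs in a separate, outer loop, and that loop should terminate when the Next button can no longer be found (run-time error 7: NoSuchElementError).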
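To make that control flow concrete, here is a minimal, library-independent Python sketch of the structure: the current page is read inside the loop body, and the outer loop stops once Next cannot be found. `FakeSite`, `read_rows`, `click_next`, and `scrape_all` are hypothetical names invented for this illustration, not Selenium calls, and `LookupError` only plays the role of NoSuchElementError.

```python
class FakeSite:
    """Hypothetical stand-in for a paginated site; not a Selenium API."""

    def __init__(self, pages):
        self.pages = pages   # list of pages, each a list of table rows
        self.index = 0       # index of the page currently shown

    def read_rows(self):
        # Re-read the rows of whatever page is currently displayed.
        return list(self.pages[self.index])

    def click_next(self):
        # Mimic a driver that raises when the Next link does not exist.
        if self.index + 1 >= len(self.pages):
            raise LookupError("no Next button")
        self.index += 1


def scrape_all(site):
    """Collect rows from every page; stop when Next cannot be found."""
    rows = []
    while True:
        rows.extend(site.read_rows())   # parse the page that is showing now
        try:
            site.click_next()           # advance exactly one page per pass
        except LookupError:
            break                       # last page reached
    return rows


site = FakeSite([[("A", 1), ("B", 2)], [("C", 3)], [("D", 4)]])
all_rows = scrape_all(site)
print(len(all_rows))
```

The point of the design is that the rows are looked up again on every pass, after the page has changed, rather than once before the clicking starts.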

The XPath //table[@class='fe-datatable'] also returns the page numbers. You should use the inner table instead, either by class name, //table[@class='fe-fund-tableBody'], or by id, //*[@id='docRows']. Both point to the same elements.

You may have noticed that there are 7 occurrences of that element, so your code loops through the empty ones on every page. You can avoid this by looping over the first occurrence only, e.g. (//*[@id='docRows'])[1].

I would also suggest looking into an implicit/explicit wait instead of wait. If we don't improve anything else, your code should end up looking like this:

Sub Table_data()
    Dim driver As New ChromeDriver
    Dim tabl As Object, rdata As Object, cdata As Object

    driver.Get "https://toolkit.financialexpress.net/santanderam"
    driver.Wait 1000

    Do
        For Each tabl In driver.FindElementsByXPath("(//*[@id='docRows'])[1]") 'or "(//table[@class='fe-fund-tableBody'])[1]"
            For Each rdata In tabl.FindElementsByXPath(".//tr")
                For Each cdata In rdata.FindElementsByXPath(".//td")
                    y = y + 1
                    Cells(x + 1, y) = cdata.Text
                Next cdata
                x = x + 1
                y = 0
            Next rdata
        Next tabl
        On Error Resume Next
        driver.FindElementByLinkText("Next").Click
        driver.Wait 1000
    Loop Until Err.Number = 7
End Sub
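On the wait suggestion: an explicit wait polls for a condition until it holds or a timeout expires, instead of pausing for a fixed time. Selenium bindings provide their own wait helpers; the sketch below is not one of them, only a plain-Python illustration of the idea. `wait_until` and `table_loaded` are hypothetical names made up for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Hypothetical condition: pretend the table shows up on the third check.
checks = {"count": 0}


def table_loaded():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(table_loaded, timeout=2.0, poll=0.01)
print(checks["count"])
```

Compared with a fixed pause, this returns as soon as the page is ready and fails loudly if it never becomes ready.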