Storing Selenium HTML source in an element of type HTMLDocument

Time: 2015-12-24 23:29:17

Tags: html vba excel-vba selenium xhtml

Is it possible to store HTML source scraped with Selenium (from Excel VBA) in an HTMLDocument element? Here is an example that automates Internet Explorer using Microsoft Internet Controls and the Microsoft HTML Object Library.

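Dim IE As InternetExplorer
Dim HTML As HTMLDocument
Set IE = New InternetExplorer
IE.navigate "www.google.com"
' Wait for the page to finish loading before reading the document
Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
    DoEvents
Loop
Set HTML = IE.Document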

Can the same be done with Selenium? For example (not working!):

Dim selenium As SeleniumWrapper.WebDriver
Set selenium = New SeleniumWrapper.WebDriver
Dim html as HTMLDocument

selenium.Start "firefox", "about:blank"
selenium.Open "file:///D:/webpages/LE_1001.htm"
Set html = selenium.getHtmlSource 'this is not working since .getHtmlSource() 
                                 'returns a String object but is there a way to store 
                                 'this html source into a type of HTMLDocument-element

3 Answers:

Answer 0 (score: 1)

This should work, using the string as the source of the HTML document:

Set html = New HTMLDocument
html.body.innerHTML = selenium.pageSource

Edit: changed the Selenium call from getHtmlSource to pageSource. The full working code is below. I am not sure whether we are using the same version of Selenium:

Option Explicit

Sub foo()

Dim sel As selenium.WebDriver
Set sel = New selenium.WebDriver
Dim html As HTMLDocument

sel.Start "firefox", "about:blank"
sel.Get "http://www.google.com/"

Set html = New HTMLDocument
html.body.innerHTML = sel.PageSource

Debug.Print html.body.innerText

End Sub

This references the Microsoft HTML Object Library and the Selenium Type Library (Selenium32.tlb), using SeleniumBasic version 2.0.6.0.
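The code above relies on an early-bound reference to the Microsoft HTML Object Library. As a minimal sketch, assuming late binding is acceptable in your project, the same string-to-document step could be wrapped in a helper that creates the document through the "htmlfile" ProgID. The names HtmlFromString and DemoHtmlFromString are hypothetical and are not part of Selenium or MSHTML; in practice the string argument would be the page source returned by Selenium.

```vba
Option Explicit

' Hypothetical helper: loads an HTML fragment into a late-bound MSHTML document,
' so no reference to the Microsoft HTML Object Library is needed.
Function HtmlFromString(ByVal source As String) As Object
    Dim doc As Object
    Set doc = CreateObject("htmlfile")   ' late-bound HTML document
    doc.body.innerHTML = source          ' same assignment as in the answer above
    Set HtmlFromString = doc
End Function

Sub DemoHtmlFromString()
    Dim doc As Object
    ' A literal fragment stands in for the Selenium page source here
    Set doc = HtmlFromString("<p id=""msg"">Hello</p>")
    Debug.Print doc.getElementById("msg").innerText
End Sub
```

Because the document is declared As Object, IntelliSense and compile-time checks are lost, which is the usual trade-off of late binding.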

Answer 1 (score: 1)

The proper way to get the DOM with SeleniumBasic:

Sub Get_DOM()
  Dim driver As New FirefoxDriver
  driver.Get "https://en.wikipedia.org/wiki/Main_Page"

  Dim html As New HTMLDocument  ' Requires Microsoft HTML Library
  html.body.innerHTML = driver.ExecuteScript("return document.body.innerHTML;")

  Debug.Print html.body.innerText

  driver.Quit
End Sub

To get the latest version, which works with the example above: https://github.com/florentbr/SeleniumBasic/releases/latest

Answer 2 (score: 0)

I am not quite sure why you would want to convert a Selenium element to an HTMLDocument. Doing so makes the project depend on an additional library.

Personally, I prefer assigning the DOM element to a WebElement. For example:

If (Selenium.FindElementsByClass("qty").Count > 0) Then
    Dim qtyElement as WebElement: Set qtyElement = Selenium.FindElementByClass("qty")
End If

If (Not qtyElement is Nothing) then
    Dim qtyHtml As String: qtyHtml = qtyElement.Attribute("innerHTML")
End if

Debug.Print qtyHtml