如何在标签内获取内部文本?

时间:2019-06-11 13:48:55

标签: excel vba web-scraping xmlhttprequest

我正在编写VBA,以使用XML从股票网站中提取表格,但无法提取标记内的内部文本。我可以知道我的代码有什么问题吗?

我试图通过使用xxxx.innerText获取内部文本,其中xxxx是MSHTML.IHTMLElement。

Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)

    Dim htmlTable As MSHTML.IHTMLElement
    Dim htmlTables As MSHTML.IHTMLElementCollection
    Dim htmlrow As MSHTML.IHTMLElement
    Dim htmlcell As MSHTML.IHTMLElement
    Dim RowNum As Integer
    Dim ColNum As Integer

    Set htmlTables = HTMLPage.getElementsByTagName("table")
    For Each htmlTable In htmlTables

    If htmlTable.className = "table_list" Then

        RowNum = 2
        For Each htmlrow In htmlTable.getElementsByTagName("tr")

            ColNum = 1
            For Each htmlcell In htmlrow.Children
                Debug.Print htmlcell.innerText
                ColNum = ColNum + 1
            Next htmlcell

        RowNum = RowNum + 1
        Next htmlrow

    End If
    Next htmlTable

End Sub

Sub GetPrice()

    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim URL As String

    URL = "https://www.hkex.com.hk/Market-Data/Securities-Prices/Equities/Equities-Quote?sym=2&sc_lang=zh-hk"
    XMLPage.Open "GET", URL, False
    XMLPage.send
    HTMLDoc.body.innerHTML = XMLPage.responseText

    ProcessHTMLPage HTMLDoc

End Sub

我希望输出应为:

即日 52周 最高价
HK $ 90.700
HK $ 97.400 ...

2 个答案:

答案 0 :(得分:1)

页面动态加载该内容,这就是为什么对初始URL的xhr请求没有给您预期的结果的原因。

但是,您可以在浏览器的网络选项卡中查找,并找到页面用于通过单独的xhr更新内容的端点。

在下面,我不确定令牌是否基于时间,但是您可以进行探索。您需要剥离外部jquery字符串,但是您可以使用json解析器解析内部json。我使用jsonconverter.bas。您将jsonconverter.bas代码下载到名为JsonConverter的标准模块中,然后转到VBE>工具>引用>添加对Microsoft脚本运行时的引用。如果您不想使用json解析器(解析器应该是您的首选),则可以使用split函数来提取您想要的信息。

我将在工作表中设置表格格式,并根据需要设置列标题和行标题(可能如网页上所示),并将单元格设置为HKD货币-然后将以下变量中的值分配给其中的单元格那张桌子。您可以绑定一个按钮,以便随意刷新工作表中的表格。


布局(和结果):

enter image description here


代码:

Option Explicit
Public Sub GetInfo()
    Dim s As String, json As Object
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www1.hkex.com.hk/hkexwidget/data/getequityquote?sym=2&token=evLtsLsBNAUVTPxtGqVeG6jZbQlrF5FojHmveNua5GgTcjPuBcUs7GTV0hIBAgAI&lang=chi&qid=1560281438643&callback=jQuery311003616462678192556_1560281436567&_=1560281436568", False
        .send
        s = Split(Split(.responseText, "(")(1), ")")(0)
    End With
    Set json = JsonConverter.ParseJson(s)("data")("quote")

    Dim sameDayHigh As Double, fiftyTwoWeekHigh As Double, sameDayLow As Double, fiftyTwoWeekLow As Double, timeInfo As String

    sameDayHigh = json("hi")
    sameDayLow = json("lo")
    fiftyTwoWeekHigh = json("hi52")
    fiftyTwoWeekLow = json("lo52")
    timeInfo = json("updatetime")

    With ThisWorkbook.Worksheets("Sheet1")
        .Cells(1, 1) = timeInfo
        .Cells(3, 2).Resize(1, 2) = Array(sameDayHigh, fiftyTwoWeekHigh)
        .Cells(4, 2).Resize(1, 2) = Array(sameDayLow, fiftyTwoWeekLow)
    End With
End Sub

以上的压缩版本:

Option Explicit
Public Sub GetInfo()
    Dim s As String, json As Object
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www1.hkex.com.hk/hkexwidget/data/getequityquote?sym=2&token=evLtsLsBNAUVTPxtGqVeG6jZbQlrF5FojHmveNua5GgTcjPuBcUs7GTV0hIBAgAI&lang=chi&qid=1560281438643&callback=jQuery311003616462678192556_1560281436567&_=1560281436568", False
        .send
        s = Split(Split(.responseText, "(")(1), ")")(0)
    End With
    Set json = JsonConverter.ParseJson(s)("data")("quote")
    With ThisWorkbook.Worksheets("Sheet1")
        .Cells(1, 1) = json("updatetime")
        .Cells(3, 2).Resize(1, 2) = Array(json("hi"), json("hi52"))
        .Cells(4, 2).Resize(1, 2) = Array(json("lo"), json("lo52"))
    End With
End Sub

答案 1 :(得分:0)

这与您的方法稍有不同,但是可以得到预期的结果,而不会循环浏览各行:

enter image description here

不过,以后可以将打印出的字符串写到列表中,因此,如果用新行将其隔开并删除空字符串,则可以使用它:

Sub TestMe()

    Dim appIE As Object
    Set appIE = CreateObject("InternetExplorer.Application")

    With appIE
        .navigate "https://www.hkex.com.hk/Market-Data/Securities-Prices/Equities/Equities-Quote?sym=2&sc_lang=zh-hk"
        .Visible = False
    End With

    Do While appIE.Busy
        DoEvents
    Loop

    Dim allData As Object
    Set allData = appIE.document.getElementsByClassName("table_list")

    Debug.Print allData.item.outerText
    appIE.Close

End Sub