Web scraping Morningstar financial data with VBA

Time: 2016-05-12 14:47:09

Tags: regex vba excel-vba web-scraping xmlhttprequest

I am trying to scrape the insider ownership figure from Morningstar at this URL: http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US

This is the code I am using:

Sub test()

    Dim appIE As Object

    Set appIE = CreateObject("InternetExplorer.Application")
    With appIE
        .Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US"
        .Visible = True
    End With
    While appIE.Busy
        DoEvents
    Wend
    Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
    Debug.Print allRowOfData
    Dim myValue As String: myValue = allRowOfData.Cells(0).innerHTML
    appIE.Quit
    Set appIE = Nothing

    Range("A30").Value = myValue

End Sub

I get run-time error 13 (Type mismatch) on this line:

Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")

But I can't see any mismatch. What is going on?

2 Answers:

Answer 0: (score: 1)

You can do this with XHR and RegEx instead of the cumbersome IE automation:

Sub Test()
    Dim sContent
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US", False
        .Send
        sContent = .ResponseText
    End With
    With CreateObject("VBScript.RegExp")
        .Pattern = ",""currInsiderVal"":(.*?),"
        Range("A30").Value = .Execute(sContent).Item(0).SubMatches(0)
    End With
End Sub

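Here is how the code works:

First, an MSXML2.XMLHTTP ActiveX instance is created. A GET request to the target URL is opened in synchronous mode (execution blocks until the response is received).

Then a VBScript.RegExp is created. By default, the .IgnoreCase, .Global and .MultiLine properties are False. The pattern is ,"currInsiderVal":(.*?), (including the leading and trailing commas), where (.*?) is a capturing group: . matches any character except a line break, .* matches zero or more characters, and .*? matches as few characters as possible (lazy matching). The other characters in the pattern are matched literally. The .Execute method returns a collection of matches; it contains only one match object because .Global is False. That match object has a collection of submatches, which holds a single submatch because the pattern contains only one capturing group.

Some useful MSDN articles on regular expressions:
Microsoft Beefs Up VBScript with Regular Expressions
Introduction to Regular Expressions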
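
To make the .Execute and .SubMatches mechanics concrete, here is a minimal sketch that runs the same pattern against a short sample string instead of the live page. The helper name FirstSubMatch is hypothetical, and the sample text is trimmed from the page script quoted further down in this answer. The check on .Count is an addition: indexing .Item(0) on an empty match collection raises an error, so the sketch returns an empty string instead.

```vba
' Hypothetical helper: returns the first capture group of the first match,
' or an empty string when the pattern does not match at all.
' Assumes the pattern contains at least one capturing group.
Function FirstSubMatch(ByVal sText As String, ByVal sPattern As String) As String
    Dim oMatches As Object
    With CreateObject("VBScript.RegExp")
        .Pattern = sPattern              ' .Global stays False: at most one match
        Set oMatches = .Execute(sText)
    End With
    If oMatches.Count > 0 Then
        FirstSubMatch = oMatches.Item(0).SubMatches(0)
    End If
End Function

Sub DemoFirstSubMatch()
    Dim sSample As String
    ' Sample fragment copied from the page script (not fetched from the web)
    sSample = """currInstVal"":4370.46,""currInsiderVal"":143.51,""use10YearData"":""false"","
    Debug.Print FirstSubMatch(sSample, ",""currInsiderVal"":(.*?),")   ' prints 143.51
End Sub
```

Because the quantifier is lazy, the capture stops at the first comma after the colon, which is why only 143.51 is returned and not the rest of the string.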

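Here is how I came up with the code:

First, I used the browser to locate the element in the page DOM that contains the target value:

[Image: target value]

The corresponding node is: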

<td align="right" id="currrentInsiderVal">143.51</td>

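Then I made the XHR and found this node in the response HTML, but it did not contain the value (you can see the response in the browser developer tools, on the Network tab, after refreshing the page):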

<td align="right" id="currrentInsiderVal">
</td>

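This behavior is typical of DHTML. After the page loads, scripts generate the dynamic HTML content, either by retrieving data from the web via XHR or simply by processing data that was already loaded with the page. So I searched the response for the value 143.51 and found the fragment ,"currInsiderVal":143.51, inside a JS function: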

            fundsArr = {"fundTotalHistVal":132.61,"mutualFunds":[[1,89,"#a71620"],[2,145,"#a71620"],[3,152,"#a71620"],[4,198,"#a71620"],[5,155,"#a71620"],[6,146,"#a71620"],[7,146,"#a71620"],[8,132,"#a71620"]],"insiderHisMaxVal":3.535,"institutions":[[1,273,"#283862"],[2,318,"#283862"],[3,351,"#283862"],[4,369,"#283862"],[5,311,"#283862"],[6,298,"#283862"],[7,274,"#283862"],[8,263,"#283862"]],"currFundData":[2,2202,"#a6001d"],"currInstData":[1,4370,"#283864"],"instHistMaxVal":369,"insiders":[[5,0.042,"#ff6c21"],[6,0.057,"#ff6c21"],[7,0.057,"#ff6c21"],[8,3.535,"#ff6c21"],[5,0],[6,0],[7,0],[8,0]],"currMax":4370,"histLineQuars":[[1,"Q2"],[2,"Q3"],[3,"Q4"],[4,"Q1<br>2015"],[5,"Q2"],[6,"Q3"],[7,"Q4"],[8,"Q1<br>2016"]],"fundHisMaxVal":198,"currInsiderData":[3,143,"#ff6900"],"currFundVal":2202.85,"quarters":[[1,"Q2"],[2,""],[3,""],[4,"Q1<br>2015"],[5,""],[6,""],[7,""],[8,"Q1<br>2016"]],"insiderTotalHistVal":3.54,"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false","instTotalHistVal":263.74,"maxValue":369};

So the regex pattern built from it should find ,"currInsiderVal":<some text>, where <some text> is our target value.
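
Since (.*?) accepts any text, a somewhat stricter variant may be easier to trust if the page script changes. The following is only a sketch, assuming the value is always a plain decimal number. The names ExtractNumber, FetchText and TestStrict are hypothetical, and the HTTP status check is an addition that is not part of the code above.

```vba
' Hypothetical helper: pulls the number that follows "key": in a JSON-like string.
' Returns Empty when the key is absent or is not followed by a plain number.
Function ExtractNumber(ByVal sText As String, ByVal sKey As String) As Variant
    Dim oMatches As Object
    With CreateObject("VBScript.RegExp")
        ' Assumes sKey contains no regex metacharacters
        .Pattern = """" & sKey & """:(-?\d+(?:\.\d+)?)"
        Set oMatches = .Execute(sText)
    End With
    If oMatches.Count > 0 Then
        ' Val always treats "." as the decimal separator, regardless of locale
        ExtractNumber = Val(oMatches.Item(0).SubMatches(0))
    End If
End Function

' Hypothetical helper: synchronous GET, returns "" unless the server answers 200.
Function FetchText(ByVal sUrl As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", sUrl, False
        .Send
        If .Status = 200 Then FetchText = .ResponseText
    End With
End Function

Sub TestStrict()
    Dim vValue As Variant
    vValue = ExtractNumber(FetchText("http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US"), "currInsiderVal")
    If IsEmpty(vValue) Then
        MsgBox "currInsiderVal was not found in the response"
    Else
        Range("A30").Value = vValue
    End If
End Sub
```

Returning Empty rather than raising an error keeps the "value missing" case separate from a genuine value of 0, at the cost of one IsEmpty check in the caller.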

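Answer 1: (score: 0)

Looking at the site, there is a typo in the id of the element you are trying to retrieve. Instead of currentInsiderVal, try currrentInsiderVal, and you should retrieve the data correctly.

It may be worth adding some error trapping so that this kind of thing is caught for any other fields you retrieve?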
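
One way to do that error trapping, sketched under assumptions: look the element up first and test for Nothing before touching its properties. The helper name CellTextById is hypothetical. The demo parses a small inline HTML string with the htmlfile object instead of driving Internet Explorer, so both spellings of the id can be compared without loading the page; with IE you would pass appIE.Document instead.

```vba
' Hypothetical helper: returns the text of the element with the given id,
' or an empty string when no such element exists in the document.
Function CellTextById(ByVal oDoc As Object, ByVal sId As String) As String
    Dim oElem As Object
    Set oElem = oDoc.getElementById(sId)
    If Not oElem Is Nothing Then
        CellTextById = oElem.innerText
    End If
End Function

Sub DemoCellTextById()
    Dim oDoc As Object
    ' Stand-in document built from a literal string (assumption: not the real page)
    Set oDoc = CreateObject("htmlfile")
    oDoc.write "<table><tr><td id=""currrentInsiderVal"">143.51</td></tr></table>"
    oDoc.Close

    ' The id with three r's exists; the "correctly spelled" one does not
    Debug.Print "[" & CellTextById(oDoc, "currrentInsiderVal") & "]"
    Debug.Print "[" & CellTextById(oDoc, "currentInsiderVal") & "]"
End Sub
```

An empty result then signals a missing or renamed id, which the caller can report instead of failing later on a property access.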

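After your comment I took a closer look. Your problem seems to be that you are trying to grab a single cell by its id rather than navigating down the object tree. I have modified the code to retrieve the row of the table you are after and then set myValue to the right cell in that row. It seems to work when I try it. Give it a go?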

Sub test()

    Dim appIE As Object
    Dim allRowOfData As Object
    Dim myValue As String

    Set appIE = CreateObject("InternetExplorer.Application")

    With appIE
        .Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US"
        .Visible = True
    End With

    ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
    While appIE.Busy Or appIE.ReadyState <> 4
        DoEvents
    Wend

    ' Walk down from the table to its sixth row (index 5), then read the third cell (index 2)
    Set allRowOfData = appIE.Document.getElementById("tableTest").getElementsByTagName("tbody")(0).getElementsByTagName("tr")(5)
    myValue = allRowOfData.Cells(2).innerHTML

    appIE.Quit
    Set appIE = Nothing

    Range("A30").Value = myValue

End Sub