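Excel VBA - Scraping Google with XMLHTTP (MSXML2.XMLHTTP)

Time: 2017-01-09 13:31:45

Tags: html excel vba dom serverxmlhttp

I am trying to put together a daily table of events for certain news topics on Google News.

In a single module, I have the following: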

Option Explicit

' URLDownloadToFile and ret are declared but not used by Go below.
' The PtrSafe variant is required for the module to compile on 64-bit Office.
#If VBA7 Then
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" (ByVal pCaller As LongPtr, _
        ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
#Else
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
        ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If

Dim ret As Long

Sub Go()

    Dim url As String, i As Long, numb_H3 As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object
    Dim objH3 As Object, link As Object, imgs As Object

    url = "https://www.google.co.uk/search?q=" & "Wearables" & "&tbm=nws" ' "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

    ' Fetch the results page synchronously
    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.send

    ' Load the response into a late-bound HTML document
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = XMLHTTP.ResponseText

    ' Stop early if the results container is missing from the response
    Set objResultDiv = html.getElementById("rso")
    If objResultDiv Is Nothing Then Exit Sub

    Set imgs = objResultDiv.getElementsByTagName("img")
    numb_H3 = objResultDiv.getElementsByTagName("H3").Length

    For i = 0 To numb_H3 - 1
        Set objH3 = objResultDiv.getElementsByTagName("H3")(i)
        Set link = objH3.getElementsByTagName("a")(0)

        ' get thumbnail image location (skipped when there are fewer images than headlines)
        If i < imgs.Length Then Cells(ActiveCell.Row + i, 1).Value = imgs(i).src
        ' get news title
        Cells(ActiveCell.Row + i, 2).Value = objH3.innerText
        ' get news link
        Cells(ActiveCell.Row + i, 3).Value = link.href
        ' get source name
        Cells(ActiveCell.Row + i, 5).Value = "need help"
        ' get source time
        Cells(ActiveCell.Row + i, 6).Value = "need help"
        ' get news paragraph
        Cells(ActiveCell.Row + i, 7).Value = "need help"

        DoEvents
    Next i

    html.Close

End Sub

I can already return the following objects:

[screenshot]

The objects I want are the ones marked in red in the screenshot; I am just struggling with the syntax when using GetElementsByClassName:

[screenshot]

So, for example, I know that the text "ZDNet" is at:

?...GetElementsByClassName("slp")(i).GetElementsByTagname("span")(0).InnerText

and the date "7 Jan 2017" is at:

?...GetElementsByClassName("slp")(i).GetElementsByTagname("span")(2).InnerText

But I cannot get the syntax right.

I hope I am making a very simple mistake, but I am also open to other approaches if they are more efficient.
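For the class-name lookups described above, here is a minimal sketch of one possible approach. It avoids calling GetElementsByClassName on the late-bound "htmlfile" document, which may not expose that method depending on the document mode, and instead walks elements by tag and compares className. The helper name NthByClass, the assumption that the "slp" elements are div tags, and the sample markup in DemoSlp are all hypothetical; only the class name "slp" and the span indexes 0 and 2 come from the question.

```vba
Option Explicit

' Hypothetical helper: returns the n-th (0-based) descendant of root with the
' given tag whose class attribute contains cls as a whole token, or Nothing.
Private Function NthByClass(ByVal root As Object, ByVal tagName As String, _
                            ByVal cls As String, ByVal n As Long) As Object
    Dim el As Object, seen As Long
    For Each el In root.getElementsByTagName(tagName)
        ' Pad with spaces so "slp" does not match a class such as "slpx"
        If InStr(1, " " & el.className & " ", " " & cls & " ", vbBinaryCompare) > 0 Then
            If seen = n Then
                Set NthByClass = el
                Exit Function
            End If
            seen = seen + 1
        End If
    Next el
End Function

Sub DemoSlp()
    Dim doc As Object, slp As Object, spans As Object

    ' Assumed markup, standing in for one result inside the "rso" container
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<div id=""rso""><div class=""slp"">" & _
        "<span>ZDNet</span><span> - </span><span>7 Jan 2017</span>" & _
        "</div></div>"

    Set slp = NthByClass(doc.getElementById("rso"), "div", "slp", 0)
    If slp Is Nothing Then Exit Sub

    Set spans = slp.getElementsByTagName("span")
    If spans.Length >= 3 Then
        Debug.Print spans(0).innerText   ' source name
        Debug.Print spans(2).innerText   ' source time
    End If
End Sub
```

Inside the loop in Go, the same call could be made with objResultDiv as the root and i as the index, writing spans(0).innerText and spans(2).innerText to columns 5 and 6 in place of the "need help" placeholders, provided the real page has one "slp" element per headline.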

Thanks for reading, Mr. J

0 answers:

No answers yet