How to get a specific link from a specific class?

Posted: 2017-01-07 15:42:10

Tags: vba excel-vba internet-explorer web-scraping excel

I want to extract the href from a specific class:

<tr class="even">
    <td>
        <a href="/italy/serie-a-2015-2016/">Serie A 2015/2016</a>
    </td>

This is what I wrote:

Sub ExtractHrefClass()

    Dim ie As Object
    Dim doc As HTMLDocument
    Dim class As Object
    Dim href As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate Range("D8")
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
    Set doc = ie.document
    Set class = doc.getElementsByClassName("even")
    Set href = class.getElementsByTagName("a")
    Range("E8").Value = href
    ie.Quit

End Sub

Unfortunately, this line raises "Object doesn't support this property or method" (run-time error 438):

    Set href = class.getElementsByTagName("a")
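The error arises because getElementsByClassName returns a collection of elements rather than a single element, and the collection itself has no getElementsByTagName method. A minimal sketch of a fix follows, assuming the page renders in IE9+ document mode (where getElementsByClassName is available) and that the first row with class "even" is the one wanted; the Sub name ExtractHrefClassIndexed is hypothetical. It takes .Item(0) from each collection and writes the attribute text, not the element object, to E8.

```vba
Sub ExtractHrefClassIndexed()
    ' Hypothetical name; late binding throughout, so no library references are needed
    Dim ie As Object
    Dim doc As Object
    Dim evenRows As Object
    Dim anchors As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate Range("D8").Value
    ' 4 = READYSTATE_COMPLETE (literal, since the named constant needs a reference)
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
    Set doc = ie.document

    ' getElementsByClassName returns a collection, even for a single match
    Set evenRows = doc.getElementsByClassName("even")
    If evenRows.Length > 0 Then
        ' Call getElementsByTagName on one element of the collection
        Set anchors = evenRows.Item(0).getElementsByTagName("a")
        If anchors.Length > 0 Then
            ' Write the attribute text rather than the element object
            Range("E8").Value = anchors.Item(0).getAttribute("href")
        End If
    End If
    ie.Quit
End Sub
```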

Update 1

I changed the code based on @RyszardJędraszyk's answer, but now there is no output at all O_o. Where did I go wrong?

Sub ExtractHrefClass()

    Dim ie As Object
    Dim doc As HTMLDocument
    Dim href As Object
    Dim htmlEle As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate Range("D8")
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE And ie.Busy = False
    Set doc = ie.document
    Set href = doc.getElementsByTagName("a")
    For Each htmlEle In href
        If htmlEle.className = "even" Then
            Range("E8").Value = htmlEle
        End If
    Next
    ie.Quit

End Sub
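A likely reason there is no output: in the markup shown in Update 2, the class "even" is on the <tr>, not on the <a>, so htmlEle.className = "even" is never true and nothing is written to E8. One way to adapt the loop, sketched below, checks the class of the anchor's grandparent instead. It assumes the nesting <tr> > <td> > <a> from that markup, and the function name FirstHrefInEvenRow is hypothetical.

```vba
' Hypothetical helper: href of the first <a> whose grandparent has class "even".
' Assumes the nesting <tr class="even"> > <td> > <a> shown in Update 2.
Function FirstHrefInEvenRow(doc As Object) As String
    Dim htmlEle As Object
    For Each htmlEle In doc.getElementsByTagName("a")
        ' parentNode of <a> is the <td>; its parentNode is the <tr>
        If htmlEle.parentNode.parentNode.className = "even" Then
            ' & "" turns a missing (Null) attribute into an empty string
            FirstHrefInEvenRow = htmlEle.getAttribute("href") & ""
            Exit For ' the first link in the row is the season link
        End If
    Next
End Function
```

In the Update 1 procedure, the For Each loop could then be replaced by Range("E8").Value = FirstHrefInEvenRow(doc).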

Update 2

As @dee asked in the comments, here is the relevant markup from the web page:

<tbody>
    <tr>
        <td>
            <a href="/italy/serie-a/">Serie A 2016/2017</a>
        </td>
        <td></td>
    </tr>
    <tr class="even">
        <td>
            <a href="/italy/serie-a-2015-2016/">Serie A 2015/2016</a>
        </td>
        <td>
            <span class="team-logo" style="background-image: url(/res/image/data/UZbZIMhM-bsGsveSt.png)"></span><a href="/team/juventus/C06aJvIB/">Juventus</a>
        </td>
    </tr>
    <tr>
        <td>
            <a href="/italy/serie-a-2014-2015/">Serie A 2014/2015</a>
        </td>
        <td>
            <span class="team-logo" style="background-image: url(/res/image/data/UZbZIMhM-bsGsveSt.png)"></span><a href="/team/juventus/C06aJvIB/">Juventus</a>
        </td>
    </tr>

I only need to extract this value: /italy/serie-a-2015-2016/

3 answers:

Answer 0 (score: 1)

This works for me:

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "http://www.soccer24.com/italy/serie-a/archive/", False
    .Send
    MsgBox Split(Split(Split(.ResponseText, "<tr class=""even"">", 2)(1), "<a href=""", 2)(1), """", 2)(0)
End With
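The nested Split calls in that one-liner peel the HTML apart in three steps. The hypothetical function below spells those steps out with intermediate variables and returns an empty string instead of raising run-time error 9 (Subscript out of range) when a marker is missing. Like the original, it is a plain string search: it assumes the server's raw HTML contains exactly the text <tr class="even"> followed by <a href="...">, so a different attribute order, extra attributes, or single quotes would break it.

```vba
' Hypothetical helper: the Split one-liner from Answer 0, one step at a time.
' Returns "" instead of raising error 9 when a marker is not found.
Function HrefAfterEvenRow(ByVal html As String) As String
    Dim parts() As String

    ' 1) Keep the text after the first <tr class="even">
    parts = Split(html, "<tr class=""even"">", 2)
    If UBound(parts) < 1 Then Exit Function

    ' 2) Keep the text after the next <a href="
    parts = Split(parts(1), "<a href=""", 2)
    If UBound(parts) < 1 Then Exit Function

    ' 3) Everything before the closing quote is the href value
    parts = Split(parts(1), """", 2)
    If UBound(parts) < 0 Then Exit Function
    HrefAfterEvenRow = parts(0)
End Function

Sub DemoHrefAfterEvenRow()
    ' Prints /italy/serie-a-2015-2016/ to the Immediate window
    Debug.Print HrefAfterEvenRow("<tr class=""even""><td><a href=""/italy/serie-a-2015-2016/"">Serie A 2015/2016</a></td></tr>")
End Sub
```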

The procedure you need would probably look like this:

Sub ExtractHrefClass()

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Range("D8").Value, False
        .Send
        Range("E8").Value = Split(Split(Split(.ResponseText, "<tr class=""even"">", 2)(1), "<a href=""", 2)(1), """", 2)(0)
    End With

End Sub

Answer 1 (score: 0)

Try:

Dim href As HTMLObjectElement

Make sure the correct library (Microsoft HTML Object Library) is checked under References.

Are you sure doc.getElementsByClassName("even") works? It is not listed as an available method here: https://msdn.microsoft.com/en-us/library/aa926433.aspx

I always use getElementsByTagName first and then check the class with If htmlEle.className = "even" Then.
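A sketch of that pattern, applied to the <tr> elements, since that is where the "even" class sits in the markup from Update 2. The function name is hypothetical, and doc is assumed to be the fully loaded ie.document.

```vba
' Hypothetical helper: getElementsByTagName first, then a className check,
' then the first <a> inside the matching row.
Function HrefFromFirstEvenRow(doc As Object) As String
    Dim tr As Object
    Dim anchors As Object

    For Each tr In doc.getElementsByTagName("tr")
        If tr.className = "even" Then
            Set anchors = tr.getElementsByTagName("a")
            If anchors.Length > 0 Then
                ' & "" turns a missing (Null) attribute into an empty string
                HrefFromFirstEvenRow = anchors.Item(0).getAttribute("href") & ""
                Exit For
            End If
        End If
    Next
End Function
```

Note that className = "even" is an exact string comparison, so a row with several classes (for example class="even highlight") would not match.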

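Also extend the wait condition to ie.readyState = READYSTATE_COMPLETE And ie.Busy = False. If the site is AJAX-based, even that is not enough to be sure the page has fully loaded (judging from the link, it may be flashscore.com); you would need to watch for an element on the page that signals its loading state.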
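One way to follow that advice is sketched below: wait for Busy and readyState first, then keep polling, with a time limit, until an element appears that only exists once the data has loaded. Using tr.even as that signal element and the name WaitForEvenRow are assumptions; querySelectorAll needs IE8+ standards document mode, and Timer resets at midnight, which this sketch ignores.

```vba
' Hypothetical helper: True once a <tr class="even"> exists, False on timeout.
' Timer counts seconds since midnight; a run across midnight is not handled.
Function WaitForEvenRow(ie As Object, ByVal maxSeconds As Single) As Boolean
    Dim started As Single
    started = Timer

    ' 1) Wait for the navigation itself (4 = READYSTATE_COMPLETE)
    Do While (ie.Busy Or ie.readyState <> 4) And Timer - started < maxSeconds
        DoEvents
    Loop
    If ie.readyState <> 4 Then Exit Function

    ' 2) readyState can be complete before AJAX has filled the table,
    '    so poll for the element that signals the data is present
    Do While Timer - started < maxSeconds
        If ie.document.querySelectorAll("tr.even").Length > 0 Then
            WaitForEvenRow = True
            Exit Function
        End If
        DoEvents
    Loop
End Function
```

For example, If WaitForEvenRow(ie, 15) Then ... could take the place of the Do ... Loop Until block in the question's code.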

Answer 2 (score: 0)

Here querySelectorAll or querySelector can be used to select the anchor element inside a tr with a specific class, and then getAttribute("href") retrieves the href attribute. HTH.
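A minimal sketch of that idea, assuming IE8+ standards document mode (required for querySelector) and a hypothetical function name. The CSS selector tr.even a matches <a> elements inside a <tr> with class even, and querySelector returns the first one in document order, which in the Update 2 markup is the season link rather than the team link.

```vba
' Hypothetical helper built on querySelector (IE8+ standards mode assumed).
Function HrefViaQuerySelector(doc As Object) As String
    Dim anchor As Object
    ' "tr.even a" = any <a> inside a <tr> whose class list contains "even";
    ' querySelector returns only the first match, or Nothing if there is none
    Set anchor = doc.querySelector("tr.even a")
    If Not anchor Is Nothing Then
        HrefViaQuerySelector = anchor.getAttribute("href") & ""
    End If
End Function
```

Unlike an exact className comparison, the .even selector also matches rows that carry additional classes. Usage would be along the lines of Range("E8").Value = HrefViaQuerySelector(ie.document).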