Unable to set a custom timeout in a ServerXMLHTTP request

Time: 2019-03-22 17:11:06

Tags: vba web-scraping proxy timeout serverxmlhttp

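I have written a VBA script that scrapes the title of the first post from a website, sending each request through a proxy. For every HTTP request the script uses one proxy taken from a list of proxies and checks how many posts the response contains. Once a request succeeds, the script should record the first post's title and the proxy in use, then exit the loop.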

  

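Sometimes the script works as intended, but most of the time it takes a long while to finish, even though I set the timeouts before sending the request. At this point I seriously doubt whether I am filling in the timeout parameters correctly. What I expect is that the script waits no longer than the timeout for each request; if the time runs out, it should raise a timeout error and move on to the next request.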

This is what I have written so far:

Sub HandleTimeOut()
    Dim Http As New ServerXMLHTTP60, Html As New HTMLDocument
    Dim elem As Object, proxyList As Variant, oProxy As Variant

    proxyList = [{"50.246.120.125:8080","198.204.253.115:3128","98.172.142.99:8080","207.188.231.141:8080"}]

    For Each oProxy In proxyList
        With Http
            .Open "GET", "https://stackoverflow.com/questions/tagged/web-scraping", True
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .setProxy 2, oProxy
            .setTimeouts 600000, 600000, 15000, 15000
            On Error Resume Next
            .send
            While .readyState < 4: DoEvents: Wend
            Html.body.innerHTML = .responseText
            Set elem = Html.querySelectorAll(".summary .question-hyperlink")
            On Error GoTo 0
        End With

        If elem.Length > 0 Then
            [A1] = oProxy
            [B1] = elem(0).innerText
            Exit For
        End If
    Next oProxy
End Sub

What is the correct way to set a timeout of five seconds?

1 Answer:

Answer 0: (score: 1)

.Open "GET", "https://stackoverflow.com/questions/tagged/web-scraping", True

should be

.Open "GET", "https://stackoverflow.com/questions/tagged/web-scraping", False
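For context, here is a minimal sketch of how the whole loop might look with that change and a five-second limit. It is an illustration under assumptions, not a tested fix. The four arguments of setTimeouts are the resolve, connect, send and receive timeouts, all in milliseconds, so 600000 in the question means ten minutes for the first two phases, and five seconds would be 5000. As far as I recall, the MSXML documentation describes setTimeouts as a call to make before open, so the sketch uses that order. The names HandleTimeOutSketch and TIMEOUT_MS are made up for this example; the proxies, URL and selector are copied from the question.

Sub HandleTimeOutSketch()
    ' Assumes references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Const TIMEOUT_MS As Long = 5000   ' five seconds, expressed in milliseconds
    Dim Http As ServerXMLHTTP60, Html As New HTMLDocument
    Dim elem As Object, proxyList As Variant, oProxy As Variant

    proxyList = [{"50.246.120.125:8080","198.204.253.115:3128","98.172.142.99:8080","207.188.231.141:8080"}]

    For Each oProxy In proxyList
        Set elem = Nothing
        Set Http = New ServerXMLHTTP60   ' fresh request object for every proxy

        On Error Resume Next             ' a timed-out .send raises a run-time error
        With Http
            ' resolve, connect, send, receive: each phase gets its own limit
            .setTimeouts TIMEOUT_MS, TIMEOUT_MS, TIMEOUT_MS, TIMEOUT_MS
            .Open "GET", "https://stackoverflow.com/questions/tagged/web-scraping", False
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .setProxy 2, oProxy
            .send                        ' synchronous: returns on completion or on error
            If Err.Number = 0 Then
                Html.body.innerHTML = .responseText
                Set elem = Html.querySelectorAll(".summary .question-hyperlink")
            End If
        End With
        Err.Clear
        On Error GoTo 0

        ' elem stays Nothing when the request failed or timed out
        If Not elem Is Nothing Then
            If elem.Length > 0 Then
                [A1] = oProxy
                [B1] = elem(0).innerText
                Exit For
            End If
        End If
    Next oProxy
End Sub

Because each of the four phases has a separate limit, a single slow proxy could in principle still take longer than five seconds overall. If the request has to stay asynchronous, ServerXMLHTTP also provides waitForResponse, which takes a timeout in seconds and could replace the readyState polling loop.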

how to set http timeout using asp?