我已经在VBA中编写了一个脚本,以分析来自一些站点的一些链接(连接到contact
关键字)。每个站点的一个链接。我使用xmlhttp
请求来完成任务。当我执行脚本时,它会解析每个站点的链接。唯一的问题是很少有站点没有任何此类链接(连接到contact
关键字),结果我的excel工作表中的输出变得混乱。更清楚一点:如果任何站点都没有这样的链接,我的刮板将使用先前的值填充该列。我将这些收集的链接存储在每个搜索的下一列中。我希望下面的图片能使您理解我的意思。
这是我到目前为止的尝试:
Sub GetConditionalLinks()
Dim HTTP As New XMLHTTP60, Html As New HTMLDocument
Dim post As Object, cel As Range, newlink$, R&
For Each cel In Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row)
If cel.Value <> "" Then
With HTTP
.Open "GET", cel.Value, False
.send
Html.body.innerHTML = .responseText
End With
For Each post In Html.getElementsByTagName("a")
If InStr(1, post.innerText, "contact", 1) > 0 Then newlink = post.getAttribute("href"): Exit For
Next post
cel(1, 2) = newlink
End If
Next cel
End Sub
我尝试过的链接(我特意将几行留空以查看脚本的行为):
https://www.yify-torrent.org/search/1080p/
https://www.houzz.com/professionals/
https://chandoo.org/forum/forums/vba-macros/
https://www.amazon.com/dp/B01LTIORC8
https://stackoverflow.com/questions
https://www.amazon.com/dp/B01LTIORC8
https://www.amazon.com/dp/B00GPAFHIO
我得到的输出:
搜索链接位于A列中,收集的链接位于B列中。您必须已经注意到由于我的循环逻辑,收集的链接与源链接不匹配。
我的问题:
如何修复循环以获得预期的输出?
如果我模仿cel(1, 2)
之类的东西,那Worksheets("SomeSheet").Range("A1")
的完全限定行是什么?
答案 0 :(得分:2)
我希望您能够使用类似以下的内容:
Option Explicit
Public Sub GetConditionalLinks()
Dim HTTP As New XMLHTTP60, Html As New HTMLDocument, post As Object, i As Long, arr()
With ActiveSheet
arr = .Range("A1:B" & .Cells(.Rows.Count, 1).End(xlUp).Row).Value
With HTTP
For i = LBound(arr, 1) To UBound(arr, 1)
If arr(i, 1) <> vbNullString Then
.Open "GET", arr(i, 1), False
.send
Html.body.innerHTML = .responseText
For Each post In Html.getElementsByTagName("a")
If InStr(1, post.innerText, "contact", 1) > 0 Then arr(i, 2) = post.getAttribute("href"): Exit For
Next post
End If
Next i
End With
.Range("A1").Resize(UBound(arr, 1), UBound(arr, 2)) = arr
End With
End Sub
访问被拒绝的站点:
因此,我开始获得拒绝访问的权限,因此请按以下步骤重新编写。对改善错误处理的建议持开放态度。这非常简陋,但我试图避免使用GoTo
语句。
Option Explicit
Public Sub GetConditionalLinks()
Dim HTTP As New MSXML2.ServerXMLHTTP60, Html As New HTMLDocument, post As Object, i As Long, arr(), timeoutError As Boolean
With ActiveSheet
arr = .Range("A1:B" & .Cells(.Rows.Count, 1).End(xlUp).Row).Value
With HTTP
For i = LBound(arr, 1) To UBound(arr, 1)
timeoutError = False
If arr(i, 1) <> vbNullString Then
.Open "GET", arr(i, 1), False
On Error GoTo Errhand
.send
If Not timeoutError Then
Html.body.innerHTML = .responseText
For Each post In Html.getElementsByTagName("a")
If InStr(1, post.innerText, "contact", 1) > 0 Then arr(i, 2) = post.getAttribute("href"): Exit For
Next post
End If
End If
Next i
End With
.Range("A1").Resize(UBound(arr, 1), UBound(arr, 2)) = arr
End With
Exit Sub
Errhand:
If Err.Number <> 0 Then
Select Case Err.Number
Case -2147012894 '<== Timeout; especially on access denied sites
timeoutError = True
Resume Next
Case Else '<== Don't know what we are gonna do yet so let's exit
Debug.Print Err.Number, Err.Description
End Select
End If
End Sub
不使用数组和循环表:
Option Explicit
Public Sub GetConditionalLinks()
Dim HTTP As New MSXML2.ServerXMLHTTP60, Html As New HTMLDocument, cel As Range, post As Object, R As Long, timeoutError As Boolean
Application.ScreenUpdating = False
With ActiveSheet
For Each cel In .Range("A1:A" & .Cells(Rows.Count, 1).End(xlUp).Row)
R = R + 1: timeoutError = False
If Not IsEmpty(cel) Then
HTTP.Open "GET", cel.Value, False
On Error GoTo Errhand
HTTP.send
If Not timeoutError Then
Html.body.innerHTML = HTTP.responseText
For Each post In Html.getElementsByTagName("a")
If InStr(1, post.innerText, "contact", 1) > 0 Then
.Cells(R, 2) = post.getAttribute("href"): Exit For
End If
Next post
End If
End If
Next cel
End With
Application.ScreenUpdating = True
Exit Sub
Errhand:
If Err.Number <> 0 Then
Select Case Err.Number
Case -2147012894 '<== Timeout; especially on access denied sites
timeoutError = True
Resume Next
Case Else
Debug.Print Err.Number, Err.Description
End Select
End If
Application.ScreenUpdating = True
End Sub
答案 1 :(得分:0)
如何执行以下操作?仅在脚本中的newlink = ""
之后定义for loop
才能解决此问题:
Sub GetConditionalLinks()
Dim HTTP As New XMLHTTP60, HTML As New HTMLDocument
Dim post As Object, elem As Object, newlink$
Dim cel As Range, R&
For Each cel In Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row)
newlink = "" '''THIS IS THE FIX
If cel.Value <> "" Then
With HTTP
.Open "GET", cel.Value, False
.send
HTML.body.innerHTML = .responseText
End With
For Each post In HTML.getElementsByTagName("a")
If InStr(1, post.innerText, "contact", 1) > 0 Then newlink = post.getAttribute("href"): Exit For
Next post
cel(1, 2) = newlink
End If
Next cel
End Sub