从多个网址中截取HTML表格

时间:2015-02-28 14:07:39

标签: excel vba excel-vba screen-scraping

我试图从一系列网址中抓取桌面内容。我一直在研究下面的代码,它执行以下步骤:

  1. 根据范围A1中的值添加新工作表:工作表中的A3"开始" (1,2,3等)
  2. 根据相同的值(,2,3等)创建网址
  3. 激活新工作表
  4. 打开网址和抓表
  5. 发生以下情况:

    1. 添加新工作表(1,2,3)
    2. 工作表(" 1")包含
    3. 中的表格
    4. 工作表(" 2")及以下内容仍为空
    5. 已添加网址 - http://cao.szw.nl/index.cfm?fuseaction=app.caoOverzicht&menu_item_id=16534&hoofdmenu_item_id=16507&rubriek_item=392846&rubriek_id=392840&strSorteerWijze=asc&strGesorteerdeKolom=cao_naam&pagenumber=1

      注意 - 该网站完全使用荷兰语;)

      错误在哪里?

      Sub TableExample()
      
          Dim IE As Object, doc As Object
          Dim strURL As String
          Dim ws As Worksheet, wsActive As Worksheet
          Dim i As Long, tabno As Long, nextrow As Long
          Dim cell As Range
          Dim MyNames As Range, MyNewSheet As Range
          Dim tbl As Object, rw As Object, cl As Object
      
          Set ws = Sheets("Start")
      
          With ws
      
             Dim rng As Range
             Set rng = .Range("A1:A3")
      
             For Each cell In rng
                  Sheets.Add.Name = cell.Value
                  Set wsActive = ThisWorkbook.ActiveSheet
                  strURL = "http://xxx&pagenumber=" & cell.Value
                  Set IE = CreateObject("InternetExplorer.Application")
                  With IE
                      '.Visible = True
                      .navigate strURL
                      Do Until .readyState = 4: DoEvents: Loop
                          Do While .Busy: DoEvents: Loop
                              Set doc = IE.document
                                  With wsActive
                                      For Each tbl In doc.getElementsByTagName("TABLE")
                                          tabno = tabno + 1
                                          nextrow = nextrow + 1
                                          Set rng = wsActive.Range("B" & nextrow)
                                          rng.Offset(, -1) = "Table " & tabno
                                          For Each rw In tbl.Rows
                                              For Each cl In rw.Cells
                                                  rng.Value = cl.outerText
                                                  Set rng = rng.Offset(, 1)
                                                  i = i + 1
                                              Next cl
                                              nextrow = nextrow + 1
                                              Set rng = rng.Offset(1, -i)
                                              i = 0
                                          Next rw
                                      Next tbl
                                  End With
                  End With
             Next
          End With
      
          IE.Quit
      
      End Sub
      

1 个答案:

答案 0 :(得分:1)

检查了你的代码并简单地完成了它,它对我有用。顺便说一句。 {-1}}表行单元格没有任何问题,它们是有效的对象。

for-each

enter image description here