合并来自多个列的数据

时间:2014-01-14 18:03:34

标签: excel vba excel-vba

我的问题类似但比这个帖子How to Consolidate Data from Multiple Excel Columns All into One Column更复杂。

以下是excel示例

Date       Measure1  A    B     Date       Measure2    A    B   C   Date.....
11/11/11   1234     1     2     11/12/12   5678        1    3   3   12/12/12  ...
12/11/12   234     34    234    12/12/13   345        342   23  33  12/12/13  ...
........

excel中有数百列。一个日期列后跟一个测量列,然后是其他一些列。 现在我只想要日期列,度量名称列和值列。 结果excel文件应该是

Date      Measure Name      Value
11/11/11  Measure1          1234
11/12/12  Measure2          5678
12/12/12  ....
....
12/11/12  Measure1          234
12/12/13  Measure2          123

我怎么能通过VBA实现它?由于我有这样的数千个文件,VBA似乎是整合这些文件并加载到数据库中的最佳方式。

我总是得到

  Run-time error '1004'
  Application -defined or object -defined eror"

at

  w.Sheets("DataSort").Range("A1").Resize(k, UBound(Arr2, 2)) = Arr2

这是我的代码

Sub convertExcel()
Dim Arr1, Arr2()
Dim Rnum As Integer, Cnum As Integer, Tnum As Integer
Dim i As Integer, j As Integer, k As Integer
'Rnum = row number; Cnum = column number; Tnum as  total number

Application.ScreenUpdating = False
Set w = Workbooks.Open("FileNAME~~~~")
Rnum = w.Sheets("Data").Cells(Rows.Count, 1).End(xlUp).Row

Cnum=208
Tnum = Rnum * Cnum / 2
w.Sheets.Add.Name = "DataSort"

Arr1 = Range("A1:GZ" & Rnum)
ReDim Arr2(1 To Tnum, 1 To 3)

For j = 2 To Cnum
  If w.Sheets("Data").Cells(1, j) = "Date" Then
     For i = 2 To Rnum
    If Arr1(i, j) <> "" Then
        k = k + 1:
        Arr2(k, 1) = Arr1(i, j)
        Arr2(k, 2) = Arr1(1, j)
        Arr2(k, 3) = Arr1(i, j + 1)
    End If
    Next
    End If
Next


w.Sheets("DataSort").Range("A1").Resize(k, UBound(Arr2, 2)) = Arr2

w.Close True
Application.ScreenUpdating = True
End Sub

1 个答案:

答案 0 :(得分:1)

由于今天我手上有很多时间,所以我决定在这里投入一些时间。我发现它有点挑战性,但最后,它只是恰当的事件排序。

以下是我采用的逻辑:

  • 删除所有非Date和非MeasureX列。
  • 将所有带有Measure的列名存储在字典中(完全不必要,但是,嘿,它很快)作为键。
  • 迭代第一个字典的键并创建第二个字典,将日期 - 值对存储为键值对。
  • 每次迭代,我们都会在第二张纸上打印出键值对。

请仔细阅读代码中的所有评论。另外,请注意我的设置如下。最后,在您的工作簿副本上进行测试。

设置向上:

Sheet2中,我有一个 unabridged 数据集,大致从您的示例中复制了1508列和1500行数据,不包括标题。删除不需要的列后,数据将减少到734列和1500行数据。在测试时,删除大约需要11-13秒。您的里程可能会有所不同。

使用这个过滤后的数据,使用第二个字典处理它需要大约8-9秒才能完成。整个过程基本上完成约20秒。

<强>截图:

Sheet2(包含原始数据的工作表):

enter image description here

Sheet3(输出表):

enter image description here

<强> 代码:

Sub KamehameWave()

    Dim Sht2 As Worksheet, Sht3 As Worksheet
    Dim Dict As Object, Cell As Range
    Dim Dict2 As Object, Cell2 As Range
    Dim RngToDelete As Range

    Set Sht2 = ThisWorkbook.Sheets("Sheet2") 'Modify accordingly.
    Set Sht3 = ThisWorkbook.Sheets("Sheet3") 'Modify accordingly.

    Application.ScreenUpdating = False

    With Sht2
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The following block will delete unneeded columns. Basically, it will only keep    '
        ' columns that either have "Date" or "MeasureX" in their headers. All else will be  '
        ' discarded. As said in the post, do this on a copy of your worksheet.              '
        '-----------------------------------BK201's Notes-----------------------------------'
        Start = Timer()
        For Each Cell In .Rows(1).Cells
            If InStr(1, Cell.Value, "Date") = 0 And InStr(1, Cell.Value, "Measure") = 0 Then
                If Not RngToDelete Is Nothing Then
                    Set RngToDelete = Union(RngToDelete, .Columns(Cell.Column))
                Else
                    Set RngToDelete = .Columns(Cell.Column)
                End If
            End If
        Next Cell
        RngToDelete.Delete
        Debug.Print Timer() - Start
        Start = Timer()
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The following block will create a dictionary and store all the names of columns   '
        ' with "Measure" in them. This is just so you have a reference. An array or a       '
        ' collection will do as well. I prefer to use this though as I find it easier.      '
        '-----------------------------------BK201's Notes-----------------------------------'
        Set Dict = CreateObject("Scripting.Dictionary")
        For Each Cell In .Rows(1).Cells
            CheckIfMeasure = InStr(1, Cell.Value, "Measure")
            If CheckIfMeasure > 0 Then
                If Not Dict.Exists(Cell.Value) And Not IsEmpty(Cell.Value) Then
                    Dict.Add Cell.Value, Empty
                End If
            End If
        Next Cell
        '-----------------------------------BK201's Notes-----------------------------------'
        ' What we'll do next is to iterate over each "MeasureX" column. We'll iterate over  '
        ' the values on these columns and store them in a *second* dictionary, with their   '
        ' respective dates being the keys.                                                  '
        '-----------------------------------BK201's Notes-----------------------------------'
        For Each Key In Dict
            MColIndex = Application.Match(Key, .Rows(1), 0)
            MColLRow = .Cells(Rows.Count, MColIndex).End(xlUp).Row
            Set MCol = .Range(.Cells(2, MColIndex), .Cells(MColLRow, MColIndex))
            Set Dict2 = CreateObject("Scripting.Dictionary")
            For Each Cell2 In MCol
                If Not Dict2.Exists(Cell2.Value) And Not IsEmpty(Cell2.Value) Then
                    Dict2.Add Cell2.Offset(0, -1).Value, Cell2.Value
                End If
            Next Cell2
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The final phase is to get the next empty row in the output sheet and dump all the '
        ' key-value pairs from our second dictionary there. Since we're also iterating      '
        ' through the keys of the first dictionary, the list will append properly to        '
        ' accommodate each key's own dictionary.                                            '
        '-----------------------------------BK201's Notes-----------------------------------'
            TColNRow = Sht3.Range("A" & Rows.Count).End(xlUp).Row + 1
            Sht3.Range("A" & TColNRow).Resize(Dict2.Count, 1).Value = Application.Transpose(Dict2.Keys)
            Sht3.Range("B" & TColNRow).Resize(Dict2.Count, 1).Value = Key
            Sht3.Range("C" & TColNRow).Resize(Dict2.Count, 1).Value = Application.Transpose(Dict2.Items)
        Next Key
        Debug.Print Timer() - Start
    End With

    Application.ScreenUpdating = True

End Sub

运行代码后的结果:

enter image description here

enter image description here

第一个数字是删除的运行时间,第二个是转置的运行时间。考虑到我有50万个数据点,这还不错。排序数据在你的法庭上。

如果有帮助,请告诉我们。