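Algorithm for filtering a list

Date: 2018-05-12 08:52:28

Tags: vba algorithm arraylist filter time-complexity

I have implemented what I think is a really rubbish method for filtering a System.Collections.ArrayList in VBA. The code takes a list and an item/comparison value to filter out. It loops through the list and removes the first matching item. It then restarts the loop (because you can't For Each and .Remove at the same time).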

Public Sub Filter(ByVal testValue As Object, ByVal dataSet As ArrayList)
'testValue and the items in `dataSet` all Implement IComparable from mscorlib.dll
'This allows comparing objects for equality
'i.e. obj1.CompareTo(obj2) = 0 is equivalent to obj1 = obj2
    Dim item As IComparable
    Dim repeat As Boolean
    repeat = False
    For Each item In dataSet
        If item.CompareTo(testValue) = 0 Then   'or equiv; If item = testValue
            dataSet.Remove item
            repeat = True
            Exit For
        End If
    Next item
    If repeat Then Filter testValue, dataSet 
End Sub

Why it's rubbish

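Let's assume the list is X elements long and contains Y items that match the filter criterion, with X > Y. As far as I can tell, the best-case performance is O(X) lookups, when all the Ys are clustered at the start. The worst case is when all the Ys are clustered at the end. In that case the algorithm needs (X-Y)*Y lookup operations, which is largest at Y = X/2, hence O(X^2).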
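The Y = X/2 figure can be checked with a short calculation. This is only a sketch that takes the (X-Y)*Y lookup count from the paragraph above as given and treats Y as a continuous variable:

```latex
f(Y) = (X - Y)\,Y = XY - Y^2
\qquad
\frac{df}{dY} = X - 2Y = 0 \;\Rightarrow\; Y = \frac{X}{2}
\qquad
f\!\left(\tfrac{X}{2}\right) = \frac{X^2}{4} = O(X^2)
```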

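This is poor compared with a simple O(X) algorithm that steps through the list and removes matches as it reaches them, without breaking out of the loop. But I don't know how to implement that. Is there a way to improve the performance of this filter?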

1 Answer:

Answer 0: (score: 2)

Could you not do something like the following, which I believe is O(n):

Option Explicit

Public Sub RemItems()

    Const TARGET_VALUE As String = "dd"
    Dim myList As Object
    Set myList = CreateObject("System.Collections.ArrayList")

    myList.Add "a"
    myList.Add "dd"
    myList.Add "a"
    myList.Add "a"
    myList.Add "a"
    myList.Add "dd"
    myList.Add "a"
    myList.Add "a"
    myList.Add "dd"
    myList.Add "a"
    myList.Add "a"

    Dim i As Long
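    'Iterate backwards so removals do not shift the indices still to be visited
    For i = myList.Count - 1 To 0 Step -1
        'RemoveAt deletes by index; Remove(value) would first search from the front
        If myList(i) = TARGET_VALUE Then myList.RemoveAt i
    Next i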

End Sub
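Each removal from an ArrayList still has to shift the elements after it, so one alternative is to copy the items you want to keep into a fresh list in a single pass. The following is a minimal sketch of that idea, not the approach from the answer above: FilterOut and DemoFilterOut are hypothetical names, and it assumes the items are simple values (such as strings) that can be compared with <>. For objects implementing IComparable you would compare with CompareTo instead.

```vba
Option Explicit

'Hypothetical helper (not from the original post): returns a new ArrayList
'containing every item of dataSet that is not equal to testValue.
'Assumes the items are simple values that VBA can compare with <>.
Public Function FilterOut(ByVal dataSet As Object, ByVal testValue As Variant) As Object
    Dim result As Object
    Set result = CreateObject("System.Collections.ArrayList")

    Dim item As Variant
    For Each item In dataSet
        'Each item is visited exactly once; the source list is never modified
        If item <> testValue Then result.Add item
    Next item

    Set FilterOut = result
End Function

'Hypothetical usage example
Public Sub DemoFilterOut()
    Dim myList As Object
    Set myList = CreateObject("System.Collections.ArrayList")
    myList.Add "a"
    myList.Add "dd"
    myList.Add "a"

    Dim kept As Object
    Set kept = FilterOut(myList, "dd")
    Debug.Print kept.Count
End Sub
```

The trade-off is extra memory for the second list, and callers must use the returned list rather than relying on the original being changed in place.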

For complexity information, see this discussion:

Asymptotic complexity of .NET collection classes

If this (.NET-Big-O-Algorithm-Complexity-Cheat-Sheet) is to be believed:

[image: .NET collection complexity table]

Note: I rendered the HTML using https://htmledit.squarefree.com/

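Edit

Caveat: I am not a CS graduate. This was fun to play with. I am sure there are arguments to be had about the data types being processed, the distributions, and so on. Improvements welcome.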

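The .NET table above shows removal from a HashTable as O(1) on average, versus O(n) for an ArrayList, so I randomly generated 100,000 rows from the values {"a","b","c"}. I then used that as my fixed test set to obtain the following results.

[image: Runs]

[image: Test set proportions]

Test run code (be gentle!)

The above appears to be O(1).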

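Looking only at the removal process (with the other factors taken out), the results are less conclusive, but again, my coding may be a factor!

[image: Deletion run]

Modified code (with other factors removed):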

Option Explicit

Private Declare PtrSafe Function getFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare PtrSafe Function getTickCount Lib "kernel32" _
Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long

Public Sub TestingArrayList()
    Const TARGET_VALUE = "a"
    Dim aList As Object
    Set aList = CreateObject("System.Collections.ArrayList")

    Dim arr()
    arr = ThisWorkbook.Worksheets("Sheet1").Range("A1").CurrentRegion.Value '<== Reads in the 100000 values

    Dim i As Long
    For i = 1 To UBound(arr, 1) '50000
        aList.Add arr(i, 2)
    Next i

    Debug.Print aList.Contains(TARGET_VALUE)

    Dim StartTime As Double

    StartTime = MicroTimer()

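    'Note: Remove(value) searches from the front of the list for the first match,
    'so this timing includes that search as well as the removal itself
    For i = aList.Count - 1 To 0 Step -1
       If aList(i) = TARGET_VALUE Then aList.Remove aList(i)
    Next i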

    Debug.Print "Removal from array list took: " & Round(MicroTimer - StartTime, 3) & " seconds"
    Debug.Print aList.Contains(TARGET_VALUE)

End Sub

Public Sub TestingHashTable()
    Const TARGET_VALUE = "a"
    Dim hTable As Object
    Set hTable = CreateObject("System.Collections.HashTable")

    Dim arr()
    arr = ThisWorkbook.Worksheets("Sheet1").Range("A1").CurrentRegion.Value '<== Reads in the 100000 values

    Dim i As Long
    For i = 1 To UBound(arr, 1) '50000
        hTable.Add i, arr(i, 2)
    Next i

    Debug.Print hTable.ContainsValue(TARGET_VALUE)

    Dim StartTime As Double

    StartTime = MicroTimer()

    For i = hTable.Count To 1 Step -1
       If hTable(i) = TARGET_VALUE Then hTable.Remove i
    Next i

    Debug.Print "Removal from hash table took: " & Round(MicroTimer - StartTime, 3) & " seconds"
    Debug.Print hTable.ContainsValue(TARGET_VALUE)

End Sub

Public Function MicroTimer() As Double

    Dim cyTicks1 As Currency
    Static cyFrequency As Currency

    MicroTimer = 0

    If cyFrequency = 0 Then getFrequency cyFrequency

    getTickCount cyTicks1

    If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function