按最短前缀分组的组号

时间:2016-11-03 15:59:51

标签: vba

我知道这不是本网站的理想问题,但是根据社区维基(这里:https://meta.stackexchange.com/questions/129598/which-computer-science-programming-stack-exchange-do-i-post-in)中列出的指南,我认为它符合算法的要求。如果移动位置不好,请标记移动,或者评论,我会适当删除。

我有一个数字列表,我需要用尽可能短的常用起始编号进行分组。

在下面的示例中,所有数字都可以按12分组,因为12后面的任何数字都将由CompanyA拥有:

120  CompanyA
121  CompanyA
122  CompanyA
123  CompanyA
124  CompanyA
125  CompanyA
126  CompanyA
127  CompanyA
128  CompanyA
129  CompanyA

为我的数据提供更真实的样本(数字在3到6位数之间):

3734 CompanyA
3735 CompanyA
375  CompanyA
3760 CompanyA
3761 CompanyA
3762 CompanyA
3763 CompanyA
3764 CompanyA
3765 CompanyA
3766 CompanyA
3767 CompanyA
3768 CompanyA
3769 CompanyA
3770 CompanyA
3771 CompanyA
3773   CompanyB
3774   CompanyB
3775   CompanyB
3776   CompanyB
3778   CompanyB
33045  CompanyB
361    CompanyB

应该成为:

3734 CompanyA
3735 CompanyA
375  CompanyA
376  CompanyA  'All numbers from 3760 to 3769 have been condensed to 1 number
3770  CompanyA
3771  CompanyA
3773   CompanyB
3774   CompanyB
3775   CompanyB
3776   CompanyB
3778   CompanyB
33045  CompanyB
361    CompanyB

这是一个必须渗透到多个行业的问题,我想有一种算法,我可以适应VBA而不会有太大的困难。然而,我正在努力克服逻辑。

如果有人能指出我正确的方向,我们将不胜感激。如果有人能指出我正确的方向,我很乐意适应并在VBA中发布答案,不幸的是我的谷歌搜索能力让我失望。

2 个答案:

答案 0 :(得分:2)

所以它花了比我预期的更多的时间,但它在这里!如果您以前没有使用过Tries,我建议reading the Wikipedia article。基本上,树中的每个级别代表数字的字符。当通过树到达数字的末尾时,它是一个叶子,这是存储值(公司名称)的地方。不可否认,我做了很多评论代码的工作,所以如果你有什么特别想知道,评论,我可以扩展它。

首先,像这样创建一个clsTrieNode类:

Option Explicit

Public parent As clsTrieNode
Public value As String
Public count As Long
Public digit As String

'Arrays are not allowed to be public members of classes, sadly
Private m_children(0 To 9) As clsTrieNode

Public Property Get children(i As Byte) As clsTrieNode
    Set children = m_children(i)
End Property

Public Property Set children(i As Byte, node As clsTrieNode)
    Set m_children(i) = node
End Property

接下来,像这样创建clsNumberTrie

Option Explicit

Private head As clsTrieNode

Private Sub Class_Initialize()
    Set head = New clsTrieNode
End Sub

Public Sub Add(key As String, value As String)
    Dim temp As clsTrieNode
    Set temp = head
    Dim i As Long
    Dim key_digit As Byte
    For i = 1 To Len(key)
        key_digit = Val(Mid(key, i, 1))
        If Not temp.children(key_digit) Is Nothing Then
            Set temp = temp.children(key_digit)
        Else
            Set temp.children(key_digit) = New clsTrieNode
            Set temp.children(key_digit).parent = temp
            Set temp = temp.children(key_digit)
            temp.digit = key_digit 'implicit string conversion
        End If
    Next
    temp.value = value
    mergeTrieUpwards temp.parent
End Sub

Private Sub mergeTrieUpwards(node As clsTrieNode)
    If isMergeable(node) Then
        node.value = node.children(0).value
        Dim i As Byte
        For i = 0 To 9
            Set node.children(i) = Nothing
        Next
        mergeTrieUpwards node.parent
    End If
End Sub

Private Function isMergeable(node As clsTrieNode) As Boolean
    Dim i As Byte
    'Firstly, node must be defined (e.g., not the parent of head)
    If node Is Nothing Then
        isMergeable = False
        Exit Function
    End If

    For i = 0 To 9
        'Secondly, all children must be defined
        If node.children(i) Is Nothing Then
            isMergeable = False
            Exit Function
        'Thirdly, all children must be leaves
        ElseIf node.children(i).value = "" Then
            isMergeable = False
            Exit Function
        End If
    Next
    isMergeable = True
End Function

Public Function toString() As String
    Dim strKey As String
    Dim strOutput As String
    If Not head Is Nothing Then
        strOutput = toStringRecurse("", head)
    End If
    toString = strOutput
End Function

Private Function toStringRecurse(prefix As String, node As clsTrieNode) As String
    Dim strOutput As String
    Dim i As Byte
    If node.value <> "" Then
        toStringRecurse = prefix & node.digit & " " & node.value & vbCrLf
        Exit Function
    Else
        For i = 0 To 9
            If Not node.children(i) Is Nothing Then
                strOutput = strOutput & toStringRecurse(prefix & node.digit, node.children(i))
            End If
        Next
        toStringRecurse = strOutput
    End If
End Function

最后,要根据您的输入数字运行它,我在名为mdlMain的模块中有以下内容。我推出了自己的Split,因为内置的split不支持忽略连续的分隔符,并且你的输入有可变数量的空格。

Public Sub Main()
    Dim input_data As String
    input_data = "3734   CompanyA" & vbCrLf & _
                 "3735   CompanyA" & vbCrLf & _
                 "375    CompanyA" & vbCrLf & _
                 "3760   CompanyA" & vbCrLf & _
                 "3761   CompanyA" & vbCrLf & _
                 "3762   CompanyA" & vbCrLf & _
                 "3763   CompanyA" & vbCrLf & _
                 "3764   CompanyA" & vbCrLf & _
                 "3765   CompanyA" & vbCrLf & _
                 "3766   CompanyA" & vbCrLf & _
                 "3767   CompanyA" & vbCrLf & _
                 "3768   CompanyA" & vbCrLf & _
                 "3769   CompanyA" & vbCrLf & _
                 "3770   CompanyA" & vbCrLf & _
                 "3771   CompanyA" & vbCrLf & _
                 "3773   CompanyB" & vbCrLf & _
                 "3774   CompanyB" & vbCrLf & _
                 "3775   CompanyB" & vbCrLf & _
                 "3776   CompanyB" & vbCrLf & _
                 "3778   CompanyB" & vbCrLf & _
                 "33045  CompanyB" & vbCrLf & _
                 "361    CompanyB"

    Dim companyTrie As clsNumberTrie
    Set companyTrie = New clsNumberTrie

    Dim rows As Variant
    Dim row As Variant

    rows = SplitStr(input_data, vbCrLf)

    Dim i As Long
    For i = 0 To UBound(rows)
        row = SplitStr(CStr(rows(i)), " ", True)
        companyTrie.Add CStr(row(0)), CStr(row(1))
    Next

    Debug.Print companyTrie.toString

End Sub

'This implementation of split has supports ignoring consecutive delimiters
Public Function SplitStr(str As String, delim As String, Optional treatSuccessiveDelimitersAsOne = False) As Variant
    'This is not an optimal implementation:
    '1. Resizing an array is expensive because it requires copying the whole thing.
    '2. String concatenation has the same problem; new memory is allocated to hold the result, and then both strings are copied to this new location.
    'Thankfully, with the small strings in this example, it doesn't matter too much.

    Dim i As Long
    Dim outArr() As String

    ReDim outArr(0 To 0)

    'Iterate through the string
    For i = 1 To Len(str)
        'If the current character is the start of the delimiter...
        If Mid(str, i, 1) = Mid(delim, 1, 1) Then
            'Check and see if the whole delimiter is there...
            If isSubstringDelim(str, i, delim) Then
                'If so, jump i past the delimiter and add a new slot to the split array
                i = i + Len(delim)
                ReDim Preserve outArr(0 To (UBound(outArr) + 1))
                'Check to see if there are multiple delimiters in a row...
                While isSubstringDelim(str, i, delim)
                    i = i + Len(delim)
                    'If treatSuccessiveDelimitersAsOne is False, we add a blank string to the split array each time we encounter a successive delimiter.
                    'If it's true, just consume the delimiters without creating a blank string
                    If Not treatSuccessiveDelimitersAsOne Then
                        ReDim Preserve outArr(0 To (UBound(outArr) + 1))
                    End If
                Wend
            End If
        End If
        'Add the current character to the current slot of the split array
        outArr(UBound(outArr)) = outArr(UBound(outArr)) + Mid(str, i, 1)
    Next

    SplitStr = outArr
End Function

Private Function isSubstringDelim(str, index, delim) As Boolean
    Dim min As Long
    If (Len(str) - index) < Len(delim) Then
        isSubstringDelim = False
        Exit Function
    End If
    For i = 1 To Len(delim)
        If Not (Mid(str, i + index - 1, 1) = Mid(delim, i, 1)) Then
            isSubstringDelim = False
            Exit Function
        End If
    Next
    isSubstringDelim = True
End Function

由于访问节点的方式,结果按字母顺序输出。请注意,它支持递归分组,因此如果您为CompanyA提供了3351到3358,但是对于CompanyA,您还有33591到33599,它将首先累计3359,然后汇总335。

33045 CompanyB
361公司B
3734公司A 3735 CompanyA
375 CompanyA
376公司A 3770公司A
3771公司A 3773公司B
3774公司B
3775公司B
3776公司B
3778 CompanyB

答案 1 :(得分:0)

您可以测试字符串中字符的位置,因此如果您测试的是&#39; 37&#39;并且它出现在第一个位置,你的刺痛从37开始,你可以将它添加到你的列表,移动它,无论你想做什么。

If InStr(yourString,"37") < 2 Then 
    'do whatever 
End If

你可能需要使用确切的if语句和数字,这只是为了向你展示一般的想法。

抱歉,只需读到底部,看到有些人从37开始,但是不同的公司。对于那些我在嵌套if中以相同方式测试第3个字符并将它们拆分的人。