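String vs. column: match percentage

Asked: 2018-07-31 04:29:33

Tags: excel excel-vba excel-formula

I'm trying to find a way to calculate the minimum percentage match when comparing a string against a column.

Example: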

Column A        Column B
Key             Keylime
Key Chain       Status
                Serious
                Extreme
                Key

Where

Column A        Column B     Column C                 Column D
Key             Temp         100%                     Key
Key Chain       Status       66.7%                    Key Ch
Ten             Key Ch       100%                     Tenure
                Extreme       
                Key
                Tenure 

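To expand on this:

  • Column A holds the individual strings to match
  • Column B is the reference column
  • Column C gives the highest percentage to which the column A string matches any string in column B
  • Column D gives the word in column B with the highest match percentage

To expand on column C: when looking at Key Chain, the highest match against any word in column B is Key Ch, where 6 of the 9 characters (including the space) match, so the percentage match is (6/9) = 66.7%

  • That said, this isn't a deal-breaker, but it does stand out. The logic above fails where a match cannot be penalized, that is, in cases that don't look like the Key Chain example. All 3 of Ten's 3 characters match Tenure, which gives it an inflated 100% match, and I still can't think of a way to correct for that.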

1 Answer:

Answer 0 (score: 1)

This should work (I haven't tested it; I'm on Linux at the moment). Call getStrMatch for each string.

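' Result of a lookup: best match share (0..1) and the reference word that produced it.
Type StrMatch
    Percent As Double
    Word As String
End Type

' Returns the entry of RefRange whose leading characters best match s.
' On ties, the first cell in RefRange wins.
Function getStrMatch(s As String, RefRange As Range) As StrMatch
    Dim i As Long, ref_str As String
    Dim BestMatch As StrMatch: BestMatch.Percent = -1
    Dim match_pc As Double
    With RefRange
        For i = 1 To .Cells.Count
            ' CStr keeps a cell holding an error value from raising a type mismatch
            ref_str = CStr(.Cells(i).Value2)
            match_pc = getMatchPc(s, ref_str)
            If match_pc > BestMatch.Percent Then
                BestMatch.Percent = match_pc
                BestMatch.Word = ref_str
            End If
        Next i
    End With
    getStrMatch = BestMatch
End Function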

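' Length of the common prefix of s and ref_str, divided by the longer of the two lengths.
Function getMatchPc(s As String, ref_str As String) As Double
    Dim s_len As Long: s_len = Len(s)
    Dim ref_len As Long: ref_len = Len(ref_str)
    Dim longer_len As Long
    If s_len > ref_len Then longer_len = s_len Else longer_len = ref_len
    ' Two empty strings: return 0 rather than divide by zero
    If longer_len = 0 Then Exit Function
    Dim m As Long: m = 1
    ' While...Wend cannot be exited early, so use Do While with Exit Do
    Do While m <= longer_len
        If Mid$(s, m, 1) <> Mid$(ref_str, m, 1) Then Exit Do
        m = m + 1
    Loop
    getMatchPc = (m - 1) / longer_len
End Function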
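
A function that returns a user-defined type cannot be called from a cell formula, so here is a rough sketch of two worksheet-callable wrappers for columns C and D. It is not part of the tested answer. The names PrefixMatchPc, ScanBest, BestMatchPc and BestMatchWord are made up for illustration, and the block carries its own copy of the prefix comparison so it can be pasted into an empty standard module on its own.

```vba
Option Explicit

' Hypothetical helper: share of the longer string covered by the common prefix.
Private Function PrefixMatchPc(ByVal s As String, ByVal ref_str As String) As Double
    Dim longer_len As Long, m As Long
    longer_len = Len(s)
    If Len(ref_str) > longer_len Then longer_len = Len(ref_str)
    If longer_len = 0 Then Exit Function        ' two empty strings: report 0

    Do While m < longer_len
        ' Mid$ past the end of a string returns "", which ends the prefix
        If Mid$(s, m + 1, 1) <> Mid$(ref_str, m + 1, 1) Then Exit Do
        m = m + 1
    Loop
    PrefixMatchPc = m / longer_len
End Function

' Hypothetical helper: scans RefRange once, returns the best share and
' hands back the matching word through bestWord.
Private Function ScanBest(ByVal s As String, ByVal RefRange As Range, _
                          ByRef bestWord As String) As Double
    Dim c As Range, pc As Double, best As Double
    best = -1
    For Each c In RefRange.Cells
        pc = PrefixMatchPc(s, CStr(c.Value2))
        If pc > best Then
            best = pc
            bestWord = CStr(c.Value2)
        End If
    Next c
    ScanBest = best
End Function

' Column C: highest match share (format the cell as a percentage).
Public Function BestMatchPc(ByVal s As String, ByVal RefRange As Range) As Double
    Dim w As String
    BestMatchPc = ScanBest(s, RefRange, w)
End Function

' Column D: the reference word that produced that share.
Public Function BestMatchWord(ByVal s As String, ByVal RefRange As Range) As String
    Dim w As String
    ScanBest s, RefRange, w
    BestMatchWord = w
End Function
```

Assuming the reference words sit in B1:B6 (adjust to your sheet), C1 would hold =BestMatchPc(A1,$B$1:$B$6) and D1 would hold =BestMatchWord(A1,$B$1:$B$6). Because the share is taken over the longer string, Ten against Tenure scores 3/6 rather than 3/3.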

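Note that you have to put this in a standard module; otherwise you must declare them as Private Type and Private Function.

Also, if you are matching a lot of strings, you should probably build a trie, since this is just a naive string comparison: each getStrMatch call costs O(mn), where m is the size of RefRange and n is the average ref_str length.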
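
For what it's worth, a trie along those lines might look like the sketch below. This is untested and rests on a few assumptions: it uses a late-bound Scripting.Dictionary per node (Windows only), the names NewNode, TrieInsert, TrieBestMatch and TrieDemo are invented here, and each node remembers the shortest word passing through it so that the "divide by the longer length" score can be recovered while walking down.

```vba
Option Explicit

' Each node is a Scripting.Dictionary:
'   one-character keys -> child nodes
'   "#len" / "#word"   -> length and text of the shortest word through this node
Private Function NewNode() As Object
    Dim node As Object
    Set node = CreateObject("Scripting.Dictionary")
    node.Item("#len") = 2147483647          ' sentinel: no word seen yet
    node.Item("#word") = vbNullString
    Set NewNode = node
End Function

' Adds one reference word, updating the shortest-word record along its path.
Public Sub TrieInsert(ByVal root As Object, ByVal word As String)
    Dim node As Object, i As Long, ch As String
    Set node = root
    For i = 0 To Len(word)
        If Len(word) < node.Item("#len") Then
            node.Item("#len") = Len(word)
            node.Item("#word") = word
        End If
        If i < Len(word) Then
            ch = Mid$(word, i + 1, 1)
            If Not node.Exists(ch) Then node.Add ch, NewNode()
            Set node = node.Item(ch)
        End If
    Next i
End Sub

' Walks s down the trie. At depth d the best candidate is the shortest word
' sharing the first d characters, scoring d / max(Len(s), its length).
Public Function TrieBestMatch(ByVal root As Object, ByVal s As String, _
                              ByRef bestWord As String) As Double
    Dim node As Object, d As Long, denom As Long
    Dim pc As Double, best As Double, ch As String
    Set node = root
    best = -1
    Do
        denom = Len(s)
        If node.Item("#len") > denom Then denom = node.Item("#len")
        If denom > 0 Then pc = d / denom Else pc = 0
        If pc > best Then
            best = pc
            bestWord = node.Item("#word")
        End If
        If d >= Len(s) Then Exit Do
        ch = Mid$(s, d + 1, 1)
        If Not node.Exists(ch) Then Exit Do
        Set node = node.Item(ch)
        d = d + 1
    Loop
    TrieBestMatch = best
End Function

' Builds the trie once from the question's reference words, then queries it.
Public Sub TrieDemo()
    Dim root As Object, w As Variant, word As String
    Set root = NewNode()
    For Each w In Array("Temp", "Status", "Key Ch", "Extreme", "Key", "Tenure")
        TrieInsert root, CStr(w)
    Next w
    Debug.Print Format$(TrieBestMatch(root, "Key Chain", word), "0.0%"), word
End Sub
```

Building the trie touches every reference character once, after which each lookup takes at most Len(s) dictionary steps instead of scanning all of RefRange. Ties may resolve to a different word than the range scan picks, and the dictionary's default binary comparison makes matching case-sensitive.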