Is there a faster way to regex-match against a SortedList?

Time: 2015-11-06 21:18:17

Tags: .net regex vb.net linq

voGenderList is a SortedList(Of String, String) holding 30,000 entries of (unique name, gender). I want to know how many names in that list contain "henry"; that is, "henry", "henryette", "henryate" and "ihenry" should all match if they are in the list.

Dim matchlist As Dictionary(Of String, String) = voGenderList.Where(Function(i) New Regex(i.Key).IsMatch("henry")).ToDictionary(Function(i) i.Key, Function(i) i.Value)
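One detail worth checking in the line above: New Regex(pattern).IsMatch(input) tests whether the pattern occurs in the input, so using i.Key as the pattern finds keys that occur inside "henry", not keys that contain it. If the intent is the one described in the text (keys that contain the search term), a plain substring test in the other direction may be enough. The sketch below assumes that intent and a case-insensitive comparison; CountKeysContaining and the sample data are hypothetical names used only for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module ContainsSketch
    ' Hypothetical helper: counts the keys that contain searchTerm, ignoring case.
    Function CountKeysContaining(source As SortedList(Of String, String),
                                 searchTerm As String) As Integer
        Return source.Keys.
            Where(Function(key) key.IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0).
            Count()
    End Function

    Sub Main()
        ' Made-up sample data standing in for voGenderList.
        Dim sample As New SortedList(Of String, String) From {
            {"henry", "M"}, {"henryette", "F"}, {"ihenry", "M"}, {"alice", "F"}
        }
        Console.WriteLine(CountKeysContaining(sample, "henry"))
    End Sub
End Module
```

No Regex object is built per key here, which is the part of the original line most likely to dominate the running time.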

Dim namelist As List(Of String) 'Contains 35000 unique names
Dim matchlist As Dictionary(Of String, String)

For Each oItem In namelist 
        matchlist = voGenderList.Where(Function(i) oItem.IndexOf(i.Key) >= 0).ToDictionary(Function(i) i.Key, (Function(i) i.Value))
        'Do other stuff with the results of matchlist
Next

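The regex code has been replaced with the suggested IndexOf loop, which is much faster than regex matching. Running that loop 35,000 times takes about 5 minutes, which is a big improvement.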

.Contains and .IndexOf run at about the same speed, with IndexOf edging slightly ahead.
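The two calls are not quite equivalent by default: String.Contains compares ordinally, while String.IndexOf(String) without a comparison argument uses the current culture. Passing StringComparison.Ordinal to IndexOf makes them behave the same, and it may also change the timing, although that has not been measured against the data in this question. A minimal sketch, with ContainsOrdinal as a hypothetical helper name:

```vbnet
Imports System

Module OrdinalSketch
    ' Hypothetical helper: ordinal substring test, no culture rules applied.
    Function ContainsOrdinal(text As String, value As String) As Boolean
        Return text.IndexOf(value, StringComparison.Ordinal) >= 0
    End Function

    Sub Main()
        Console.WriteLine(ContainsOrdinal("ihenry", "henry"))
        Console.WriteLine(ContainsOrdinal("alice", "henry"))
    End Sub
End Module
```

Timing both variants with a Stopwatch on the real lists is the only reliable way to tell which one wins here.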

I'm happy with the results for now, but if anyone has other suggestions for further improvement, I'm listening.

1 answer:

Answer 0 (score: 1)

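A parallel approach is the way to go. You have m keys in voGenderList and n names in namelist, so you are effectively doing n * m iterations, and no state has to be shared between iterations, so this is an inherently parallel problem.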

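Note that voGenderList is converted to voGenderArray() for a further speedup, because a For loop over an array is generally faster than enumerating a collection again and again.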

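I assume you have at least two cores. If not, it will run sequentially, and the only speedup will come from using a For loop instead of enumeration. It should make a measurable difference for 35k items.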

I couldn't test the code, but it compiles. VB.Net is not my language, but trust me, the concept is sound :). If there are any problems, I'll fix them...

Imports System.Linq
Imports System.Collections.Generic
Imports System.Collections.Concurrent

Module Module1

    Sub Main()
        'Somewhere defined
        Dim voGenderList As SortedList(Of String, String)
        Dim namelist As List(Of String)
        voGenderList = New SortedList(Of String, String)
        namelist = New List(Of String)

        'ConcurrentDictionary allows concurrent update of dictionary,
        ' names are unique, but they have to be inserted into dictionary,
        ' here concurrently
        Dim matchlist As ConcurrentDictionary(Of String, Dictionary(Of String, String))
        matchlist = New ConcurrentDictionary(Of String, Dictionary(Of String, String))


        'A SortedList is fine, but creating an enumerator over it again and again is costly; an array indexed with a For loop is faster
        Dim voGenderArray() As KeyValuePair(Of String, String) = voGenderList.ToArray()

        'Parallel computation: one call to ParallelPart per name
        namelist.AsParallel().ForAll(Sub(match) ParallelPart(voGenderArray, matchlist, match))

        'do something with matchlist, sequentially, concurrently as you see fit:)


    End Sub

    Sub ParallelPart(ByVal voGenderArray() As KeyValuePair(Of String, String),
                     ByVal matchlist As ConcurrentDictionary(Of String, Dictionary(Of String, String)),
                     ByVal match As String)
        'OrElse short-circuits, so Length is never read on a Nothing array
        If (voGenderArray Is Nothing) OrElse (voGenderArray.Length <= 0) Then
            Exit Sub
        End If
        Dim dictionary As Dictionary(Of String, String) = Nothing
        Dim size As Integer = voGenderArray.Length
        For i As Integer = 0 To size - 1
            Dim kvp As KeyValuePair(Of String, String) = voGenderArray(i)
            If match.IndexOf(kvp.Key) >= 0 Then
                If dictionary Is Nothing Then
                    dictionary = New Dictionary(Of String, String)
                End If
                dictionary.Add(kvp.Key, kvp.Value)
            End If
        Next

        If dictionary IsNot Nothing Then
            matchlist.TryAdd(match, dictionary)
        End If
    End Sub


End Module
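
As a variation on the module above, the same n * m scan can also be written as a single PLINQ query that returns its results instead of filling a ConcurrentDictionary. This is only a sketch under a few assumptions: the names are unique (ToDictionary throws on duplicate keys), an ordinal comparison is acceptable, and FindMatches, MatchesFor and the sample data are hypothetical names that do not appear in the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module PlinqSketch
    ' Hypothetical helper: collects every (key, gender) pair whose key occurs inside name.
    Function MatchesFor(name As String,
                        genders() As KeyValuePair(Of String, String)) As Dictionary(Of String, String)
        Dim hits As New Dictionary(Of String, String)
        For i As Integer = 0 To genders.Length - 1
            If name.IndexOf(genders(i).Key, StringComparison.Ordinal) >= 0 Then
                hits.Add(genders(i).Key, genders(i).Value)
            End If
        Next
        Return hits
    End Function

    ' Hypothetical helper: runs MatchesFor for every name in parallel and keeps the non-empty results.
    Function FindMatches(names As IEnumerable(Of String),
                         genders() As KeyValuePair(Of String, String)) As Dictionary(Of String, Dictionary(Of String, String))
        Return names.AsParallel().
            Select(Function(n) New KeyValuePair(Of String, Dictionary(Of String, String))(n, MatchesFor(n, genders))).
            Where(Function(p) p.Value.Count > 0).
            ToDictionary(Function(p) p.Key, Function(p) p.Value)
    End Function

    Sub Main()
        ' Made-up sample data standing in for voGenderList and namelist.
        Dim genders() As KeyValuePair(Of String, String) = {
            New KeyValuePair(Of String, String)("henry", "M"),
            New KeyValuePair(Of String, String)("ann", "F")
        }
        Dim names As New List(Of String) From {"ihenry", "joanna", "bob"}

        Dim result = FindMatches(names, genders)
        Console.WriteLine("Cores available: " & Environment.ProcessorCount)
        Console.WriteLine("Names with at least one match: " & result.Count)
    End Sub
End Module
```

Because each name produces its own dictionary and nothing is written to shared state inside the query, no concurrent collection is needed; whether this is faster than the ForAll version would have to be measured.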