Is there a more efficient way to compare strings?

Asked: 2016-07-17 12:11:14

Tags: vb.net

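EDIT: After making some of the changes suggested below, I noticed that the program uses only 12% CPU and does almost no reading or writing, yet it is still somewhat slow. I used threads to run the program on 4 files at once; that did just under 4 times the work and used 62% CPU. I do have a 100 ms timer that updates the progress bar and a label. Could that be affecting performance in some way? This particular program performs only this one task, so it has just a label, a timer and a progress bar, and it runs each time a file arrives.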

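I have 655,000 "words" in a file. I want to cross-reference a "word" supplied by the user and see whether I can find a match in the file.

At the moment I simply open the file, read it line by line, and check whether the values are the same.

But it takes a long time to go through the file.

Is there a faster way to do the comparison? Should I read the whole file, then split it and compare?

I tried to "index" the word file, but that also takes forever. The code is below.

I am running it on a separate thread.

The file is growing very quickly. Two hours ago it held 10,000 "words", and I expect it to reach tens of millions.

I use the term "words" because the file contains data from my first neural-network AI, so unfortunately referring to an ordinary word search does not work.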

    ' Split the word file into one folder per first letter and one file per word length.
    ' Assumes Imports System.IO and that sr, progressvalue and the other variables are declared earlier.
    Do While sr.Peek() >= 0
        NewWord = sr.ReadLine()
        progressvalue = progressvalue + 1

        ' Skip blank lines: NewWord(0) throws on an empty string.
        If NewWord.Length = 0 Then Continue Do

        FirstLetter = NewWord(0)
        Wordlength = NewWord.Length

        ' Lengths up to 5 share 5.txt and lengths of 12 or more share 12.txt.
        ' The original test was "< 5", so 5-letter words reused the previous word's file name.
        writefile = Math.Min(Math.Max(Wordlength, 5), 12).ToString() & ".txt"

        ' Only the folders A to Z exist. Words starting with anything else are skipped
        ' here instead of silently reusing the previous word's folder.
        Dim folder As Char = Char.ToUpperInvariant(NewWord(0))
        If folder < "A"c OrElse folder > "Z"c Then Continue Do
        Writepath = "H:\Dictionary\" & folder & "\"

        outputpath = Writepath & writefile

        ' Opening and closing the output file for every single word is expensive.
        Using sw As StreamWriter = File.AppendText(outputpath)
            sw.WriteLine(NewWord)
        End Using
    Loop
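
A likely reason the indexing loop above is slow is that it opens and closes an output file once per word. The following is a minimal sketch, not a measured fix, of the same split with each output file opened only once. `WordSplitter`, `BucketPath` and `SplitWords` are hypothetical names, and the sketch assumes that lengths up to 5 are meant to share 5.txt and that words not starting with A to Z can be skipped.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module WordSplitter
    ' Returns the bucket file for a word, or Nothing if it does not start with A to Z.
    Function BucketPath(root As String, word As String) As String
        If String.IsNullOrEmpty(word) Then Return Nothing
        Dim letter As Char = Char.ToUpperInvariant(word(0))
        If letter < "A"c OrElse letter > "Z"c Then Return Nothing
        ' Assumption: lengths up to 5 share 5.txt, lengths of 12 or more share 12.txt.
        Dim bucket As Integer = Math.Min(Math.Max(word.Length, 5), 12)
        Return Path.Combine(root, letter.ToString(), bucket.ToString() & ".txt")
    End Function

    ' Splits inputPath into bucket files under root and returns the number of words written.
    Function SplitWords(inputPath As String, root As String) As Integer
        ' One open writer per output file, created on first use.
        Dim writers As New Dictionary(Of String, StreamWriter)()
        Dim written As Integer = 0
        Try
            For Each word As String In File.ReadLines(inputPath)
                Dim target As String = BucketPath(root, word)
                If target Is Nothing Then Continue For
                Dim sw As StreamWriter = Nothing
                If Not writers.TryGetValue(target, sw) Then
                    Directory.CreateDirectory(Path.GetDirectoryName(target))
                    sw = File.AppendText(target)
                    writers.Add(target, sw)
                End If
                sw.WriteLine(word)
                written += 1
            Next
        Finally
            ' Flush and close every writer, even if reading failed part-way.
            For Each openWriter As StreamWriter In writers.Values
                openWriter.Dispose()
            Next
        End Try
        Return written
    End Function

    Sub Main()
        ' Throwaway input file and output folder, for illustration only.
        Dim input As String = Path.GetTempFileName()
        Dim root As String = Path.Combine(Path.GetTempPath(), "SplitDemo_" & Guid.NewGuid().ToString("N"))
        File.WriteAllLines(input, New String() {"apple", "Avocado", "bee", "42"})
        Console.WriteLine(SplitWords(input, root))
        File.Delete(input)
        Directory.Delete(root, True)
    End Sub
End Module
```

With 26 letters and 8 length buckets this keeps at most 208 files open at once.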

1 Answer:

Answer 0 (score: 1)

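A hash-based data structure (such as HashSet in .NET) would be the fastest way to add and check words, but as you keep adding words you will eventually run out of memory.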
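
As a rough illustration of the HashSet approach, here is a minimal sketch that loads the word file once and then answers lookups from memory. `WordLookup` and `LoadWords` are hypothetical names, the sample file is a throwaway stand-in for the real word file, and the case-insensitive comparer is an assumption about how matches should behave.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module WordLookup
    ' Reads every non-empty line of the file into a set.
    ' Assumption: matching should ignore case, hence OrdinalIgnoreCase.
    Function LoadWords(filePath As String) As HashSet(Of String)
        Dim words As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        For Each line As String In File.ReadLines(filePath)
            If line.Length > 0 Then words.Add(line)
        Next
        Return words
    End Function

    Sub Main()
        ' Throwaway sample file, for illustration only.
        Dim sample As String = Path.GetTempFileName()
        File.WriteAllLines(sample, New String() {"Foo", "bar", ""})

        Dim words As HashSet(Of String) = LoadWords(sample)
        Console.WriteLine(words.Contains("foo")) ' prints True
        Console.WriteLine(words.Contains("baz")) ' prints False

        File.Delete(sample)
    End Sub
End Module
```

The file is read only once; each later `Contains` call is a hash lookup instead of a pass over the file. New words can be added to the same set with `Add`, which returns False if the word was already present.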

A database would probably be best, because the words will be indexed and you can access it from multiple computers.

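Using the file system is most likely the slowest way, but my guess is that using folder names instead of files should be faster. For example, for the word Foo the path would be "H:\Dictionary\F\O\O\" (upper or lower case does not matter on most popular file systems that I know of). It will also use more space, though, because each folder carries its own metadata and settings.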
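
The folder-per-letter idea could be sketched roughly as follows. `FolderIndex`, `WordToFolder`, `AddWord` and `ContainsWord` are hypothetical names, and the sketch assumes that words are non-empty and contain only characters that are valid in folder names.

```vbnet
Imports System
Imports System.IO

Module FolderIndex
    ' Maps a word to a nested folder with one level per character,
    ' for example "Foo" -> <root>\F\O\O on Windows.
    Function WordToFolder(root As String, word As String) As String
        Dim result As String = root
        ' Upper-casing keeps the layout consistent on case-sensitive file systems.
        For Each c As Char In word.ToUpperInvariant()
            result = Path.Combine(result, c.ToString())
        Next
        Return result
    End Function

    ' Adding a word creates its folder chain (a no-op if it already exists).
    Sub AddWord(root As String, word As String)
        Directory.CreateDirectory(WordToFolder(root, word))
    End Sub

    ' A word counts as present when its folder exists.
    Function ContainsWord(root As String, word As String) As Boolean
        Return Directory.Exists(WordToFolder(root, word))
    End Function

    Sub Main()
        ' A throwaway root under the temp folder stands in for H:\Dictionary\.
        Dim root As String = Path.Combine(Path.GetTempPath(), "DictionaryDemo_" & Guid.NewGuid().ToString("N"))
        AddWord(root, "Food")
        Console.WriteLine(ContainsWord(root, "food")) ' prints True
        Console.WriteLine(ContainsWord(root, "Foo"))  ' prints True: see the note below
        Console.WriteLine(ContainsWord(root, "Bar"))  ' prints False
        Directory.Delete(root, True)
    End Sub
End Module
```

One caveat with this layout: any prefix of a stored word also has a folder, so "Foo" is reported as present once "Food" has been added. If exact matches matter, each complete word would need an extra marker, such as an empty file inside its final folder. Very long words can also run into path-length limits.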

If the project has some budget, you could look into better solutions such as Google BigQuery.