Check whether an array contains another array

Time: 2014-11-12 19:30:04

Tags: arrays vb.net

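I am looking for a way to check whether an array contains all the elements of another array. The situation is this: I have two Byte() arrays, one holding the bytes of a file and the other holding the bytes to compare against.

For example, if the file contains the bytes 4D 5A 90 00 03 and the sequence to compare is 00 03, I want the function to return true. Otherwise it should return false. So all the bytes in the sequence to compare must also be present in the file.

I have already searched the web for this. I tried the good old Contains() function, but for arrays it only compares a single byte. As you know, a single byte is far too little to identify a file!

If possible, I would like to do this as fast as possible.

I am using VB.NET WinForms, VS 2013, .NET 4.5.1.

Thanks in advance,

FWhite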

Edit

Now I have a List(Of Byte()) like this:

00 25 85 69
00 41 52
00 78 96 32

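These are three Byte() arrays. How can I check whether my file's byte array contains all of these values (the file must contain 00 25 85 69, 00 41 52 and 00 78 96 32)? I tried this code, but it does not work: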

Dim BytesToCompare As List(Of Byte()) = StringToByteArray(S.Split(":")(3))
    For Each B As Byte() In BytesToCompare 
        If FileBytes.All(Function(c) B.Contains(c)) Then
            'Contains
            TempResults.Add("T")
        Else
            TempResults.Add("F")
        End If
    Next
If CountResults(TempResults) Then
    Return S
    Exit For
End If

The code in CountResults is:

Public Function CountResults(Input As List(Of String)) As Boolean
    Dim TrueCount As Integer = 0
    Dim FalseCount As Integer = 0
    Dim TotalCount As Integer = Input.Count
    For Each S In Input
        If S = "T" Then
            TrueCount = TrueCount + 1
        ElseIf S = "F" Then
            FalseCount = FalseCount + 1
        End If
    Next
    If TrueCount = TotalCount Then
        Return True
    ElseIf FalseCount > TrueCount Then
        Return False
    End If
End Function

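Tell me if anything is unclear and I will try to explain it better.

Thanks,

FWhite

2 answers:

Answer 0 (score: 1)

You can use the All function to check this. It returns a Boolean.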

Dim orgByteArray() As Byte = {CByte(1), CByte(2), CByte(3)}
Dim testByteArray() As Byte = {CByte(1), CByte(2)}
Dim result = orgByteArray.All(Function(b) testByteArray.Contains(b))
'output for this case returns False

For comparing a List(Of Byte()) against a Byte(), where the Byte() is checked against the complete list of all sub-arrays in the List(Of Byte()):

Dim filebytes() As Byte = {CByte(1), CByte(2), CByte(3), CByte(3), CByte(4), CByte(5), CByte(6), CByte(7), CByte(8)}
Dim bytesToCheck As New List(Of Byte())
bytesToCheck.Add(New Byte() {CByte(1), CByte(2), CByte(3)})
bytesToCheck.Add(New Byte() {CByte(3), CByte(4), CByte(5)})
bytesToCheck.Add(New Byte() {CByte(6), CByte(7), CByte(8)})
Dim temp As New List(Of Byte)
Array.ForEach(bytesToCheck.ToArray, Sub(byteArray) Array.ForEach(byteArray, Sub(_byte) temp.Add(_byte)))
Dim result = filebytes.All(Function(_byte) temp.Contains(_byte))
'output = True

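Answer 1 (score: 1)

I was wondering whether there was some approach other than brute force, and discovered the Boyer-Moore search algorithm. Shamelessly translating the C and Java code from the Wikipedia article "Boyer–Moore string search algorithm" into VB.NET, I arrived at: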

Public Class BoyerMooreSearch

    ' from C and Java code at http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

    Private Shared Function SuffixLength(needle As Byte(), p As Integer) As Integer
        Dim len As Integer = 0
        Dim j = needle.Length - 1
        ' Start at position p, as in the referenced Java code (the parameter was otherwise unused)
        Dim i = p
        While i >= 0 AndAlso needle(i) = needle(j)
            i -= 1
            j -= 1
            len += 1
        End While

        Return len

    End Function

    Private Shared Function GetOffsetTable(needle As Byte()) As Integer()
        Dim table(needle.Length - 1) As Integer
        Dim lastPrefixPosition = needle.Length
        For i = needle.Length - 1 To 0 Step -1
            If Isprefix(needle, i + 1) Then
                lastPrefixPosition = i + 1
            End If
            table(needle.Length - 1 - i) = lastPrefixPosition - i + needle.Length - 1
        Next
        For i = 0 To needle.Length - 2
            Dim slen = SuffixLength(needle, i)
            table(slen) = needle.Length - 1 - i + slen
        Next

        Return table

    End Function

    Private Shared Function Isprefix(needle As Byte(), p As Integer) As Boolean
        Dim j = 0
        For i = p To needle.Length - 1
            If needle(i) <> needle(j) Then
                Return False
            End If
            j += 1
        Next

        Return True

    End Function

    Private Shared Function GetCharTable(needle As Byte()) As Integer()
        Const ALPHABET_SIZE As Integer = 256
        Dim table(ALPHABET_SIZE - 1) As Integer
        For i = 0 To table.Length - 1
            table(i) = needle.Length
        Next
        For i = 0 To needle.Length - 2
            table(needle(i)) = needle.Length - 1 - i
        Next

        Return table

    End Function

    Shared Function IndexOf(haystack As Byte(), needle As Byte()) As Integer
        If needle.Length = 0 Then
            Return 0
        End If

        Dim charTable = GetCharTable(needle)
        Dim offsetTable = GetOffsetTable(needle)

        Dim i = needle.Length - 1
        While i < haystack.Length
            Dim j = needle.Length - 1
            While j >= 0 AndAlso haystack(i) = needle(j)
                i -= 1
                j -= 1
            End While
            If j < 0 Then
                Return i + 1
            End If

            i += Math.Max(offsetTable(needle.Length - 1 - j), charTable(haystack(i)))

        End While

        Return -1

    End Function

End Class

And to test it (suspecting that the LINQ code provided by @OneFineDay would demolish it on performance):

Imports System.Diagnostics ' Stopwatch
Imports System.IO
Imports System.Linq ' All, Contains
Imports System.Text

Module Module1

    ' Named bytesToCheckFor so that it matches the name used in Main() and Test()
    Dim bytesToCheckFor As List(Of Byte())
    Dim rand As New Random

    Function GetTestByteArrays() As List(Of Byte())
        Dim testBytes As New List(Of Byte())
        ' N.B. adjust the numbers used in CreateTestFile according to the quantity (e.g. 10) of testData used
        For i = 1 To 10
            testBytes.Add(Encoding.ASCII.GetBytes("ABCDEFgfdhgf" & i.ToString() & "sdfgjdfjFGH"))
        Next
        Return testBytes
    End Function

    Sub CreateTestFile(f As String)
        ' Make a 4MB file of test data

        ' write a load of bytes which are not going to be in the
        ' judiciously chosen data to search for...
        Using fs As New FileStream(f, FileMode.Create, FileAccess.Write)
            For i = 0 To 2 ^ 22 - 1
                fs.WriteByte(CByte(rand.Next(128, 256)))
            Next
        End Using

        ' ... and put the known data into the test data
        Using fs As New FileStream(f, FileMode.Open)
            For i = 0 To bytesToCheckFor.Count - 1
                fs.Position = CLng(i * 2 ^ 18)
                fs.Write(bytesToCheckFor(i), 0, bytesToCheckFor(i).Length)
            Next
        End Using

    End Sub

    Sub Main()

        ' the byte sequences to be searched for
        bytesToCheckFor = GetTestByteArrays()

        ' Make a test file so that the data can be inspected
        Dim testFileName As String = "C:\temp\testbytes.dat"
        CreateTestFile(testFileName)

        Dim fileData = File.ReadAllBytes(testFileName)

        Dim sw As New Stopwatch
        Dim containsP As Boolean = True

        sw.Reset()
        sw.Start()
        For i = 0 To bytesToCheckFor.Count - 1
            If BoyerMooreSearch.IndexOf(fileData, bytesToCheckFor(i)) = -1 Then
                containsP = False
                Exit For
            End If
        Next

        sw.Stop()

        Console.WriteLine("Boyer-Moore: {0} in {1}", containsP, sw.ElapsedTicks)

        sw.Reset()
        sw.Start()
        Dim temp As New List(Of Byte)
        Array.ForEach(bytesToCheckFor.ToArray, Sub(byteArray) Array.ForEach(byteArray, Sub(_byte) temp.Add(_byte)))
        Dim result = fileData.All(Function(_byte) temp.Contains(_byte))
        sw.Stop()

        Console.WriteLine("LINQ: {0} in {1}", result, sw.ElapsedTicks)

        Console.ReadLine()

    End Sub

End Module

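Now, I know that the byte sequences to match are in the test file (I confirmed this by searching for them with a hex editor) and, assuming (oh dear!) that I am using the other method correctly, the latter does not work whereas mine does: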

Boyer-Moore: True in 23913
LINQ: False in 3224

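I also tested OneFineDay's first code example with both small and large patterns to match, and for patterns of fewer than seven or eight bytes that code was faster than Boyer-Moore. So if you take into account the size of the data you are searching and the size of the pattern you are looking for, Boyer-Moore may be a better fit for your "If possible, I would like to do this as fast as possible."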
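For very short patterns, a plain brute-force scan is another option to measure against. The following is only a minimal sketch of that idea, not code from either answer; the names NaiveSearchSketch, NaiveIndexOf and ContainsAll are hypothetical, and no timing claims are made for it.

```vbnet
Imports System
Imports System.Collections.Generic

Module NaiveSearchSketch

    ' Hypothetical helper: index of the first contiguous occurrence
    ' of needle in haystack, or -1 if there is none.
    Function NaiveIndexOf(haystack As Byte(), needle As Byte()) As Integer
        If needle.Length = 0 Then Return 0
        ' If needle is longer than haystack the loop body never runs.
        For start As Integer = 0 To haystack.Length - needle.Length
            Dim matched As Boolean = True
            For offset As Integer = 0 To needle.Length - 1
                If haystack(start + offset) <> needle(offset) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return start
        Next
        Return -1
    End Function

    ' Hypothetical helper: True only if every pattern occurs
    ' contiguously somewhere in haystack.
    Function ContainsAll(haystack As Byte(), patterns As IEnumerable(Of Byte())) As Boolean
        For Each pattern As Byte() In patterns
            If NaiveIndexOf(haystack, pattern) = -1 Then Return False
        Next
        Return True
    End Function

    Sub Main()
        ' The example bytes from the question: 4D 5A 90 00 03
        Dim fileBytes As Byte() = {&H4D, &H5A, &H90, &H0, &H3}
        Console.WriteLine(NaiveIndexOf(fileBytes, New Byte() {&H0, &H3}))  ' prints 3
        Console.WriteLine(NaiveIndexOf(fileBytes, New Byte() {&H5A, &H0})) ' prints -1
    End Sub

End Module
```

ContainsAll stops at the first missing pattern, in the same way as the Exit For in the Boyer-Moore test loop above.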

Edit

Further to the OP's uncertainty as to whether my suggested method works, here is a test with a very small sample of data:

Sub Test()
    bytesToCheckFor = New List(Of Byte())
    bytesToCheckFor.Add({0, 1}) ' the data to search for
    bytesToCheckFor.Add({1, 2})
    bytesToCheckFor.Add({0, 2})

    Dim fileData As Byte() = {0, 1, 2} ' the file data
    ' METHOD 1: Boyer-Moore
    Dim containsP As Boolean = True

    For i = 0 To bytesToCheckFor.Count - 1
        If BoyerMooreSearch.IndexOf(fileData, bytesToCheckFor(i)) = -1 Then
            containsP = False
            Exit For
        End If
    Next

    Console.WriteLine("Boyer-Moore: {0}", containsP)

    ' METHOD 2: LINQ
    Dim temp As New List(Of Byte)
    Array.ForEach(bytesToCheckFor.ToArray, Sub(byteArray) Array.ForEach(byteArray, Sub(_byte) temp.Add(_byte)))
    Dim result = fileData.All(Function(_byte) temp.Contains(_byte))

    Console.WriteLine("LINQ: {0}", result)

    Console.ReadLine()

End Sub

Output:

Boyer-Moore: False
LINQ: True
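Both LINQ results are consistent with what the expression actually tests: whether every byte value of the file occurs somewhere in the flattened pattern list, regardless of order or adjacency. The sketch below isolates that behaviour; AllBytesAppearIn and MembershipSketch are hypothetical names introduced only for illustration.

```vbnet
Imports System
Imports System.Linq

Module MembershipSketch

    ' Hypothetical helper mirroring the LINQ expression used above:
    ' True when every byte of source also appears somewhere in allowed.
    Function AllBytesAppearIn(source As Byte(), allowed As Byte()) As Boolean
        Return source.All(Function(b) allowed.Contains(b))
    End Function

    Sub Main()
        ' The patterns {0, 1}, {1, 2} and {0, 2} flattened into one array
        Dim flattened As Byte() = {0, 1, 1, 2, 0, 2}

        ' Every file byte is one of 0, 1, 2, so the check passes even
        ' though the sequence 0, 2 never occurs contiguously in the file.
        Dim fileData As Byte() = {0, 1, 2}
        Console.WriteLine(AllBytesAppearIn(fileData, flattened))  ' prints True

        ' One file byte that is absent from the patterns makes it fail.
        Dim otherFile As Byte() = {0, 1, 2, 200}
        Console.WriteLine(AllBytesAppearIn(otherFile, flattened)) ' prints False
    End Sub

End Module
```

The second case corresponds to the 4 MB test file: its filler bytes are in the range 128 to 255, while the ASCII patterns contain only values below 128, so the check fails on the filler regardless of whether the patterns are present.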

Also, I renamed the variables in my original Main() method in the hope that they are more meaningful.