Regex search in a binary file

Date: 2018-03-09 14:55:26

Tags: regex excel vba framemaker

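I am trying to write an Excel VBA script that extracts some information (version and revision date) from a binary FrameMaker file (*.fm).

After the sub opens the *.fm file, it reads the first 25 lines into a variable (the information I need is within those 25 lines).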

Sub fetchDate()
    Dim fso As Object
    Dim fmFile As Object

    Dim fileString As String
    Dim fileName As String
    Dim matchPattern As String
    Dim result As String
    Dim i As Integer
    Dim bufferString As String

    Set fso = CreateObject("Scripting.FileSystemObject")

    fileName = "C:\FrameMaker-file.fm"

    Set fmFile = fso.OpenTextFile(fileName, ForReading, False, TristateFalse)
    matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"

    fileString = ""
    i = 1
    Do While i <= 25
        bufferString = fmFile.ReadLine
        fileString = fileString & bufferString & vbNewLine
        i = i + 1
    Loop
    fmFile.Close

    'fileString = Replace(fileString, matchPattern, "")
    result = regExSearch(fileString, matchPattern)

    MsgBox result

    Set fso = Nothing
    Set fmFile = Nothing
End Sub

The regex function looks like this:

Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
    Dim regEx As New RegExp

    Dim strReplace As String
    Dim result As String
    Dim match As Variant
    Dim matches As Variant
    Dim subMatch As Variant

    Set regEx = CreateObject("VBScript.RegExp")

    If strPattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.test(strInput) Then
            Set matches = regEx.Execute(strPattern)

            For Each match In matches
                If match.SubMatches.Count > 0 Then
                    For Each subMatch In match.SubMatches
                        Debug.Print "match:" & subMatch
                    Next subMatch
                End If
            Next match

            regExSearch = result
        Else
            regExSearch = "no match"
        End If
    End If

    Set regEx = Nothing
End Function

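Problem 1:

The content of the binary *.fm file stored in the variable "fileString" differs on every run, although the *.fm file itself stays unchanged.

Here are some examples of the first three lines stored in "fileString", taken from different runs:

Example 1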

<MakerFile 12.0>


Aaÿ No.009.xxx  ????          /tEXt     ??????

Example 2

<MakerFile 12.0>


Aaÿ  `      ? ????          /tEXt ?     c ? E     ? ????a A ? ?      ? ? ? d??????? ?        Heading ????????????A???????A

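As you can see, Example 1 differs from Example 2, although it is exactly the same VBA code and exactly the same *.fm file.

Problem 2:

Another big problem is that the regex search string from "matchPattern" gets written randomly into my "fileString". Here is a screenshot of the debug console: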

[Screenshot: parts of the value of matchPattern]

How is this possible? Are there any suggestions or ideas for solving this?

I am using:

MS Office Professional Plus 2010

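VBA reference for regular expressions: Microsoft VBScript Regular Expressions 5.5

Thank you very much!

Best regards, Andy

/Edit, 2018-03-12:

Here is a sample *.fm file: sample file. If you open it with Notepad, you can see some information in plain text, such as "Version - DateVersion 4 - 2018 / Feb / 07" and "Rev02 - 2018 / Feb / 21". I want to get this information with the regular expression.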

2 answers:

Answer 0: (score: 1)

I found a solution using ADODB.Stream. This works well:

Sub test_binary()
    Dim buffer As String
    Dim filename As String
    Dim matchPattern As String
    Dim result As String

    filename = "C:\test.fm"

    ' Read the start of the file as text instead of as raw bytes.
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 2                   ' 2 = adTypeText
        .Charset = "utf-8"
        .LoadFromFile filename
        buffer = .ReadText(10000)   ' first 10000 characters
        .Close
    End With

    matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"

    result = regExSearch(buffer, matchPattern)

    MsgBox result
End Sub

The regex function:

Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
    ' Late-bound, so the object is created only once (via CreateObject).
    Dim regEx As Object

    Dim result As String
    Dim match As Variant
    Dim matches As Variant
    Dim subMatch As Variant

    Set regEx = CreateObject("VBScript.RegExp")

    If strPattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.Test(strInput) Then
            ' Search the input text, not the pattern string.
            Set matches = regEx.Execute(strInput)

            ' Join all captured groups with "||".
            result = ""
            For Each match In matches
                If match.SubMatches.Count > 0 Then
                    For Each subMatch In match.SubMatches
                        If Len(result) > 0 Then
                            result = result & "||"
                        End If
                        result = result & subMatch
                    Next subMatch
                End If
            Next match

            regExSearch = result
        Else
            regExSearch = "err_nomatch"
        End If
    End If

    Set regEx = Nothing
End Function
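
One difference from the function in the question is easy to miss: the version above calls Execute on strInput, while the question's version called Execute on strPattern. Searching the pattern string itself would make any sub-matches come from the pattern text, which is a plausible cause of Problem 2. A minimal sketch of the difference follows; the sub name and the input string are made up for illustration and are not taken from the real file:

```vba
Sub compareExecuteArguments()
    ' Hypothetical demo; only the pattern is copied from the question.
    Dim regEx As Object
    Dim m As Object
    Dim strPattern As String
    Dim strInput As String

    strPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
    ' Made-up input, loosely modeled on the sample text in the question.
    strInput = "Version - DateVersion 4 - 2018/Feb/07" & vbNewLine & _
               "Rev02 - 2018/Feb/21"

    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Global = True
    regEx.MultiLine = True
    regEx.Pattern = strPattern

    ' Searching the pattern string: every hit is a piece of the pattern.
    For Each m In regEx.Execute(strPattern)
        Debug.Print "from pattern: " & m.Value
    Next m

    ' Searching the input: every hit comes from the data.
    For Each m In regEx.Execute(strInput)
        Debug.Print "from input: " & m.Value
    Next m
End Sub
```

Comparing the two groups of lines in the Immediate window shows which string each match was taken from.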

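It is important to open the *.fm file as a text file (.Type = 2) and to set the character set to "utf-8". Otherwise my regex has no plain text to read.

Thank you very much for pointing me in the right direction!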
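
If the same read is needed in several places, the stream handling can be wrapped in a small helper. This is only a sketch: the function name readFileStart and its parameters are hypothetical, and the constant adTypeText is declared locally because the code is late-bound (no ADO reference is assumed).

```vba
' Hypothetical helper: returns the first maxChars characters of a file,
' decoded with the given charset (defaults to "utf-8" as in the answer).
Function readFileStart(ByVal path As String, ByVal maxChars As Long, _
                       Optional ByVal charsetName As String = "utf-8") As String
    Const adTypeText As Long = 2   ' same value as ".Type = 2"

    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeText         ' treat the content as text
        .Charset = charsetName     ' decoding used by ReadText
        .LoadFromFile path
        readFileStart = .ReadText(maxChars)
        .Close
    End With
End Function
```

With such a helper, the read in test_binary would shrink to buffer = readFileStart("C:\test.fm", 10000).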

Answer 1: (score: 0)

Just save the FM file as MIF. It is a text encoding of the FM file and can be converted back and forth without losing any information.
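
Since MIF is a text format, the exported file could then be read with the ordinary FileSystemObject. The following is a rough sketch under several assumptions: the path C:\test.mif and the sub name are hypothetical, the pattern is a placeholder that has to be adapted to how the strings actually appear in the MIF markup, and regExSearch is the function from Answer 0.

```vba
Sub searchMif()
    Const ForReading As Long = 1   ' declared locally for late binding

    Dim fso As Object
    Dim mifText As String

    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Hypothetical path to the MIF export of the FrameMaker file.
    With fso.OpenTextFile("C:\test.mif", ForReading)
        mifText = .ReadAll
        .Close
    End With

    ' Placeholder pattern; adjust it to the real MIF content.
    MsgBox regExSearch(mifText, "Rev(\d{1,2})")
End Sub
```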