Regular expressions in Excel VBA

Time: 2015-10-15 13:32:24

Tags: regex excel vba excel-vba

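I'm using the Microsoft regular expression engine in Excel VBA. I'm very new to regex, but I have a pattern that works for now. I need to extend it and I'm having trouble. Here is my code so far: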

Sub ImportFromDTD()

' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String
Dim i As Long
Dim J As Long
Dim sExtract As String
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match

Set Reg1 = New RegExp

ffile = FreeFile

sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
"Browse for file to be imported")

If sDTDFile = False Then Exit Sub '(user cancelled import file browser)

' Read the whole file and split it into lines
Open sDTDFile For Input Access Read As #ffile
  sLines = Split(Input$(LOF(ffile), #ffile), vbNewLine)
Close #ffile

Cells(1, 2) = "From DTD"
J = 2

' The pattern does not change, so configure it once before the loop
With Reg1
    .Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
End With

For i = 0 To UBound(sLines)

  'Debug.Print "Line"; i; "="; sLines(i)

  If Reg1.Test(sLines(i)) Then
    Set M1 = Reg1.Execute(sLines(i))
    For Each M In M1
      ' SubMatches(1) is the second capture group: the element name
      sExtract = M.SubMatches(1)
      sExtract = Replace(sExtract, Chr(13), "")
      Cells(J, 2) = sExtract
      J = J + 1
      'Debug.Print sExtract
    Next M
  End If
Next i

Set Reg1 = Nothing

End Sub

Currently, I'm matching data like this:

 <!ELEMENT DealNumber  (#PCDATA) >

and extracting DealNumber, but now I need to add another match on data like this:

<!ELEMENT DealParties  (DealParty+) >

and extract DealParty without the parentheses and the +.

I've been using this as a reference, and it's great, but I'm still a bit confused: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

Edit

I've run into some new cases that also have to be matched.

 Extract Deal
 <!ELEMENT Deal  (DealNumber,DealType,DealParties) >

 Extract DealParty (the ? markers and the carriage returns are throwing me off)
 <!ELEMENT DealParty  (PartyType,CustomerID,CustomerName,CentralCustomerID?,
           LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
           PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >

 Extract Deals
 <!ELEMENT Deals  (Deal*) >

2 Answers:

Answer 0: (score: 3)

Looking at your pattern, you have too many capture groups. You only want to capture PCDATA and DealParty. Try changing your pattern to:

  With Reg1
      .Pattern = "\<!ELEMENT\s+\w+\s+\(\W*(\w+)\W*\)"

      .Global = True
      .MultiLine = True
      .IgnoreCase = False
  End With

Here's a stub: Regex101
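To make the effect of the single capture group concrete, here is a minimal sketch that applies the pattern above to the two sample lines from the question. The function name ExtractContentName is hypothetical, and late binding through CreateObject is an assumption made here so the snippet runs without setting a library reference; the answer itself uses the question's early-bound Reg1.

```vba
' Hypothetical helper (not from the answer): returns the text captured by
' the single group in the pattern above, or "" when the line does not match.
Function ExtractContentName(ByVal sLine As String) As String
    Dim re As Object
    Dim matches As Object

    ' Late binding (an assumption): no library reference is required
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\<!ELEMENT\s+\w+\s+\(\W*(\w+)\W*\)"
    re.Global = False
    re.IgnoreCase = False

    Set matches = re.Execute(sLine)
    If matches.Count > 0 Then
        ' SubMatches(0) is the only capture group: the name inside ( )
        ExtractContentName = matches(0).SubMatches(0)
    Else
        ExtractContentName = ""
    End If
End Function

Sub DemoExtractContentName()
    Debug.Print ExtractContentName("<!ELEMENT DealNumber  (#PCDATA) >")     ' PCDATA
    Debug.Print ExtractContentName("<!ELEMENT DealParties  (DealParty+) >") ' DealParty
End Sub
```

Note that with this pattern the first sample yields PCDATA, the content inside the parentheses, rather than the element name DealNumber that the question's original code extracted.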

Answer 1: (score: 1)

You could use this regex pattern:

  .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
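  1. This part

     (#\w+|(\w+)\+)

     matches either # followed by word characters, or word characters followed by +, inside the parentheses. In other words, it matches either

     (#PCDATA)
     (DealParty+)

     which validates the whole string.
  2. The submatches are then used to extract DealNumber for the first valid match and DealParty for the other valid match.

Edited code below. Note that the element name is now in M.SubMatches(0); the code reads M.SubMatches(2) first and falls back to M.SubMatches(0) when that group is empty.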

Sub ImportFromDTD()

' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim strIn As String
Dim sExtract As String
Dim J As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match

Set Reg1 = New RegExp
J = 1

' Test string holding both kinds of declaration
strIn = "<!ELEMENT Deal12Number  (#PCDATA) > <!ELEMENT DealParties  (DealParty+) >"

With Reg1
    .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
End With

If Reg1.Test(strIn) Then
    Set M1 = Reg1.Execute(strIn)
    For Each M In M1
        ' SubMatches(2) is the inner name, e.g. DealParty; it is empty for (#PCDATA)
        sExtract = M.SubMatches(2)
        ' Fall back to SubMatches(0), the element name, e.g. Deal12Number
        If Len(sExtract) = 0 Then sExtract = M.SubMatches(0)
        sExtract = Replace(sExtract, Chr(13), "")
        Cells(J, 2) = sExtract
        J = J + 1
    Next M
End If

Set Reg1 = Nothing

End Sub
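The cases added in the question's edit (comma-separated lists, ? and * suffixes, and declarations that wrap across lines) are not covered by the pattern above, and a line-by-line loop cannot match a declaration that spans several lines. As a rough sketch only, one option is to run the regex against the whole file text and let a negated character class span the line breaks. The function name ListElementDecls and the "name -> content model" output format are hypothetical, late binding is an assumption, and nested parentheses in a content model are not handled.

```vba
' Hypothetical helper (not from either answer): returns one
' "name -> content model" string per <!ELEMENT ...> declaration in dtdText.
Function ListElementDecls(ByVal dtdText As String) As Collection
    Dim re As Object
    Dim ws As Object
    Dim m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' [^)]* also matches line breaks, so a declaration may wrap across lines.
    ' Assumption: the content model contains no nested parentheses.
    re.Pattern = "<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>"

    ' Second regex strips spaces and line breaks from the content model
    Set ws = CreateObject("VBScript.RegExp")
    ws.Global = True
    ws.Pattern = "\s+"

    For Each m In re.Execute(dtdText)
        result.Add m.SubMatches(0) & " -> " & ws.Replace(m.SubMatches(1), "")
    Next m

    Set ListElementDecls = result
End Function
```

To try it with the question's import code, the text returned by Input$(LOF(ffile), #ffile) could be passed in directly instead of being split into lines first. SubMatches(0) is the element name (Deal, DealParty, Deals) and SubMatches(1) is the raw content model, which can be split on commas if the individual child names are needed.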