Matching methods that reference a global variable

Date: 2013-05-28 18:19:30

Tags: regex vbscript non-greedy

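This question is closely related to this one, but it concerns capturing methods that contain a reference to a global variable that is not commented out.

I am using the following regular expression and test string to check whether it works, but it only partially works:

Regular expression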

^((?:(?:Public|Private)\s+)?(?:Function|Sub).+)[\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)$

(I need part of the regex to be a capturing group so that I can retrieve the method's name as a submatch.)

Test string

'-----------------------------------------------------------------------------------------
'
'   the code:   Header
'
'-----------------------------------------------------------------------------------------

Dim GLOBAL_VARIABLE_1
Dim GLOBAL_VARIABLE_2
Dim GLOBAL_VARIABLE_3

Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function


msgbox GLOBAL_VARIABLE_1



Public Function doThat(byVal xPath)
'' Created               : dd/mm/yyyy
'' Return                : array
' 'Param            : xPath

     return = split(mid(xPath, 2), "/")

     GLOBAL_VARIABLE_2 = 2 + 2


     doThat = return

End Function


GLOBAL_VARIABLE_2 = 2 + 2


Public Sub butDontDoThis()
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj

     For i = 0 To 5
          return = return & "bye" & " "

     Next

End Sub


GLOBAL_VARIABLE_3 = 3 + 3


Public Sub alsoDoThis(byRef obj)
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj, an xml document object

     For i = 0 To 4
          return = return & "hi" & " "

     Next

     GLOBAL_VARIABLE_1 = 1 + 1

End Sub


GLOBAL_VARIABLE_3 = 3 + 3

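Using http://www.regexpal.com/, I was able to highlight the first method that references the global variable. However, the regex does not handle the other methods the way I expected. It also picks up methods that do not reference that particular global variable, and it ends at the last method that actually uses it. I have determined that the problem is the [\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)$ part: the minimal/non-greedy match keeps extending until it finds an actual match.

In summary, the expression should follow these rules:

  • Stop scanning the method currently being checked as soon as the first end of a method declaration is seen. In this example, only the doThis and alsoDoThis methods should match for GLOBAL_VARIABLE_1, but I am not sure what the regex should be.
  • The regex should also match only methods that actually use the global variable.
  • If GLOBAL_VARIABLE_1 is commented out, the method does not actually use it. A commented-out GLOBAL_VARIABLE_1 should not trigger a positive match for the method.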

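3 Answers:

Answer 0: (score: 1)

Description

I would do this in two steps. First, identify each of your functions and subs. Here I use the backreference \1 to make sure we match the correct End Function or End Sub. This regex also captures the function name and places it in group 2, which can be used later if part 2 matches.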

(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1
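As a rough illustration of this first step, here is a minimal Python sketch (Python is used only for brevity; the answer's own examples are VB). The sample text, the helper name split_procedures, and the widening of the name class from [a-z0-9] to \w so that underscores are captured are all assumptions made for the example.

```python
import re

# Port of the step-one pattern. DOTALL lets .*? cross line breaks, and the
# backreference \1 forces "End Function" to close a Function and "End Sub"
# to close a Sub. The \w name class is an assumption (see lead-in).
PROC_RE = re.compile(
    r"(?:Public|Private)\s+(Function|Sub)\s+(\w*).*?End\s+\1",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical, shortened sample in the style of the question's test string.
SAMPLE = '''Public Function doThis(byVal xml)
     GLOBAL_VARIABLE_1 = 2 + 2
     doThis = xml
End Function

Public Sub butDontDoThis()
     For i = 0 To 5
     Next
End Sub
'''

def split_procedures(source):
    """Return (kind, name, full_text) for every procedure found."""
    return [(m.group(1), m.group(2), m.group(0)) for m in PROC_RE.finditer(source)]

procedures = split_procedures(SAMPLE)
for kind, name, text in procedures:
    print(kind, name)
```

Each tuple carries the whole procedure text, so the second step can test one procedure at a time instead of the whole file.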

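Then test each of these to see whether it contains your variable. Note that in this test I use multiline matching to make sure a comment character does not appear before GLOBAL_VARIABLE_1 on the same line. It also checks that GLOBAL_VARIABLE_1 is not preceded by either of the following:

  • An alphanumeric character, with or without an _ separator. This needs to be updated to cover every character you would find in a variable name. Including a hyphen - here could be confused with the minus sign used in expressions.
  • The comment character '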

^[^']*?(?![a-z0-9][_]?|['])\bGLOBAL_VARIABLE_1
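A minimal Python sketch of this second step follows. It uses a simplified form of the check (the same shape as the ^[^']*?GLOBAL_VARIABLE_1 pattern in the VB samples), with two assumed tweaks: line breaks are excluded from the character class so a match stays on one line, and \b word boundaries replace the lookaround so that longer names such as GLOBAL_VARIABLE_10 are not accepted. The helper name uses_variable is hypothetical.

```python
import re

def uses_variable(body, name="GLOBAL_VARIABLE_1"):
    """True if some line mentions `name` with no apostrophe before it on that line."""
    pattern = re.compile(
        r"^[^'\r\n]*?\b" + re.escape(name) + r"\b",
        re.IGNORECASE | re.MULTILINE,  # MULTILINE: ^ matches at every line start
    )
    return pattern.search(body) is not None

live = "     return = 1\n     GLOBAL_VARIABLE_1 = 2 + 2\n"
commented = "     return = 1\n  '   GLOBAL_VARIABLE_1 = 2 + 2\n"

print(uses_variable(live))
print(uses_variable(commented))
```

One limitation of this simplified check: an apostrophe inside a string literal that appears before the variable on the same line is treated as a comment marker, so such a line is skipped.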


VB Part 1

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    Dim re As Regex = New Regex("(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1",RegexOptions.IgnoreCase OR RegexOptions.Singleline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

$matches Array:
(
    [0] => Array
        (
            [0] => Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function
            [1] => Public Function doThat(byVal xPath)
'' Created               : dd/mm/yyyy
'' Return                : array
' 'Param            : xPath

     return = split(mid(xPath, 2), "/")

     GLOBAL_VARIABLE_2 = 2 + 2


     doThat = return

End Function
            [2] => Public Sub butDontDoThis()
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj

     For i = 0 To 5
          return = return & "bye" & " "

     Next

End Sub
            [3] => Public Sub alsoDoThis(byRef obj)
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj, an xml document object

     For i = 0 To 4
          return = return & "hi" & " "

     Next

     GLOBAL_VARIABLE_1 = 1 + 1

End Sub
        )

    [1] => Array
        (
            [0] => Function
            [1] => Function
            [2] => Sub
            [3] => Sub
        )

    [2] => Array
        (
            [0] => doThis
            [1] => doThat
            [2] => butDontDoThis
            [3] => alsoDoThis
        )

)

VB Part 2

Found in this text

Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function

Example

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

$matches Array:
(
    [0] => Array
        (
            [0] =>  Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1
        )

)

Not found in this text

Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

  '   GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function

Example

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

Matches Found:
NO MATCHES.

Also not found in this text

Public Sub butDontDoThis()
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj

     For i = 0 To 5
          return = return & "bye" & " "

     Next

End Sub

Example

   Imports System.Text.RegularExpressions
    Module Module1
      Sub Main()
        Dim sourcestring as String = "Public Sub butDontDoThis()
    '' Created               : dd/mm/yyyy
     '' Return                : string
     ' 'Param            : obj

     For i = 0 To 5
          return = return & ""bye"" & "" ""

     Next

End Sub"
        Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
        Dim mc as MatchCollection = re.Matches(sourcestring)
        Dim mIdx as Integer = 0
        For each m as Match in mc
          For groupIdx As Integer = 0 To m.Groups.Count - 1
            Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
          Next
          mIdx=mIdx+1
        Next
      End Sub
    End Module

    Matches Found:
    NO MATCHES.

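Disclaimer

There are many edge cases that can trip this up, for example if you have a comment such as ' end function, or if you assign a string value to a variable, as in thisstring = "end sub".

Yes, I know the OP is using VBScript; I have included these examples to demonstrate the overall logic and to show that the regexes work.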
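To make these edge cases concrete, the following Python sketch feeds a procedure containing both of them to the step-one pattern. It then tries one possible mitigation, which is an assumption and not part of this answer: requiring the End keyword to be the first thing on its line. The sample procedure and the names LOOSE, ANCHORED, and TRICKY are hypothetical.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE

# Step-one pattern from above (name class widened to \w, an assumption).
LOOSE = re.compile(r"(?:Public|Private)\s+(Function|Sub)\s+(\w*).*?End\s+\1", FLAGS)

# Hypothetical stricter variant: declaration and terminator must each start
# a line (leading spaces or tabs allowed).
ANCHORED = re.compile(
    r"^[ \t]*(?:Public|Private)\s+(Function|Sub)\s+(\w*).*?^[ \t]*End\s+\1\b",
    FLAGS,
)

TRICKY = '''Public Sub tricky()
     ' end sub is mentioned in this comment
     msg = "end sub"
     GLOBAL_VARIABLE_1 = 1
End Sub
'''

loose_match = LOOSE.search(TRICKY).group(0)
anchored_match = ANCHORED.search(TRICKY).group(0)

print(repr(loose_match))     # stops at the "end sub" inside the comment
print(repr(anchored_match))  # runs to the real End Sub line
```

Anchoring only narrows the problem. Statements joined with a colon on one line, for instance, could still put an End Sub somewhere other than the start of a line.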

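Answer 1: (score: 0)

Found the culprit. The problem is caused by the [\s\S]+? part of the regex:

((?:(?:Public|Private)\s+)?(?:Function|Sub).+)[\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)

[\s\S]+? is a non-greedy match, but that does not necessarily mean it is the shortest possible match. Simplified example: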

Public Function doThis(byVal xml)
  GLOBAL_VARIABLE_1
End Function

Public Function doThat(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub butDontDoThis()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

When the regex is applied to the sample text, it first matches the first function:

Public Function doThis(byVal xml)
  GLOBAL_VARIABLE_1
End Function

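After that match, however, the first part of the expression (((?:(?:Public|Private)\s+)?(?:Function|Sub).+)) matches the next function definition (Public Function doThat(byVal xPath)), and then [\s\S]+?(GLOBAL_VARIABLE_1) matches all text up to the next occurrence of GLOBAL_VARIABLE_1: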

Public Function doThat(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub butDontDoThis()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

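There is no implicit "does not contain End Function" in [\s\S]+?.

The simplest solution to your problem is probably a combination of a regex and plain string matching: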
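The overrun can be reproduced with a short Python sketch that runs the question's pattern over the simplified sample. For comparison it also tries a tempered variant that spells out the missing "does not contain an End line" condition; that variant is an assumption added for illustration, not something proposed in this answer, and it does not deal with commented-out references. The names HEADER, END, STEP, and first_lines are hypothetical.

```python
import re

TEXT = '''Public Function doThis(byVal xml)
  GLOBAL_VARIABLE_1
End Function

Public Function doThat(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub butDontDoThis()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
'''

HEADER = r"^((?:(?:Public|Private)\s+)?(?:Function|Sub).+)"
END = r"End\s+(?:Function|Sub)$"

# The question's pattern: nothing stops [\s\S]+? from crossing an End line.
lazy = re.compile(HEADER + r"[\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?" + END, re.MULTILINE)

# Tempered variant (assumption): a character is consumed only if an
# End Function / End Sub line does not start at that position.
STEP = r"(?:(?!^End\s+(?:Function|Sub)\b)[\s\S])"
tempered = re.compile(
    HEADER + STEP + r"+?\b(GLOBAL_VARIABLE_1)\b" + STEP + r"*?^" + END,
    re.MULTILINE,
)

def first_lines(pattern):
    """Declaration line (group 1) of every match in TEXT."""
    return [m.group(1) for m in pattern.finditer(TEXT)]

print(first_lines(lazy))      # second match starts at doThat and runs into alsoDoThis
print(first_lines(tempered))  # only procedures whose own body holds the variable
```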

Set fso = CreateObject("Scripting.FileSystemObject")
text = fso.OpenTextFile("C:\Temp\sample.txt").ReadAll

Set re = New RegExp
re.Pattern = "((?:(?:Public|Private)\s+)(Function|Sub).+)([\s\S]+?)End\s+\2"
re.Global  = True
re.IgnoreCase = True

For Each m In re.Execute(text)
  If InStr(m.SubMatches(2), "GLOBAL_VARIABLE_1") > 0 Then
    WScript.Echo m.SubMatches(0)
  End If
Next

It extracts the body of each procedure/function (SubMatches(2)) and then uses InStr() to check whether that body contains GLOBAL_VARIABLE_1.

Answer 2: (score: 0)

Description
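For readers who want to try the same idea outside Windows Script Host, here is a hedged Python port of the loop. VBScript's SubMatches are zero-based while Python groups are one-based, so SubMatches(0) becomes group(1) and SubMatches(2) becomes group(3). The inline sample text and the function name headers_mentioning are assumptions; the original reads the text from a file.

```python
import re

# Same pattern as the VBScript above; IGNORECASE mirrors re.IgnoreCase = True.
PROC_RE = re.compile(
    r"((?:(?:Public|Private)\s+)(Function|Sub).+)([\s\S]+?)End\s+\2",
    re.IGNORECASE,
)

TEXT = '''Public Function doThis(byVal xml)
  GLOBAL_VARIABLE_1
End Function

Public Function doThat(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub butDontDoThis()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
'''

def headers_mentioning(text, name="GLOBAL_VARIABLE_1"):
    """Declaration lines of procedures whose body contains `name`."""
    # group(1) ~ SubMatches(0): declaration line; group(3) ~ SubMatches(2): body.
    # The plain substring test plays the role of InStr(...) > 0.
    return [m.group(1) for m in PROC_RE.finditer(text) if name in m.group(3)]

for header in headers_mentioning(TEXT):
    print(header)
```

Because the body test is a plain substring search, a commented-out mention of the variable still counts as a hit; an extra per-line check would be needed to rule that out.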

This regex splits the text into strings, each containing a single function or sub. It also verifies that the string has an uncommented GLOBAL_VARIABLE_1 by finding the first line of code inside the function on which no ' precedes GLOBAL_VARIABLE_1. The expression also treats ' as a regular character when it is embedded in a double-quoted string, as in variable = "sometext ' more text" + GLOBAL_VARIABLE_1.

(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*)(?:(?!^End\s+\1\s+(?:$|\Z)).)*^(?:[^'\r\n]|"[^"\r\n]*")*GLOBAL_VARIABLE_1.*?^End\s\1\b

Group 0 will contain the entire matched function/sub.

  1. Group 1 will contain Function or Sub accordingly.
  2. Group 2 will contain the name of the function/sub.

Example

Input text

Public Function ValidEdgeCase1(byRef obj)
  SomeVariable = "some text with an embedded ' single quote" + GLOBAL_VARIABLE_1
End Sub

Public Sub SkipEdgeCase(byRef obj)
  SomeVariable = "some text with an embedded ' single quote" ' + GLOBAL_VARIABLE_1
End Sub

Public Function FailCommented(byVal xml)
  ' GLOBAL_VARIABLE_1
End Function

Public Function FAilWrongName1(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub FAilWrongName1()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

Public Sub IHeartKitten(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

Public Sub IHeartKitten2(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

Public Function FailCommented(byVal xml)
  ' GLOBAL_VARIABLE_1
End Function

Sample code

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    Dim re As Regex = New Regex("(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*)(?:(?!^End\s+\1\s+(?:$|\Z)).)*^(?:[^'\r\n]|""[^""\r\n]*"")*GLOBAL_VARIABLE_1.*?^End\s\1\b",RegexOptions.IgnoreCase OR RegexOptions.IgnorePatternWhitespace OR RegexOptions.Multiline OR RegexOptions.Singleline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

$matches Array:
(
    [0] => Array
        (
            [0] => Public Function ValidEdgeCase1(byRef obj)
  SomeVariable = "some text with an embedded ' single quote" + GLOBAL_VARIABLE_1
End Sub
            [1] => Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
            [2] => Public Sub IHeartKitten(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
            [3] => Public Sub IHeartKitten2(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
        )

    [1] => Array
        (
            [0] => Function
            [1] => Sub
            [2] => Sub
            [3] => Sub
        )

    [2] => Array
        (
            [0] => ValidEdgeCase1
            [1] => alsoDoThis
            [2] => IHeartKitten
            [3] => IHeartKitten2
        )

)
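The quote-aware part of this approach can also be expressed without a regex alternation. The Python sketch below is an assumed, line-at-a-time reading of the rule "an apostrophe inside a double-quoted string is not a comment marker"; it is not a translation of the pattern above, and the helper names live_code and references are hypothetical.

```python
import re

def live_code(line):
    """Return the part of a VBScript line that precedes its comment, if any.

    An apostrophe starts a comment only outside a double-quoted string.
    A doubled quote ("") inside a string toggles the flag twice, so the
    scanner is still inside the string afterwards.
    """
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == "'" and not in_string:
            return line[:i]
    return line

def references(line, name="GLOBAL_VARIABLE_1"):
    """True if `name` appears as a whole word in the uncommented part of the line."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, live_code(line), re.IGNORECASE) is not None

valid = 'SomeVariable = "some text with an embedded \' single quote" + GLOBAL_VARIABLE_1'
skipped = 'SomeVariable = "some text with an embedded \' single quote" \' + GLOBAL_VARIABLE_1'

print(references(valid))
print(references(skipped))
```

This sketch ignores Rem comments and still counts a variable name that appears inside a string literal, so it is a starting point and not a full VBScript tokenizer.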