Regular expression for multi-line checking

Time: 2011-01-11 15:23:40

Tags: python regex

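I am trying to use regular expressions (import re) to extract the information I want from a log file.

Update: I added the permissions for the C:\WINDOWS\security folder, which broke all of the sample code.

Say the format of the log is: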

C:\:
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files
    \Everyone   Allowed:    Read & Execute
    (No auditing)

C:\WINDOWS\system32:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Modify
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\system32\config:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Read & Execute
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\security:
    BUILTIN\Users   Allowed:    Special Permissions: 
            Traverse Folder
            Read Attributes
            Read Permissions
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Traverse Folder
            Read Attributes
            Read Permissions
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

It repeats for several other directories. How can I split the log into paragraphs and then check for the lines that contain Special Permissions:?

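Like this:

  1. Split the whole string1 into several parts, i.e. C:\, C:\WINDOWS\system32
  2. Look at each line that contains "Special Permissions:"
  3. Display the whole line, e.g.: C:\: BUILTIN\Users Allowed: Special Permissions: \n\ Create Folders\n\ BUILTIN\Users Allowed: Special Permissions: \n\ Create Files\n\
  4. Repeat for the next 'paragraph'

I was thinking of:

  1. Search the whole text file for r"(\w+:\\)(\w+\\?)*:" - this returns the path
  2. Use string functions or a regex to get the rest of the output
  3. Remove all lines other than the Special Permissions ones
  4. Display, then repeat step 1

But I feel that this is not efficient.

Can anyone guide me on this? Thanks.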
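To make step 1 of that plan concrete, here is a minimal sketch of what the pattern from the question accepts. It assumes Python 3 (the thread itself uses Python 2), and the names `PATH_HEADER`, `is_path_header` and `sample_lines` are made up for illustration.

```python
import re

# The pattern from the question: a drive such as "C:\", then zero or more
# folder names, then the colon that closes the header line.
PATH_HEADER = re.compile(r"(\w+:\\)(\w+\\?)*:")

# Hypothetical sample lines in the log format shown above.
sample_lines = [
    "C:\\:",
    "C:\\WINDOWS\\system32:",
    "    BUILTIN\\Users   Allowed:    Special Permissions: ",
    "            Create Folders",
]


def is_path_header(line):
    """Return True when the whole line is a path header."""
    return PATH_HEADER.fullmatch(line) is not None


headers = [line for line in sample_lines if is_path_header(line)]
print(headers)
```

Note that `\w` does not match spaces, so a folder such as `C:\Program Files` would not be recognised by this pattern as written.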


Example output:

    C:\:
    BUILTIN\Users   Allowed:    Special Permissions:
    Create Folders
    BUILTIN\Users   Allowed:    Special Permissions:
    Create Files
    
    C:\WINDOWS\system32:
    BUILTIN\Power Users Allowed:    Special Permissions: 
    Delete
    
    C:\WINDOWS\security:
    BUILTIN\Users   Allowed:    Special Permissions: 
    Traverse Folder
    Read Attributes
    Read Permissions
    BUILTIN\Power Users Allowed:    Special Permissions: 
    Traverse Folder
    Read Attributes
    Read Permissions
    

C:\WINDOWS\system32\config does not appear because none of its lines contain Special Permissions.


The template I am using:

    import re
    
    text = ""
    
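    def main():
        global text
        # Read the whole log into the module-level string; the with block
        # closes the file (the original f.close without () never did).
        with open('DirectoryPermissions.xls', 'r') as f:
            text = f.read()
        print text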
    
    def regex():
        global text
        <insert code here>
    
    if __name__ == '__main__':
        main()
        regex()
    

5 Answers:

Answer 0: (score: 2)

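# I would replace this with reading lines from a file,
# rather than splitting a big string containing the file.

section = None          # path header of the current paragraph, not yet printed
inspecialperm = False   # True while inside a "Special Permissions" block
with open("testdata.txt") as w:
    for line in w:
        if not line.startswith("            "):
            inspecialperm = False

        if not line.strip():
            # Lines read from a file keep their "\n", so test for blank
            # lines with strip() rather than len(line) == 0.
            section = None

        elif section is None:
            # The first non-blank line of a paragraph is its path header.
            section = line

        elif 'Special Permissions' in line:
            if section:
                # Print the header only once per paragraph.
                print section,
                section = ""
            inspecialperm = True
            print line,

        elif inspecialperm:
            print line,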
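For readers on Python 3, the same line-by-line idea can be sketched as a generator that works on any iterable of lines. This is an illustration rather than the answerer's code: the names `indent_of` and `special_permission_lines` are hypothetical, and it assumes that permission names are indented more deeply than the entry they belong to.

```python
def indent_of(line):
    """Number of leading whitespace characters on a line."""
    return len(line) - len(line.lstrip())


def special_permission_lines(lines):
    """Yield each path header followed by its "Special Permissions" blocks."""
    header = None        # path line of the current paragraph, not yet emitted
    block_indent = None  # indentation of the entry whose names we are reading
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: the paragraph is over.
            header, block_indent = None, None
        elif not line[0].isspace():
            # Unindented line: a new path header.
            header, block_indent = line, None
        elif "Special Permissions:" in line:
            if header is not None:
                yield header
                header = None
            block_indent = indent_of(line)
            yield line.strip()
        elif block_indent is not None and indent_of(line) > block_indent:
            # Deeper-indented line: a permission name of the current entry.
            yield line.strip()
        else:
            block_indent = None


# Hypothetical excerpt in the format of the log from the question.
sample = """C:\\:
    BUILTIN\\Users   Allowed:    Read & Execute
    BUILTIN\\Users   Allowed:    Special Permissions: 
            Create Folders
    (No auditing)

C:\\WINDOWS\\system32\\config:
    BUILTIN\\Users   Allowed:    Read & Execute
    (No auditing)
"""

result = list(special_permission_lines(sample.splitlines()))
print("\n".join(result))
```

Because the function accepts any iterable, an open file object can be passed in directly instead of `sample.splitlines()`.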

Answer 1: (score: 1)

You don't need the re module at all if you parse the string with "split & strip", which is also more efficient:

for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    paragraph = paragraph.replace(': \n', ': ') # hack to have permissions in same line
    for line in paragraph.split('\n'):
        if 'Special Permissions: ' in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Path "%s" has special permission "%s"' % (path, permission)

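Replace the print statement with whatever suits your needs.

EDIT: As pointed out in the comments, the previous solution does not work with the new input lines in the edited question, but here is how to fix it (still more efficient than using regular expressions):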

for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    owner = None
    for line in paragraph.split('\n'):
        if owner is not None and ':' not in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Owner "%s" has special permission "%s" on path "%s"' % (owner, permission, path)
        else:
            owner = line.split(' Allowed:', 1)[0].strip() if line.endswith('Special Permissions: ') else None
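If the results are needed as data rather than printed text, the same split-based idea might be sketched as follows for Python 3. The function name `special_permissions` and the sample log are made up for illustration. The sketch assumes that paragraphs are separated by a truly empty line and that permission names are indented more deeply than their owner line, which also keeps a trailing "(No auditing)" line from being mistaken for a permission.

```python
def special_permissions(log_text):
    """Return (path, owner, permission) tuples for every special permission."""
    found = []
    for paragraph in log_text.strip().split("\n\n"):
        lines = paragraph.split("\n")
        path = lines[0].strip().rstrip(":")
        owner = None
        owner_indent = 0
        for line in lines[1:]:
            indent = len(line) - len(line.lstrip())
            if line.rstrip().endswith("Special Permissions:"):
                # Remember who the following permission names belong to.
                owner = line.split("Allowed:", 1)[0].strip()
                owner_indent = indent
            elif owner is not None and indent > owner_indent:
                found.append((path, owner, line.strip()))
            else:
                owner = None
    return found


# Hypothetical excerpt in the format of the log from the question.
sample = """C:\\WINDOWS\\system32\\config:
    BUILTIN\\Users   Allowed:    Read & Execute
    (No auditing)

C:\\WINDOWS\\security:
    BUILTIN\\Users   Allowed:    Special Permissions: 
            Traverse Folder
            Read Attributes
    BUILTIN\\Administrators  Allowed:    Full Control
    (No auditing)
"""

rows = special_permissions(sample)
for row in rows:
    print(row)
```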

Answer 2: (score: 1)

Similar to milkypostman's solution, but with the output formatted the way you were trying to get it:

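lines = string1.splitlines()
header = None  # path line of the current paragraph, waiting to be printed
for index, line in enumerate(lines):
    if line == "":
        # A blank line ends the paragraph.
        header = None
    elif not line.startswith(" "):
        # An unindented line is the path of a new paragraph.
        header = line
    elif "Special Permissions" in line:
        if header is not None:
            print header
            header = None
        print line.lstrip()
        # Print every permission name that follows; this assumes the names
        # are indented by 12 spaces, as in the sample log.
        offset = 1
        while index + offset < len(lines) and lines[index + offset].startswith(" " * 12):
            print lines[index + offset].lstrip()
            offset += 1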

Answer 3: (score: 0)

Here is a solution using the re module and the findall method.

data = '''\
C:\:
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control 
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files
    \Everyone   Allowed:    Read & Execute
    (No auditing)

C:\WINDOWS\system32:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Modify
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\system32\config:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Read & Execute
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)
'''

if __name__ == '__main__':
    import re

    # A regular expression to match a section "C:...."
    cre_par = re.compile(r'''
                ^C:.*?
                ^\s*$''', re.DOTALL | re.MULTILINE | re.VERBOSE)

    # A regular expression to match a "Special Permissions" line, and the
    # following line.
    cre_permissions = re.compile(r'''(^.*Special\ Permissions:\s*\n.*)\n''', 
                                re.MULTILINE | re.VERBOSE)

    # Create list of strings to output.
    out = []
    for t in cre_par.findall(data):
        # Search only the current section t, not the whole log, and skip
        # sections that have no special permissions.
        perms = cre_permissions.findall(t)
        if perms:
            out += [t[:t.find('\n')]] + perms + ['']

    # Join output list of strings together using end-of-line character
    print '\n'.join(out)

Here is the generated output:

C:\:
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files

C:\WINDOWS\system32:
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
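The pattern in this answer captures only one line after each "Special Permissions:" entry, which is not enough for the C:\WINDOWS\security paragraph in the updated question. A possible regex-based sketch for Python 3 that captures every continuation line is shown below. The names `ENTRY` and `special_sections` are hypothetical, and the sketch assumes that continuation lines repeat the indentation of their entry and then add more.

```python
import re

# Group 1: the entry's indentation. Group 2: the entry line itself.
# Group 3: every following line that repeats that indentation and adds more.
ENTRY = re.compile(
    r"^([ \t]+)(\S.*Special Permissions:)[ \t]*\n((?:\1[ \t]+\S.*\n?)+)",
    re.MULTILINE,
)


def special_sections(log_text):
    """Map each path to a list of (entry line, [permission names])."""
    result = {}
    # Paragraphs are separated by a blank (or whitespace-only) line.
    for paragraph in re.split(r"\n[ \t]*\n", log_text.strip()):
        header, _, body = paragraph.partition("\n")
        entries = [
            (m.group(2), [name.strip() for name in m.group(3).splitlines()])
            for m in ENTRY.finditer(body)
        ]
        if entries:
            result[header.rstrip().rstrip(":")] = entries
    return result


# Hypothetical excerpt in the format of the log from the question.
sample = """C:\\WINDOWS\\system32\\config:
    BUILTIN\\Users   Allowed:    Read & Execute
    (No auditing)

C:\\WINDOWS\\security:
    BUILTIN\\Users   Allowed:    Special Permissions: 
            Traverse Folder
            Read Attributes
            Read Permissions
    BUILTIN\\Administrators  Allowed:    Full Control
    (No auditing)
"""

sections = special_sections(sample)
for path, entries in sections.items():
    print(path, entries)
```

The backreference `\1` is what stops the match at the next entry: a sibling line has the same indentation as the entry, so the extra `[ \t]+` that follows `\1` cannot match there.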

Answer 4: (score: 0)

Thanks to milkypostman, scoffey and the rest, I came up with this solution:

def regex():
    global text
    for paragraph in text.split('\n\n'):
        lines = paragraph.split('\n', 1)
        #personal modifier to choose certain output only
        if lines[0].startswith(('C:\\:', 'C:\\WINDOWS\\system32:', 'C:\\WINDOWS\\security:')):
            print lines[0]
            iterables = re.finditer(r".*Special Permissions: \n(\s+[a-zA-Z ]+\n)*", lines[1])
            for items in iterables:
                #cosmetic fix
                parsedText = re.sub(r"\n$", "", items.group(0))
                parsedText = re.sub(r"^\s+", "", parsedText)
                parsedText = re.sub(r"\n\s+", "\n", parsedText)
                print parsedText
            print

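I will still look through all of the posted code (especially scoffey's, as I never knew pure string manipulation was that powerful). Thanks for the insight!

Of course, this is not optimal, but it works for my case. If you have any suggestions, feel free to post them.


Output: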

C:\Python27>openfile.py
C:\:
BUILTIN\Users   Allowed:        Special Permissions:
Create Folders
BUILTIN\Users   Allowed:        Special Permissions:
Create Files

C:\WINDOWS\security:
BUILTIN\Users   Allowed:        Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users     Allowed:        Special Permissions:
Traverse Folder
Read Attributes
Read Permissions

C:\WINDOWS\system32:
BUILTIN\Power Users     Allowed:        Special Permissions:
Delete