Question

I need to use a Python regular expression to find the following in C code, but I cannot get some of the patterns right.

if(condition)
     /*~T*/
     {
        /*~T*/
        _getmethis = FALSE;
     /*~T*/
     }
..........
/*~T*/
     _findmethis = FALSE;
......
                    /*~T*/
_findthat = True;

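I need to find every variable that follows /*~T*/ and starts with an underscore, then write it to a new file. My code does not find anything: I have tried several regex patterns, and the output file is always empty.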

import re
fh = open('filename.c', "r")
output = open("output.txt", "w")
pattern = re.compile(r'(\/\*~T\*\/)(\s*?\n\s*)(_[aA-zZ]*)')
for line in fh:
    for m in re.finditer(pattern, line):
        output.write(m.group(3))
        output.write("\n")

output.close()

Answer 1

Nothing is found because your pattern spans multiple lines, but you only look at the file one line at a time.
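To see why, here is a minimal sketch that runs the question's pattern first line by line and then on the whole text. The string `sample` is made up for illustration and stands in for the file contents. Each line handed to the regex ends at its newline, so the part of the pattern after `\n` has nothing left to match:

```python
import re

# Hypothetical sample standing in for the contents of filename.c
sample = "/*~T*/\n     _findmethis = FALSE;\n"

# Pattern from the question, unchanged
pattern = re.compile(r'(\/\*~T\*\/)(\s*?\n\s*)(_[aA-zZ]*)')

# Line by line: the marker and the variable never appear in the same string
per_line = [m.group(3)
            for line in sample.splitlines(keepends=True)
            for m in pattern.finditer(line)]

# Whole text: the pattern is free to cross the newline
whole = [m.group(3) for m in pattern.finditer(sample)]

print(per_line)
print(whole)
```

The line-by-line list stays empty, while the whole-text search returns the variable name.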

Consider using this:

t = """
if(condition)
     /*~-*/
     {
        /*~T*/
        _getmethis = FALSE;
     /*~-*/
     }
..........
/*~T*/
     _findmethis = FALSE;

     /*~T*/
     do_not_findme_this = FALSE;
"""

import re
pattern = re.compile(r'/\*~T\*/.*?\n\s+(_[aA-zZ]*)', re.MULTILINE|re.DOTALL)
for m in re.finditer(pattern, t):  # use the whole file here - not line-wise
    print(m.group(1))

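The pattern is compiled with two flags. re.DOTALL makes the dot . match newlines as well (by default it does not). re.MULTILINE only changes how ^ and $ behave, so it has no effect on this particular pattern. The non-greedy .*? keeps the text skipped before the following group as short as possible.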

Output:

_getmethis
_findmethis
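
One caveat, offered as a hedged aside: because re.DOTALL lets `.*?` cross newlines, the pattern above can skip past the line directly after `/*~T*/` and capture an underscore name further down. The sketch below uses a made-up input (`sample`) to show the difference; whether this matters depends on what the real files look like.

```python
import re

# Hypothetical input: the line right after the marker does NOT start with "_"
sample = "/*~T*/\n    plain = 1;\n    _later = 2;\n"

# Same pattern, compiled with and without re.DOTALL
with_dotall = re.compile(r'/\*~T\*/.*?\n\s+(_[aA-zZ]*)', re.MULTILINE | re.DOTALL)
without_dotall = re.compile(r'/\*~T\*/.*?\n\s+(_[aA-zZ]*)')

found_with = with_dotall.findall(sample)
found_without = without_dotall.findall(sample)

print(found_with)     # names captured when "." may cross newlines
print(found_without)  # names captured when "." stops at the line end
```

With re.DOTALL the search reaches `_later` two lines below the marker; without it, only a name on the line immediately after the marker can match.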


Answer 2

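You need to read the file as a whole with fh.read(), and make sure you change the pattern so that it matches letters only, because [aA-zZ] matches more than just letters.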
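
A quick way to check this claim: inside `[aA-zZ]`, the part `A-z` is read as a range from `A` to `z`, which in ASCII also covers the characters that sit between `Z` and `a`. A minimal sketch:

```python
import re

# [aA-zZ] is parsed as: 'a', the range 'A'-'z', and 'Z'
loose = re.compile(r'[aA-zZ]')
strict = re.compile(r'[a-zA-Z]')

# Printable ASCII characters accepted by the loose class but not by the strict one
extra = [chr(c) for c in range(32, 127)
         if loose.fullmatch(chr(c)) and not strict.fullmatch(chr(c))]

print(extra)
```

The extra characters include the underscore itself, so the loose class also accepts names with inner underscores, which `[a-zA-Z]` would cut short.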

The pattern I suggest is

(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)

See the regex demo. Note that I deliberately subtracted \n from the first \s* (which gives [^\S\n]*) to make matching more efficient.
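
To illustrate that note: `[^\S\n]` means "any whitespace character except a newline", so the first part of the group can only consume trailing spaces or tabs before the line break. A small sketch on a made-up string (`sample` is hypothetical):

```python
import re

# Hypothetical snippet with trailing blanks after the marker
sample = "/*~T*/ \t\n     _findmethis = FALSE;\n"

# [^\S\n]+ : runs of whitespace that contain no newline
horizontal = re.findall(r'[^\S\n]+', sample)

# The suggested pattern applied to the same text
pattern = re.compile(r'(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)')
names = [m.group(3) for m in pattern.finditer(sample)]

print(horizontal)
print(names)
```

Because the newline can only be matched by the explicit `\n`, the engine has fewer ways to split the whitespace between the two quantifiers, which is where the efficiency gain comes from.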

When reading a file, it is more convenient to use with, so that you do not have to call .close():

import re
pattern = re.compile(r'(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)')

with open('filename.c', "r") as fh:
    contents = fh.read()
    with open("output.txt", "w") as output:
        output.write("\n".join([x.group(3) for x in pattern.finditer(contents)]))

Answer 3

This is my final version; here I also try to avoid duplicates.

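import re

# Read the whole file so the pattern can match across line breaks
with open('filename.c', "r") as fh:
    filecontent = fh.read()

createlist = []  # names already written, used to skip duplicates
# Note: [aA-zZ] also matches [ \ ] ^ _ ` (see Answer 2)
pattern = re.compile(r"(/\*~T\*/)(\s*?\n\s*)(_[aA-zZ]*)")
with open("output.txt", "w") as output:
    for m in re.finditer(pattern, filecontent):
        if m.group(3) not in createlist:
            createlist.append(m.group(3))
            output.write(m.group(3))
            output.write('\n')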
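
As an alternative sketch for the de-duplication step (not the code from this answer): `dict.fromkeys` keeps the first occurrence of each key in insertion order (guaranteed since Python 3.7), which avoids the repeated list lookups. The helper name `unique_names` and the sample text are made up for illustration, and the pattern uses the letters-only class from Answer 2.

```python
import re

pattern = re.compile(r"(/\*~T\*/)(\s*?\n\s*)(_[a-zA-Z]*)")

def unique_names(text):
    """Return each captured name once, in order of first appearance."""
    found = (m.group(3) for m in pattern.finditer(text))
    return list(dict.fromkeys(found))

# Hypothetical input with a repeated variable
sample = "/*~T*/\n  _a = 1;\n/*~T*/\n  _b = 2;\n/*~T*/\n  _a = 3;\n"
result = unique_names(sample)

print(result)
```

The returned list could then be written to the output file with a single `"\n".join(...)`, as in Answer 2.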

Finding a word that starts with an underscore on a new line after a specific pattern

3 answers: