Question

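How can I write a regular expression that removes every comment that starts with # and runs to the end of the line, while leaving the first two lines untouched? Those two lines are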

#!/usr/bin/python

and

#-*- coding: utf-8 -*-

Answer 1

You can remove comments by parsing the Python code with tokenize.generate_tokens. The following is a slightly modified version of the example from the docs:

import tokenize
import io
import sys
if sys.version_info[0] == 3:
    StringIO = io.StringIO
else:
    StringIO = io.BytesIO

def nocomment(s):
    result = []
    g = tokenize.generate_tokens(StringIO(s).readline)  
    for toknum, tokval, _, _, _  in g:
        # print(toknum,tokval)
        if toknum != tokenize.COMMENT:
            result.append((toknum, tokval))
    return tokenize.untokenize(result)

with open('script.py','r') as f:
    content=f.read()

print(nocomment(content))

For example, if script.py contains

def foo(): # Remove this comment
    ''' But do not remove this #1 docstring 
    '''
    # Another comment
    pass

then the output of nocomment is

def foo ():
    ''' But do not remove this #1 docstring 
    '''

    pass
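Note that the shebang and coding lines are themselves COMMENT tokens, so a filter that drops every comment drops them as well. Below is a minimal sketch of one way to keep them, assuming Python 3 and "\n" line endings. The function name strip_comments_keep_header and its keep_lines parameter are made up for illustration. Unlike the code above, it cuts each comment out of the original line by token position instead of calling tokenize.untokenize, so the remaining spacing is left unchanged.

```python
import io
import tokenize


def strip_comments_keep_header(source, keep_lines=2):
    """Remove # comments, except those starting on the first keep_lines lines.

    Hypothetical helper: the name and the keep_lines parameter are assumptions
    made for this sketch, not part of the original answer.
    """
    # Split the same way the tokenizer reads the text: on "\n" only.
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        row, col = tok.start  # row is 1-based, col is 0-based
        if tok.type == tokenize.COMMENT and row > keep_lines:
            line = lines[row - 1]
            ending = "\n" if line.endswith("\n") else ""
            # A comment always runs to the end of its line, so cut at col
            # and drop the whitespace that preceded it.
            lines[row - 1] = line[:col].rstrip() + ending
    return "".join(lines)


sample = (
    "#!/usr/bin/python\n"
    "#-*- coding: utf-8 -*-\n"
    "def foo(): # Remove this comment\n"
    "    ''' But do not remove this #1 docstring\n"
    "    '''\n"
    "    # Another comment\n"
    "    pass\n"
)
print(strip_comments_keep_header(sample))
```

A line that held only a comment becomes an empty line, as in the output shown above. The #1 inside the docstring survives because the tokenizer reports it as part of a STRING token, not as a COMMENT.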

Answer 2

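I don't actually think this can be done purely with a regular expression, because you would need to count quotes to make sure that an instance of # is not inside a string.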

I would look into Python's built-in code parsing modules for help with this.
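To illustrate the quoting problem, here is a small sketch of what a naive pattern does when a # appears inside a string literal. The helper naive_strip and the sample line are invented for this example.

```python
import re


def naive_strip(line):
    # Hypothetical helper: delete everything from the first '#' to end of line,
    # with no awareness of string literals.
    return re.sub(r"#.*", "", line)


line = 'url = "http://example.com/#top"  # link to the page'
print(naive_strip(line))
# The first '#' is inside the string, so the literal is cut in half
# and its closing quote is lost.
```

The pattern has no way to tell that the first # belongs to the URL, which is why a tokenizer or parser is the safer tool here.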

Answer 3

sed -e '1,2p' -e '/^\s*#/d' infile

Then wrap it in a subprocess.Popen call.

However, this is no substitute for a real parser! Why does that matter? Well, consider this Python script:

output = """
This is
#1 of 100"""

Boom: any non-parsing solution immediately breaks your script.
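The failure can be reproduced in a few lines. The sketch below uses a rough Python analogue of the sed expression /^\s*#/d (the function drop_comment_lines is an assumption made for illustration, not an exact reimplementation of sed) and applies it to the script above.

```python
import re

script = 'output = """\nThis is\n#1 of 100"""\n'


def drop_comment_lines(source):
    # Hypothetical helper: drop every line whose first non-blank character
    # is '#', without checking whether the line is inside a string.
    kept = [
        line
        for line in source.splitlines(keepends=True)
        if not re.match(r"\s*#", line)
    ]
    return "".join(kept)


broken = drop_comment_lines(script)
print(repr(broken))

try:
    compile(broken, "<broken>", "exec")
except SyntaxError:
    # The closing triple quote was on the deleted line.
    print("the filtered script is no longer valid Python")
```

The deleted line carried both part of the string's content and its closing quotes, so the result does not even compile.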
