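How can I do this with a Python regex?

Time: 2010-05-08 23:19:47

Tags: python regex

I'm trying to use a regular expression to correctly extract the method definitions generated for COM interfaces. On top of that, some of the definitions are empty, which causes me even more problems.

Basically I have this: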

IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['in'], BSTR, 'TypeName' ),
              ( ['in'], BSTR, 'TypeNamespaceURI' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

class EnvironmentManager(CoClass):
    u'Singleton object that manages different environments (collections of configuration information).'
    _reg_clsid_ = GUID('{8A626D49-5F5E-47D9-9463-0B802E9C4167}')
    _idlflags_ = []
    _typelib_path_ = typelib_path
    _reg_typelib_ = ('{5E1F7BC3-67C5-4AEE-8EC6-C4B73AAC42ED}', 1, 0)

INumberFormat._methods_ = [
]

I want to extract the IXMLSerializerAlt and INumberFormat method definitions, but I can't work out a correct regex. For example, for IXMLSerializerAlt I want to extract this:

IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['in'], BSTR, 'TypeName' ),
              ( ['in'], BSTR, 'TypeNamespaceURI' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

This regex looks to me like it should work:

^\w+\._methods_\s=\s\[$
(^.+$)*
^]$

I'm using Kodos to check my regex, but I can't find a way to make it work.

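2 answers:

Answer 0: (score: 2)

You are missing the newlines between $ and ^, and you are probably not using the re.MULTILINE flag, which lets those characters anchor at the start and end of each line rather than only at the start and end of the whole string. The following (compiled with re.MULTILINE) will match: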

\w+\._methods_\s=\s\[$(?:\n^.+$)*\n^\]$
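To make the two points above concrete, here is a minimal sketch that runs three variants against a shortened sample. The sample text and the variable names are made up for illustration, and joining the question's three pattern lines directly is an assumption about how the pattern was entered.

```python
import re

# Shortened sample in the same shape as the question's input (illustrative only).
SAMPLE = """IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' )),
]

INumberFormat._methods_ = [
]
"""

# The question's pattern with its three lines joined directly
# (an assumption): nothing consumes the newline between "$" and "^".
ORIGINAL = r"^\w+\._methods_\s=\s\[$(^.+$)*^]$"

# The corrected pattern from this answer, with an explicit "\n" between lines.
CORRECTED = r"\w+\._methods_\s=\s\[$(?:\n^.+$)*\n^\]$"

# Even with re.MULTILINE the original cannot match, because "^" is tried
# at the end of a line, before the newline has been consumed.
original_hits = re.findall(ORIGINAL, SAMPLE, re.MULTILINE)

# Without re.MULTILINE, "$" only matches at the very end of the string.
no_flag_hits = re.findall(CORRECTED, SAMPLE)

# With the flag, both the populated and the empty definition are found.
corrected_hits = re.findall(CORRECTED, SAMPLE, re.MULTILINE)

print(len(original_hits), len(no_flag_hits), len(corrected_hits))
```

Note that the corrected pattern relies on a blank line (or the end of the input) following each closing bracket, since `.+` cannot match an empty line.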

However, here is a slightly simplified regex that also matches your example:

>>> s = '''...\nIXMLSerializerAlt._methods_ = [\n    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',\n              ( ['in'], BSTR, 'XML' ),\n              ( ['in'], BSTR, 'TypeName' ),\n              ( ['in'], BSTR, 'TypeNamespaceURI' ),\n              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),\n]\n...'''
>>> import re
>>> re.findall(r'^\w+\._methods_\s=\s\[$.*?^\]$', s, re.DOTALL | re.MULTILINE)
["IXMLSerializerAlt._methods_ = [\n    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',\n              ( ['in'], BSTR, 'XML' ),\n              ( ['in'], BSTR, 'TypeName' ),\n              ( ['in'], BSTR, 'TypeNamespaceURI' ),\n              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),\n]"]
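Since the question also mentions empty definitions, the sketch below applies the same simplified pattern to input that contains an empty block and an unrelated class in between. The named groups `name` and `body`, the helper `extract_methods`, and the trimmed sample text are all additions for illustration and are not part of the answer above.

```python
import re

# Trimmed version of the question's input (illustrative only).
SOURCE = """IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

class EnvironmentManager(CoClass):
    _idlflags_ = []

INumberFormat._methods_ = [
]
"""

# Same shape as the simplified pattern, with two named groups added.
# re.DOTALL lets ".*?" cross newlines; re.MULTILINE makes "^" and "$"
# anchor on every line, so "^\]$" only matches a line that is just "]".
METHODS_RX = re.compile(
    r"^(?P<name>\w+)\._methods_\s=\s\[$(?P<body>.*?)^\]$",
    re.DOTALL | re.MULTILINE,
)


def extract_methods(text):
    """Map each interface name to the stripped text between its brackets."""
    return {
        m.group("name"): m.group("body").strip()
        for m in METHODS_RX.finditer(text)
    }


blocks = extract_methods(SOURCE)
for name, body in blocks.items():
    print(name, "->", repr(body))
```

The empty INumberFormat definition comes back with an empty body, and the `_idlflags_ = []` line inside the class is ignored because the pattern requires `._methods_` before the opening bracket.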

Answer 1: (score: 0)

import re

interface_definitions = '''
IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['in'], BSTR, 'TypeName' ),
              ( ['in'], BSTR, 'TypeNamespaceURI' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

class EnvironmentManager(CoClass):
    u'Singleton object that manages different environments (collections of configuration information).'
    _reg_clsid_ = GUID('{8A626D49-5F5E-47D9-9463-0B802E9C4167}')
    _idlflags_ = []
    _typelib_path_ = typelib_path
    _reg_typelib_ = ('{5E1F7BC3-67C5-4AEE-8EC6-C4B73AAC42ED}', 1, 0)

INumberFormat._methods_ = [
]
'''

RX_METHODS = re.compile(
    r'(\w+)\._methods_\s=\s\[('
    r'.*?'
    r'(?:\[.*?\].*?)*'
    r')\]',
    re.DOTALL)

for match in RX_METHODS.finditer(interface_definitions):
    # Each match yields (interface name, text between the outer brackets).
    print(match.groups())