Python: regex to match anything inside parentheses (including other parentheses)

Time: 2017-06-19 15:19:22

Tags: python regex

I'm using Python and regex, and I'm trying to convert a string like this:

(1694439,805577453641105408,'\"@Bessemerband not reverse gear  simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :( \"',2911510,NULL,NULL,NULL),

into a list like this:

[
    [1694439, 805577453641105408, '\"@Bessemerband not reverse gear  simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"', 2887640, NULL, NULL, NULL],
    [1649240, 805577446758158336, '\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :(\"', 2911510, NULL, NULL, NULL]
]

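The main problem, as you can see, is that the text itself contains parentheses, and I don't want to split on those. I have tried things like \([^)]+\), but obviously that stops at the first ) it finds.

Any clue how to solve this?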

3 Answers:

Answer 0 (score: 0)

Is this the output you're looking for?

big = """(1694439,805577453641105408,'\"@Bessemerband not reverse gear  simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :( \"',2911510,NULL,NULL,NULL),"""
small = big.split('),')
print(small)

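What I'm doing is splitting on ), and then looping over the pieces and splitting each one on commas as usual. Here is a basic approach that can be optimized: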

new_list = []

for x in small:
    new_list.append(x.split(','))
print(new_list)

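The downside is that the trailing ), leaves one empty entry ([''] rather than a real row) at the end of the list, but you can discard it afterwards. Each row's first field also keeps its leading (.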
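A minimal sketch of that cleanup, using a shortened sample string rather than the original data. It assumes, as the approach above does, that the quoted text never contains a comma or the sequence ),; if it does, the rows will be split in the wrong places.

```python
# Shortened sample in the same shape as the question's data (an assumption).
big = "(1,2,'riot if (Brexit) is blocked.',3,NULL),(4,5,'brexit :( ',6,NULL),"

rows = []
for chunk in big.split('),'):
    if not chunk:
        # Skip the empty piece left after the final '),'.
        continue
    if chunk.startswith('('):
        # Drop the opening parenthesis of the row.
        chunk = chunk[1:]
    rows.append(chunk.split(','))

print(rows)
```

The quotes around the text field are kept as part of the value; strip them separately if they are not wanted.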

Answer 1 (score: 0)

Here is a simple regex solution that captures each comma-separated value in its own group:

\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)

Usage:

input_string = r"""(1694439,805577453641105408,'\"@Bessemerband not reverse gear  simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :( \"',2911510,NULL,NULL,NULL),"""

import re
result = re.findall(r"\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)", input_string)
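As a rough illustration of what this pattern returns, here is a sketch on a shortened sample string (not the original data). re.findall yields one tuple of seven strings per row; the helper to_value and the decision to map NULL to None and digit strings to int are assumptions added for illustration, not part of the answer.

```python
import re

ROW = re.compile(r"\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)")

# Shortened sample with an escaped quote and parentheses inside the text.
sample = r"""(1,2,'a \'quoted\' (text)',3,NULL,NULL,NULL),(4,5,'sad :( ',6,NULL,NULL,NULL),"""

def to_value(field):
    # Hypothetical conversion: NULL -> None, digits -> int, otherwise unchanged.
    if field == "NULL":
        return None
    if field.isdigit():
        return int(field)
    return field

rows = []
for match in ROW.finditer(sample):
    fields = match.groups()
    # Index 2 is the quoted text column; leave it as a string.
    rows.append([f if i == 2 else to_value(f) for i, f in enumerate(fields)])

print(rows)
```

The pattern is tied to exactly seven columns with the quoted text in third position, so a different layout needs a different pattern.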

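Answer 2 (score: 0)

The nested parentheses aren't a problem here, because they are enclosed in quotes. All you have to do is match the quoted parts separately: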

import re

pat = re.compile(r"[^()',]+|'[^'\\]*(?:\\.[^'\\]*)*'|(\()|(\))", re.DOTALL)

s = r'''(1694439,805577453641105408,'\"@Bessemerband not reverse gear  simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :( \"',2911510,NULL,NULL,NULL),'''

result = []

for m in pat.finditer(s):
    if m.group(1):
        tmplst = []
    elif m.group(2):
        result.append(tmplst)
    else:
        tmplst.append(m.group(0))

print(result)

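If your string may also contain nested parentheses that are not enclosed in quotes, you can solve the problem with the recursive pattern feature of the regex module (combining it with the csv module is a good idea) or by building a state machine.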
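The state machine option could look roughly like the sketch below. It is only one possible design, written under these assumptions: fields are separated by commas at nesting depth 1, quoted text uses single quotes with backslash escapes, and the parentheses outside quotes are balanced. The function name split_rows and the sample string are made up for illustration.

```python
def split_rows(s):
    rows, fields, buf = [], [], []
    depth = 0          # nesting depth of parentheses outside quotes
    in_quote = False   # True while inside a '...' literal
    i = 0
    while i < len(s):
        ch = s[i]
        if in_quote:
            buf.append(ch)
            if ch == "\\" and i + 1 < len(s):
                # Keep the escaped character and skip over it.
                buf.append(s[i + 1])
                i += 1
            elif ch == "'":
                in_quote = False
        elif ch == "'":
            in_quote = True
            buf.append(ch)
        elif ch == "(":
            depth += 1
            if depth > 1:
                buf.append(ch)   # nested parenthesis belongs to the field
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # End of a row: flush the last field and the row itself.
                fields.append("".join(buf))
                rows.append(fields)
                fields, buf = [], []
            else:
                buf.append(ch)
        elif ch == "," and depth == 1:
            # Field separator at the top level of a row.
            fields.append("".join(buf))
            buf = []
        elif depth >= 1:
            buf.append(ch)
        # Characters at depth 0 (the commas between rows) are ignored.
        i += 1
    return rows

sample = r"(1,'a \'b\' (c), d',f(2,3),NULL),(4,'x :( ',5,NULL),"
rows = split_rows(sample)
print(rows)
```

Unlike the regex approaches, this keeps commas and parentheses inside quotes, and nested unquoted parentheses such as f(2,3), within a single field.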