How to parse line by line in Python and put multiple values in a tuple

Time: 2017-05-02 13:42:51

Tags: python parsing tuples

Each line has this form:

[id=52, idRegion=3857, tipo=New, CustomerDetails=[id=10, countryCode=DE, ... and so on

What I want to do is read the file line by line and build one tuple per line from the values of id, idRegion, and so on, like this:

(52, 3857, New, 10, DE, ...), (another line as a tuple), ... so that I can later put them into an Excel file.

I have tried the following, but it seems like too much for what I want:

a = re.findall( "id=(\d+),.idRegion=\d+, tipo=.*?,", file_txt)
b = re.findall( "id=\d+,.idRegion=(\d+),.tipo=.*?,", file_txt)
c = re.findall( "id=\d+,.idRegion=\d+,.tipo=(.*?),", file_txt)
d = [tuple(j for j in i if j)[-1] for i in a,b,c]
print c

1 answer:

Answer 0 (score: 0)

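We know very little about your input data format. Assuming your keys consist only of alphanumeric characters and your values consist of alphanumeric characters and whitespace, you can use the regular expression \w+=([\w\s]+?)[,\]] to capture the values. A key such as CustomerDetails, whose value starts with "[", is skipped because "[" is not an allowed value character. Apply the expression to each line with re.findall():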

import re


data = """
[id=52, idRegion=3857, tipo=New, CustomerDetails=[id=10, countryCode=DE]
[id=100, idRegion=11, tipo=New Something, CustomerDetails=[id=20, countryCode=DE]
"""

pattern = re.compile(r"\w+=([\w\s]+?)[,\]]")

print([
    tuple(pattern.findall(line)) for line in data.splitlines() if line
])

Prints:

[
    ('52', '3857', 'New', '10', 'DE'), 
    ('100', '11', 'New Something', '20', 'DE')
]
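
Since the question mentions putting the tuples into Excel later, here is a minimal sketch of one way that could be done, assuming a CSV file (which Excel can open) is an acceptable target. The helper names parse_lines and write_csv and the file names in the final comment are hypothetical and not from the original post; the pattern is the one from the answer above, so the same assumptions about keys and values apply.

```python
import csv
import io
import re

# Same pattern as in the answer: captures the value that follows each "key=".
PATTERN = re.compile(r"\w+=([\w\s]+?)[,\]]")


def parse_lines(lines):
    """Yield one tuple of captured values per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield tuple(PATTERN.findall(line))


def write_csv(rows, out):
    """Write each tuple as one CSV row to an open text file object."""
    csv.writer(out).writerows(rows)


# In-memory demo using the two sample lines from the answer.
sample = io.StringIO(
    "[id=52, idRegion=3857, tipo=New, CustomerDetails=[id=10, countryCode=DE]\n"
    "[id=100, idRegion=11, tipo=New Something, CustomerDetails=[id=20, countryCode=DE]\n"
)
buffer = io.StringIO()
write_csv(parse_lines(sample), buffer)
print(buffer.getvalue())

# With real files (hypothetical names), the same helpers could be used as:
#     with open("input.txt") as src, open("output.csv", "w", newline="") as dst:
#         write_csv(parse_lines(src), dst)
```

Iterating over the file object reads it line by line, so the whole input never has to be held in memory. Passing newline="" when opening the output file is what the csv module documentation recommends to avoid extra blank rows on Windows.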