Each line has this form:
[id=52, idRegion=3857, tipo=New, CustomerDetails=[id=10, countryCode=DE, ... and so on
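What I want to do is read the file line by line and get, for each line, a tuple with the values of id, idRegion, etc., like this:
(52,3857,New,10,DE ....), (another line with tuple)....
so that I can later put them into Excel.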
I have tried the following, but it doesn't seem to get me what I want:
import re
# file_txt holds the whole input file as one string
a = re.findall(r"id=(\d+),.idRegion=\d+, tipo=.*?,", file_txt)
b = re.findall(r"id=\d+,.idRegion=(\d+),.tipo=.*?,", file_txt)
c = re.findall(r"id=\d+,.idRegion=\d+,.tipo=(.*?),", file_txt)
d = [tuple(j for j in i if j)[-1] for i in (a, b, c)]
print(c)
Answer (score: 0)
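We know very little about the format of your input data. Assuming your keys consist only of word characters (letters, digits and underscores) and your values consist of word characters and spaces, you can capture the values with the regular expression \w+=([\w\s]+?)[,\]]. A nested key such as CustomerDetails=[ produces no match, because [ is neither a word character nor whitespace, so only the leaf values are captured. Apply the expression to each line with re.findall():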
import re
data = """
[id=52, idRegion=3857, tipo=New, CustomerDetails=[id=10, countryCode=DE]
[id=100, idRegion=11, tipo=New Something, CustomerDetails=[id=20, countryCode=DE]
"""
# Capture the value after each "key=", stopping lazily at the next "," or "]"
pattern = re.compile(r"\w+=([\w\s]+?)[,\]]")
print([
tuple(pattern.findall(line)) for line in data.splitlines() if line
])
Prints:
[
('52', '3857', 'New', '10', 'DE'),
('100', '11', 'New Something', '20', 'DE')
]
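Since the goal is to get these tuples into Excel, here is a minimal sketch of the remaining step. It assumes a CSV file is acceptable (Excel opens CSV files directly) and that the input file holds one record per line. It reuses the answer's pattern, reads the input lazily line by line, and writes one CSV row per record with the standard csv module. The names extract_rows and convert_file are hypothetical helpers, not part of the original answer.

```python
import csv
import io
import re

# Same pattern as in the answer: capture each value that ends at "," or "]".
PATTERN = re.compile(r"\w+=([\w\s]+?)[,\]]")


def extract_rows(lines):
    """Yield one tuple of captured values for every non-blank line."""
    for line in lines:
        if line.strip():
            yield tuple(PATTERN.findall(line))


def convert_file(in_path, out_path):
    """Read in_path line by line and write one CSV row per input line.

    The csv module expects newline="" on files it writes to.
    """
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(extract_rows(src))


# Quick in-memory check that does not touch the file system.
sample = io.StringIO(
    "[id=52, idRegion=3857, tipo=New, CustomerDetails=[id=10, countryCode=DE]\n"
    "\n"
    "[id=100, idRegion=11, tipo=New Something, CustomerDetails=[id=20, countryCode=DE]\n"
)
out = io.StringIO()
csv.writer(out).writerows(extract_rows(sample))
print(out.getvalue())
```

On a real file you would call something like convert_file("input.txt", "output.csv"), where both paths are placeholders. Producing a native .xlsx file instead would require a third-party library such as openpyxl.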