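I am trying to write code that compares every string in a list with the others and then generates a regular expression capturing what they have in common.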
list = ["LONDON-UK-L16-N1",
"LONDON-UK-L17-N1",
"LONDON-UK-L16-N2",
"LONDON-UK-L17-N2",
"PARIS-France-L16-N2"]
I am trying to get output like this:
LONDON-UK-L(16|17)-N(1|2)
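Is this possible? Thanks.
Update: to be clear, this is what I am trying to do.
Input: a list of strings.
Action: compare the list items with each other and check for similarity (keeping the first group of characters fixed), then use a regex for the parts that differ, so that we get a single output instead of listing every item.
Output: a regex matching the parts that differ.
Input: tez15-3-s1-y2 tez15-3-s2-y2 bro40-55-s1-y2
Output: tez15-3-s(1|2)-y2, bro40-55-s1-y2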
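Answer 0: (score: 3)
It is not entirely clear from your question what the exact problem is. Since the data you provided is consistent and well ordered, you can solve this simply by splitting the items in the list and grouping the pieces.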
loc_list = ["LONDON-UK-L16-N1", "LONDON-UK-L17-N1", "LONDON-UK-L16-N2",
            "LONDON-UK-L16-N2", "PARIS-France-L16-N2"]

split_loc_list = [location.split("-") for location in loc_list]

# Map each "CITY-COUNTRY" prefix to the sets of L and N values seen for it.
locs = {}
for loc in split_loc_list:
    locs.setdefault("-".join(loc[0:2]), {}).\
        setdefault("L", set()).add(loc[2].strip("L"))
    locs.setdefault("-".join(loc[0:2]), {}).\
        setdefault("N", set()).add(loc[3].strip("N"))

for loc, vals in locs.items():
    # Sort numerically, so that 2 comes before 16.
    L_vals_sorted = sorted(map(int, vals["L"]))
    L_vals_joined = "|".join(map(str, L_vals_sorted))
    N_vals_sorted = sorted(map(int, vals["N"]))
    N_vals_joined = "|".join(map(str, N_vals_sorted))
    print(f"{loc}-L({L_vals_joined})-N({N_vals_joined})")
This outputs:
LONDON-UK-L(16|17)-N(1|2)
PARIS-France-L(16)-N(2)
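Since there are only two tags here ("L" and "N"), I simply wrote them into the code. If there can be many tags, you can split each field into its tag and its number instead:
import re

# Inside the first loop, for a tagged field such as loc[2]:
split = re.findall(r'\d+|\D+', loc[2])  # "L16" -> ["L", "16"]
key, val = split[0], split[1]
locs.setdefault("-".join(loc[0:2]), {}).\
    setdefault(key, set()).add(val)
Then, in the second loop, iterate over all the tags instead of only reading "L" and "N".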
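Putting the two pieces together, a generalized version might look like the sketch below. It assumes every field after the prefix is a non-digit tag followed by digits (like "L16"); a field that does not fit that shape would raise an error. The names summarize and prefix_len are made up for this illustration.

```python
import re
from collections import defaultdict

def summarize(items, prefix_len=2):
    """Group items by their first `prefix_len` fields and collect each tag's values."""
    # prefix -> {tag: set of values}; plain dicts preserve first-seen tag order
    groups = defaultdict(dict)
    for item in items:
        fields = item.split("-")
        prefix = "-".join(fields[:prefix_len])
        for field in fields[prefix_len:]:
            # Assumption: each field is a tag followed by digits, e.g. "L16".
            tag, value = re.fullmatch(r"(\D+)(\d+)", field).groups()
            groups[prefix].setdefault(tag, set()).add(value)

    lines = []
    for prefix, tags in groups.items():
        parts = [prefix]
        for tag, values in tags.items():
            joined = "|".join(sorted(values, key=int))  # numeric order
            parts.append(f"{tag}({joined})")
        lines.append("-".join(parts))
    return lines

loc_list = ["LONDON-UK-L16-N1", "LONDON-UK-L17-N1", "LONDON-UK-L16-N2",
            "LONDON-UK-L16-N2", "PARIS-France-L16-N2"]
print(summarize(loc_list))
# ['LONDON-UK-L(16|17)-N(1|2)', 'PARIS-France-L(16)-N(2)']
```

Like the code above, this collects the values of each tag independently, so the resulting pattern can also match combinations that never appeared in the input.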
Answer 1: (score: 1)
I implemented the following solution:
import re

data = [
    'LONDON-UK-L16-N1',
    'LONDON-UK-L17-N1',
    'LONDON-UK-L16-N2',
    'LONDON-UK-L16-N2',
    'PARIS-France-L16-N2'
]


def deconstruct(data):
    # Build a nested dict (a trie) keyed by the non-digit part of each field.
    # Key 0 of every node holds the set of numbers seen for that field.
    data = [x.split('-') for x in data]
    result = dict()
    for x in data:
        pointer = result
        for y in x:
            substr = re.findall(r'(\D+)', y)
            if substr:
                substr = substr[0]
                if substr not in pointer:
                    pointer[substr] = {0: set()}
                pointer = pointer[substr]
            substr = re.findall(r'(\d+)', y)
            if substr:
                substr = substr[0]
                pointer[0].add(substr)
    return result


def construct(data, level=0):
    # Walk the trie and rebuild one pattern per root-to-leaf path.
    result = []
    for key in data.keys():
        if key != 0:
            if len(data[key][0]) == 1:
                nums = list(data[key][0])[0]
            elif len(data[key][0]) > 1:
                nums = '(' + '|'.join(sorted(list(data[key][0]))) + ')'
            else:
                nums = ''
            deeper_result = construct(data[key], level + 1)
            if not deeper_result:
                result.append([key + nums])
            else:
                for d in deeper_result:
                    result.append([key + nums] + d)
    return result if level > 0 else ['-'.join(x) for x in result]


print(construct(deconstruct(data)))
# ['LONDON-UK-L(16|17)-N(1|2)', 'PARIS-France-L16-N2']
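To check a result like this, you can compile the generated patterns and confirm that every input string is matched by at least one of them. The sketch below is only an illustration; the helper name unmatched is made up, and it assumes the literal parts contain no regex metacharacters (otherwise they would need re.escape when the pattern is built).

```python
import re

data = [
    'LONDON-UK-L16-N1',
    'LONDON-UK-L17-N1',
    'LONDON-UK-L16-N2',
    'LONDON-UK-L16-N2',
    'PARIS-France-L16-N2',
]
patterns = ['LONDON-UK-L(16|17)-N(1|2)', 'PARIS-France-L16-N2']

def unmatched(strings, patterns):
    """Return the strings that none of the patterns matches in full."""
    compiled = [re.compile(p) for p in patterns]
    return [s for s in strings if not any(c.fullmatch(s) for c in compiled)]

print(unmatched(data, patterns))  # []

# The pattern is a generalization: it also accepts a combination
# that is not in this particular input list.
print(bool(re.fullmatch(patterns[0], 'LONDON-UK-L17-N2')))  # True
```

The second check shows the trade-off of collecting the numbers per field independently: the pattern is compact, but it is not guaranteed to match only the original strings.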
Answer 2: (score: 1)
I am posting a new (second) implementation for this question. I think it is more accurate, and I hope it helps:
import re

data = [
    'LONDON-UK-L16-N1',
    'LONDON-UK-L17-N1',
    'LONDON-UK-L16-N2',
    'LONDON-UK-L17-N2',
    'LONDON-UK-L18-N2',
    'PARIS-France-L16-N2',
]


def merge(data):
    data.sort()
    data = [x.split('-') for x in data]
    # Work from the last column to the first, merging rows that differ
    # only in the number of the current column.
    for col in range(len(data[0]) - 1, -1, -1):
        result = []

        def add_result():
            # Emit the group collected so far as one merged row.
            result.append([])
            if headstr:
                result[-1] += headstr.split('-')
            if len(findnum) > 1:
                result[-1] += [f'{findstr}({"|".join(sorted(findnum))})']
            elif len(findnum) == 1:
                result[-1] += [f'{findstr}{findnum[0]}']
            if tailstr:
                result[-1] += tailstr.split('-')

        _headstr = lambda x, y: '-'.join(x[:y])
        _tailstr = lambda x, y: '-'.join(x[y + 1:])
        _findstr = lambda x: re.findall(r'(\D+)', x)[0] if re.findall(r'(\D+)', x) else ''
        _findnum = lambda x: re.findall(r'(\d+)', x)[0] if re.findall(r'(\d+)', x) else ''

        headstr = _headstr(data[0], col)
        tailstr = _tailstr(data[0], col)
        findstr = _findstr(data[0][col])
        findnum = []

        for row in data:
            # A change in anything except the current column's number starts a new group.
            if headstr + findstr + tailstr != _headstr(row, col) + _findstr(row[col]) + _tailstr(row, col):
                add_result()
                headstr = _headstr(row, col)
                tailstr = _tailstr(row, col)
                findstr = _findstr(row[col])
                findnum = []
            if _findnum(row[col]) not in findnum:
                findnum.append(_findnum(row[col]))
        # Flush the last group.
        add_result()

        data = result[:]
    return ['-'.join(x) for x in result]


print(merge(data))  # ['LONDON-UK-L(16|17)-N(1|2)', 'LONDON-UK-L18-N2', 'PARIS-France-L16-N2']
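One way to see what "more accurate" means here is to expand the merged patterns back into plain strings and compare them with the input. The sketch below is an illustration only; the helper expand is made up and handles just the simple TAG(a|b) fields produced above, not general regular expressions.

```python
import re
from itertools import product

def expand(pattern):
    """List every string a merged pattern such as 'L(16|17)-N(1|2)' stands for."""
    choices = []
    for field in pattern.split('-'):
        # A merged field looks like TAG(n1|n2|...); anything else is literal.
        m = re.fullmatch(r'(\D*)\((.+)\)', field)
        if m:
            choices.append([m.group(1) + num for num in m.group(2).split('|')])
        else:
            choices.append([field])
    return ['-'.join(combo) for combo in product(*choices)]

data = [
    'LONDON-UK-L16-N1',
    'LONDON-UK-L17-N1',
    'LONDON-UK-L16-N2',
    'LONDON-UK-L17-N2',
    'LONDON-UK-L18-N2',
    'PARIS-France-L16-N2',
]
merged = ['LONDON-UK-L(16|17)-N(1|2)', 'LONDON-UK-L18-N2', 'PARIS-France-L16-N2']

expanded = [s for p in merged for s in expand(p)]
print(sorted(expanded) == sorted(data))  # True
```

For this input the expansion reproduces the original list exactly: L18 stays in its own pattern because 'LONDON-UK-L18-N1' never occurs, so folding it into L(16|17|18) would match a string that was not in the data.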
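Answer 3: (score: 0)
Don't use "list" as a variable name. It is not a reserved word, but it is the name of a built-in type, and assigning to it shadows that built-in.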
import re

lst = ['LONDON-UK-L16-N1', 'LONDON-UK-L17-N1', 'LONDON-UK-L16-N2', 'LONDON-UK-L16-N2', 'PARIS-France-L16-N2']

def check_it(string):
    # (\d+) captures the whole number; (\d)* would keep only its last digit.
    return re.search(r'[a-zA-Z\-]*L(\d+)-N(\d+)', string)

[check_it(x).group(0) for x in lst]
This outputs:
['LONDON-UK-L16-N1',
'LONDON-UK-L17-N1',
'LONDON-UK-L16-N2',
'LONDON-UK-L16-N2',
'PARIS-France-L16-N2']
From there, look at the groups and define a group that covers the part you want to use for the similarity comparison.
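As a rough sketch of that idea, you could capture the shared prefix in its own named group and collect the L and N numbers seen for each prefix. The function name group_by_prefix and the group names prefix, l and n are made up for this example, and it assumes every item has the PREFIX-L<digits>-N<digits> shape; items that do not match are skipped.

```python
import re
from collections import defaultdict

lst = ['LONDON-UK-L16-N1', 'LONDON-UK-L17-N1', 'LONDON-UK-L16-N2',
       'LONDON-UK-L16-N2', 'PARIS-France-L16-N2']

# "prefix" covers the part used for similarity; "l" and "n" are the varying numbers.
pattern = re.compile(r'(?P<prefix>[a-zA-Z\-]*)L(?P<l>\d+)-N(?P<n>\d+)')

def alternation(values):
    """Render one number as-is and several numbers as (a|b|...)."""
    ordered = sorted(values, key=int)
    return ordered[0] if len(ordered) == 1 else '(' + '|'.join(ordered) + ')'

def group_by_prefix(items):
    seen = defaultdict(lambda: {'l': set(), 'n': set()})
    for item in items:
        m = pattern.fullmatch(item)
        if m:  # skip items that do not have the expected shape
            seen[m['prefix']]['l'].add(m['l'])
            seen[m['prefix']]['n'].add(m['n'])
    return [f"{prefix}L{alternation(v['l'])}-N{alternation(v['n'])}"
            for prefix, v in seen.items()]

print(group_by_prefix(lst))
# ['LONDON-UK-L(16|17)-N(1|2)', 'PARIS-France-L16-N2']
```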