Identify duplicate IDs using a combination of two IDs

Posted: 2017-12-13 10:19:09

Tags: python duplicates

I have a dataset like this:

ID1 ID2
11  22
11  34
22  35
35  9
41  10
52  87
9   65
34  43

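I want an output dataset that uses the ID1/ID2 pairs to detect duplicate IDs and adds an ID3 column giving every linked ID the same identifier: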

ID1   ID2     ID3
11     22     ID_11
11     34     ID_11
22     35     ID_11
35     9      ID_11
41     10     ID_10
52     87     ID_87
9      65     ID_11
34     43     ID_11

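Because IDs 11, 22, 35, 9 and 34 all reference one another (as do 65 and 43, which are linked through 9 and 34), they all map to a single ID, ID_11.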
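In graph terms, each row is an undirected edge between two IDs, and the groups wanted here are the connected components of that graph. That is why 65 and 43 end up under ID_11 even though they never appear in the same row as 11. A minimal sketch using a breadth-first search from the standard library; the function name `connected_components` is only illustrative:

```python
from collections import defaultdict, deque


def connected_components(pairs):
    """Return the connected components of the undirected graph whose edges are `pairs`."""
    # adjacency list: each ID points at every ID it appears alongside
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    # dicts keep insertion order, so components come out in first-seen order
    for start in graph:
        if start in seen:
            continue
        # breadth-first search collects everything reachable from `start`
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


pairs = [(11, 22), (11, 34), (22, 35), (35, 9), (41, 10), (52, 87), (9, 65), (34, 43)]
for component in connected_components(pairs):
    print(sorted(component))
# [9, 11, 22, 34, 35, 43, 65]
# [10, 41]
# [52, 87]
```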

1 answer:

Answer 0 (score: 0)

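You haven't given much information to write this cleanly, but with a few details changed, this code should give you the Python you need to solve the problem.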

# your ID pairs, as a list of lists
# (named `pairs` rather than `vars` to avoid shadowing the built-in vars())
pairs = [
  [11, 22],
  [11, 34],
  [22, 35],
  [35, 9],
  [41, 10],
  [52, 87],
  [9, 65],
  [34, 43]
]

# create disjoint sets; if a pair touches several existing groups,
# merge them all into the first one so the groups stay disjoint
groups = []
for id_1, id_2 in pairs:
  hits = [group for group in groups if id_1 in group or id_2 in group]
  if not hits:
    groups.append({id_1, id_2})
    continue
  first = hits[0]
  first.update((id_1, id_2))
  for group in hits[1:]:
    first |= group
    groups.remove(group)

# map every ID to the index of the group it belongs to
id_mappings = {}
for group_id, group in enumerate(groups):
  for id_ in group:
    id_mappings[id_] = group_id

# append the group index to each pair in the initial list
for id_pair in pairs:
  id_pair.append(id_mappings[id_pair[0]])

for pair in pairs:
  print(pair)
>> [11, 22, 0]
>> [11, 34, 0]
>> [22, 35, 0]
>> [35, 9, 0]
>> [41, 10, 1]
>> [52, 87, 2]
>> [9, 65, 0]
>> [34, 43, 0]
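A union-find (disjoint-set) structure is a common alternative for this kind of grouping: each pair is processed once, and a pair that links two existing groups simply unions them. The sketch below is an illustration, not the answerer's method, and the helper names `find` and `label_groups` are hypothetical. The question does not say how a group's label is chosen, so this sketch assumes `ID_<first ID seen in that group>`. That reproduces ID_11 for the large group but gives ID_41 and ID_52 for the other two rows instead of the question's ID_10 and ID_87.

```python
def find(parent, x):
    """Return the root of x, using path halving (a form of path compression)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def label_groups(pairs):
    """Return one 'ID_<n>' label per pair; linked pairs share a label."""
    parent = {}
    first_seen = []  # IDs in the order they first appear in the data
    for a, b in pairs:
        for x in (a, b):
            if x not in parent:
                parent[x] = x
                first_seen.append(x)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # union: attach one tree under the other

    # assumed labelling rule: name each group after its first-seen ID
    label_of_root = {}
    for x in first_seen:
        label_of_root.setdefault(find(parent, x), f"ID_{x}")
    return [label_of_root[find(parent, a)] for a, _ in pairs]


pairs = [(11, 22), (11, 34), (22, 35), (35, 9), (41, 10), (52, 87), (9, 65), (34, 43)]
for (id1, id2), id3 in zip(pairs, label_groups(pairs)):
    print(id1, id2, id3)
# 11 22 ID_11
# 11 34 ID_11
# 22 35 ID_11
# 35 9 ID_11
# 41 10 ID_41
# 52 87 ID_52
# 9 65 ID_11
# 34 43 ID_11
```

Unlike a loop that scans a list of sets for every pair, the union-find lookups stay cheap as the number of groups grows.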