Removing a duplicated word from text

Time: 2012-12-13 15:55:36

Tags: string duplicate-removal

My text contains lines like the following:

{whatever}:::duplicateString:::{whatever}
{whatever}:::duplicateString:::{whatever}
....
{whatever}:::duplicateString:::{whatever}
{whatever}:::duplicateString:::{whatever}

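How can I remove duplicateString from the text? The main idea is that if it occurs more than once, the second word should be removed from the line.

My first idea was to read the lines one by one and split each on ":::" to get an array, then iterate over the array while adding its entries to a TreeSet. Fine, but how do I glue the line back together afterwards?

I can't recall any mechanism for this kind of task. The language doesn't matter; I just need a solution.

Sample text: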

Appliances:::Main
Appliances:::Main:::Appliance Warranties
Appliances:::Main:::Beer Keg Refrigerators
Appliances:::Main:::Beverage Refrigerators
Appliances:::Main:::Ceiling Fans & Accessories
Appliances:::Main:::Ceiling Fans & Accessories:::Accessories
Appliances:::Main:::Ceiling Fans & Accessories:::Accessories:::Downrod Couplers
Appliances:::Main:::Ceiling Fans & Accessories:::Accessories:::Downrods
Appliances:::Main:::Ceiling Fans & Accessories:::Accessories:::Fan Replacement Blades

Ideally, the result should look like this:

Appliances:::Main
Appliances:::Appliance Warranties
Appliances:::Beer Keg Refrigerators
Appliances:::Beverage Refrigerators
Appliances:::Ceiling Fans & Accessories
Appliances:::Ceiling Fans & Accessories:::Accessories
Appliances:::Ceiling Fans & Accessories:::Accessories:::Downrod Couplers
Appliances:::Ceiling Fans & Accessories:::Accessories:::Downrods
Appliances:::Ceiling Fans & Accessories:::Accessories:::Fan Replacement Blades

1 answer:

Answer 0 (score: 1)

If duplicateString can only appear as the second word, you could do this (in Python):

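lastWord = None
with open('file.txt') as f:
    for line in f:
        # Strip the trailing newline so it is not printed twice, then split into fields
        w = line.rstrip('\n').split(':::')
        # A line with no separator has no second word to compare
        thisWord = w[1] if len(w) > 1 else None
        # Drop the second word when it repeats the previous line's second word
        if thisWord is not None and thisWord == lastWord:
            del w[1]
        lastWord = thisWord
        # Glue the remaining fields back together with the same separator
        print(':::'.join(w))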
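
The loop above only compares each line with the one directly before it. If the repeated second word can also come back after unrelated lines, one possible variation is to remember every second word seen so far in a set. The sketch below is not from the answer: the function name `drop_seen_second_field` is hypothetical, and it assumes that a line with only two fields should be left alone so that it is never reduced to a single word.

```python
def drop_seen_second_field(lines, sep=":::"):
    """Remove the second field from a line when that field already
    appeared as the second field of an earlier line.

    Assumption: lines with only two fields are kept unchanged.
    """
    seen = set()   # second fields encountered so far
    result = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > 1:
            second = fields[1]
            # Only drop the field if something remains after it
            if second in seen and len(fields) > 2:
                del fields[1]
            seen.add(second)
        # str.join glues the fields back together with the separator
        result.append(sep.join(fields))
    return result


sample = [
    "Appliances:::Main",
    "Appliances:::Main:::Appliance Warranties",
    "Appliances:::Main:::Ceiling Fans & Accessories:::Accessories",
]
for cleaned in drop_seen_second_field(sample):
    print(cleaned)
```

On the three sample lines this prints `Appliances:::Main`, `Appliances:::Appliance Warranties`, and `Appliances:::Ceiling Fans & Accessories:::Accessories`. Because the set is keyed on the second word alone, two different top-level categories that share a second word would be treated as duplicates; keying on the pair of first and second fields would avoid that if it matters for your data.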