Python: duplicate words in a text file

Time: 2015-07-06 04:13:08

Tags: python list duplicates

I have a txt file.

The text file looks like this:

Still not working.

  

['None']['Vega~']['Vega~']['Vega~']['8^)->-<']['violence']['puker']['Zanaz']['Funk']['8^)->-<']['8^)->-<']['8^)->-<']['Vega~']['violence']['Zanaz']['Funk']['puker']['Vega~']['Vega~']['Vega~']['8^)->-<']['violence']['puker']['Zanaz']['None']['Lawn']['Lawn']['Lawn']['Leafy']['Judge69']['David']['lilwade']['Pity.']['artofwar']['Hazecloud']['Lawn']['Lawn']['Lawn']['Judge69']['Leafy']['David']['lilwade']['Hazecloud']['Lawn']['Lawn']['Lawn']['Leafy']['David']['Pity.']['lilwade']['artofwar']['Judge69']

I need to remove all duplicates so that each name appears only once, while keeping the names in the order they appear.

   fo = open('C:\Python26\myfile.txt','r')
   name_cache = fo.readlines()
   typea = name_cache[0]

   def unique_list(l):
      ulist = []
      [ulist.append(x) for x in l if x not in ulist]
      return ulist

   mast =' '.join(unique_list(typea.split()))
   print mast
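Why the code above appears to do nothing: str.split() with no arguments splits only on runs of whitespace, and the file's line contains none, so typea.split() yields a single item and there is nothing to de-duplicate. A quick check, using a shortened sample line as an assumed stand-in for the file contents:

```python
# A shortened, assumed sample of the file's single line (no spaces between entries)
line = "['None']['Vega~']['Vega~']\n"

parts = line.split()  # splits on whitespace only; also drops the trailing newline
print(parts)          # one element: the whole line
print(len(parts))
```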

4 answers:

Answer 0 (score: 3)

First remove the leading [ and the trailing ]. Then split on ][. For example:

>>> x="['None']['Vega~']['Vega~']"
>>> x.rstrip(']').lstrip('[').split('][') 
["'None'", "'Vega~'", "'Vega~'"]

Then call unique_list:

>>> y = x.rstrip(']').lstrip('[').split('][') 
>>> unique_list(y)
["'None'", "'Vega~'"]

You can then easily format the result however you like (for example, as a string).
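Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. The helper name dedupe_line is hypothetical, unique_list is the asker's function rewritten as a plain loop, and the sample line stands in for what fo.readline() would return.

```python
def unique_list(l):
    """Return the items of l in order, keeping only the first occurrence of each."""
    ulist = []
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist


def dedupe_line(line):
    """Strip whitespace and the outer [ ], split on '][', then remove duplicates."""
    body = line.strip()[1:-1]  # assumes the line starts with '[' and ends with ']'
    return unique_list(body.split(']['))


# In practice this would come from fo.readline(); a shortened sample is assumed here
sample = "['None']['Vega~']['Vega~']['violence']['None']\n"
result = dedupe_line(sample)
print(' '.join(result))
```

Joining with a space is just one choice; any separator works once you have the list.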

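Note that rstrip and lstrip each scan and copy the string, so a single slice, x[1:-1], may be slightly better. This assumes you are 100% sure the input has exactly the given form (starts with [ and ends with ]).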
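One caveat when the line comes from readlines(): it usually ends in a newline, so x[1:-1] removes the '\n' instead of the closing ']' (and rstrip(']') removes nothing at all). Stripping whitespace first avoids this. A small check with an assumed sample line:

```python
raw = "['None']['Vega~']\n"  # assumed sample line as returned by readlines()

# Slicing without stripping removes the newline, leaving the final ']' in place
naive = raw[1:-1].split('][')
print(naive)

# Stripping whitespace first makes the slice remove exactly the outer brackets
fixed = raw.strip()[1:-1].split('][')
print(fixed)
```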

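Splitting this way is O(n), like hashing each word into a Python set, but it keeps the original order and lets you reuse your (very neat) unique_list function. Note, though, that unique_list tests `x not in ulist`, which scans the list on every check, so de-duplicating n names with k distinct values costs O(n·k) rather than O(n).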
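To keep the ordering while getting set-speed lookups, a common pattern is to track the items already seen in a separate set. This is a swapped-in technique, not part of the original answer, and the name unique_list_fast is hypothetical:

```python
def unique_list_fast(items):
    """Order-preserving de-duplication in O(n) expected time."""
    seen = set()    # average O(1) membership checks
    result = []     # first occurrences, in their original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


names = ["'None'", "'Vega~'", "'Vega~'", "'Lawn'", "'None'"]
print(unique_list_fast(names))
```

The items must be hashable; the strings produced by splitting the line are.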

Answer 1 (score: 0)

You could do this:

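import collections
def unique_list(l):
    # OrderedDict.fromkeys keeps only the first occurrence of each item, in order
    return list(collections.OrderedDict.fromkeys(l))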
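On Python 3.7 and later, plain dicts are guaranteed to preserve insertion order, so dict.fromkeys does the same job without an import. A sketch, with the name unique_list_dict assumed for illustration:

```python
def unique_list_dict(items):
    # dict keys are unique, and since Python 3.7 they keep insertion order,
    # so the first occurrence of each item fixes its position
    return list(dict.fromkeys(items))


print(unique_list_dict(['Lawn', 'Lawn', 'Leafy', 'Lawn', 'David']))
```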

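Also, typea is just a string with no spaces in it, so typea.split() returns the whole line as a single item. To split out the names, do this: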

typea = [name for name in typea.strip().replace('[', '').split(']') if name] # typea is now a list; empty strings are dropped
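Combining that split with an order-preserving de-duplication might look like this, on a shortened sample line assumed to stand in for the file's contents. Empty strings are filtered out because splitting on ']' leaves one after the final bracket.

```python
import collections

typea = "['Lawn']['Lawn']['Leafy']['Judge69']['Leafy']\n"  # assumed sample line

# Drop the newline and every '[', then split on ']'; the trailing ']' would leave
# an empty string at the end, which the comprehension filters out
names = [name for name in typea.strip().replace('[', '').split(']') if name]

unique = list(collections.OrderedDict.fromkeys(names))
print(unique)
```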

Answer 2 (score: 0)

s = "['None']['Vega~']['Vega~']['Vega~']['8^)->-<']['violence']['puker']['Zanaz']['Funk']['8^)->-<']['8^)->-<']['8^)->-<']['Vega~']['violence']['Zanaz']['Funk']['puker']['Vega~']['Vega~']['Vega~']['8^)->-<']['violence']['puker']['Zanaz']['None']['Lawn']['Lawn']['Lawn']['Leafy']['Judge69']['David']['lilwade']['Pity.']['artofwar']['Hazecloud']['Lawn']['Lawn']['Lawn']['Judge69']['Leafy']['David']['lilwade']['Hazecloud']['Lawn']['Lawn']['Lawn']['Leafy']['David']['Pity.']['lilwade']['artofwar']['Judge69']"
ss = s[1:-1]
l = []
for i in ss.split(']['):
    if i not in l:
        l.append(i)
r = ' '.join(l)

Result:

"'None' 'Vega~' '8^)->-<' 'violence' 'puker' 'Zanaz' 'Funk' 'Lawn' 'Leafy' 'Judge69' 'David' 'lilwade' 'Pity.' 'artofwar' 'Hazecloud'"

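If you want the bare names without the surrounding quotes, one alternative (a different technique from the splitting above) is a regular expression that captures the text between [' and ']. This assumes no name itself contains the sequence '].

```python
import re

s = "['None']['Vega~']['Vega~']['8^)->-<']['Pity.']['None']"  # shortened sample

# Non-greedy group captures each name between [' and ']
names = re.findall(r"\['(.*?)'\]", s)

unique = []
for name in names:
    if name not in unique:
        unique.append(name)

print(' '.join(unique))
```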
Answer 3 (score: 0)

A solution that keeps the brackets around the names:

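with open('myfile.txt', 'r') as fo:
    # readline() keeps the trailing newline; strip it so the last name compares equal
    name_cache = fo.readline().strip()
names = []
# Put a comma between adjacent entries so each bracketed name becomes its own item
for name in name_cache.replace('][', '],[').split(','):
    if name not in names:
        names.append(name)

print(names)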
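Because this answer keeps the brackets, the de-duplicated names can be joined straight back into the file's original one-line format. A minimal sketch on an assumed sample line (writing the result back to a file is left out):

```python
line = "['Lawn']['Lawn']['Leafy']['David']['Lawn']"  # assumed sample line

names = []
# Put a comma between adjacent entries so each bracketed name becomes its own item
for name in line.replace('][', '],[').split(','):
    if name not in names:
        names.append(name)

# Joining with an empty separator restores the original "[...][...]" layout
rebuilt = ''.join(names)
print(rebuilt)
```

Like the answer itself, this relies on no name containing a comma.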