Question

I have a text file with lines like these:

"aa aa bb aa"
"cc cc dd bb bb"

I want to remove the duplicate tokens so that the file looks like this:

"aa bb"
"cc dd bb"

Answer 1

In Python 2.7:

with open("datafile") as fin, open("outfile", "w") as fout:
    for line in fin:
        # set() drops duplicates but does not keep the original token order
        print >> fout, ' '.join(set(line.split()))

In Python 3.x:

with open("datafile") as fin, open("outfile", "w") as fout:
    for line in fin:
        # print() joins its arguments with single spaces by default
        print(*set(line.split()), file=fout)
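Both versions above can reorder the tokens of a line, because a `set` has no defined iteration order. A minimal sketch that keeps first-appearance order instead, assuming Python 3.7+ (where a plain `dict` preserves insertion order); the helper names `dedupe_line` and `dedupe_file` are illustrative, not part of the original answer:

```python
def dedupe_line(line):
    """Return the whitespace-separated tokens of `line` with duplicates
    removed, keeping the first occurrence of each token in order."""
    # dict.fromkeys keeps insertion order on Python 3.7+, so repeated
    # tokens are dropped while the first-seen order survives.
    return ' '.join(dict.fromkeys(line.split()))


def dedupe_file(src, dst):
    """Write a deduplicated copy of every line of `src` to `dst`."""
    with open(src) as fin, open(dst, 'w') as fout:
        for line in fin:
            print(dedupe_line(line), file=fout)


if __name__ == '__main__':
    print(dedupe_line('aa aa bb aa'))     # aa bb
    print(dedupe_line('cc cc dd bb bb'))  # cc dd bb
```

Calling `dedupe_file("datafile", "outfile")` would then produce `aa bb` and `cc dd bb` for the two sample lines.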

Answer 2

In Python:

s = "aa aa bb aa"
' '.join(set(s.split()))

Output:

'aa bb'

If order matters, try:

lst = []
for i in s.split():
    if i not in lst:  # keep only the first occurrence of each token
        lst.append(i)
' '.join(lst)
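Another order-preserving option keeps the `set` from the first snippet and sorts the unique tokens by the position where each first appears. `list.index` scans the list on every call, so this is quadratic in the worst case, which is fine for short lines like these:

```python
s = "aa aa bb aa"
words = s.split()

# words.index(w) is the position of w's first occurrence, so sorting the
# unique tokens by it restores their original left-to-right order.
print(' '.join(sorted(set(words), key=words.index)))  # aa bb
```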

Answer 3

The approach below is a bit more involved, but it preserves the order of the tokens.

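>>> s = "aa aa bb aa"
>>> seen = set()
>>> for e in s.split():
...     if e not in seen:   # keep only the first occurrence of each token
...         seen.add(e)
...         print(e)
...
aa
bb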

Applied to your files:

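with open('datafile') as fin, open('outfile', 'w') as fout:
    for line in fin:
        seen = set()
        words = []
        for e in line.split():
            if e not in seen:   # first occurrence only, so order is kept
                seen.add(e)
                words.append(e)
        print(' '.join(words), file=fout)
        # print >> fout, ' '.join(words)  # Python 2.x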

Answer 4

Something like this:

lines = ['aa aa bb aa', 'cc cc dd bb bb']
for l in lines:
    s = set()  # built-in set; the old sets.Set is deprecated
    for word in l.split():
        s.add(word)
    print(' '.join(s))
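Like any set, the one above does not keep the words in their original order. A sketch that preserves first-appearance order and runs on both Python 2.7 and 3 uses `collections.OrderedDict.fromkeys`:

```python
from collections import OrderedDict

lines = ['aa aa bb aa', 'cc cc dd bb bb']
for l in lines:
    # OrderedDict remembers insertion order on every version that has it
    # (2.7+), so duplicates are dropped and the first-seen order is kept.
    print(' '.join(OrderedDict.fromkeys(l.split())))
# aa bb
# cc dd bb
```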

Replace repeated strings in a file with a single string
