Efficient way to transpose a list of lists

Time: 2017-12-01 14:35:50

Tags: python list

I have a list of lists:

x = [ [4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c'] ]

which I need to convert to:

x1 = [ ['c', 4, 2, 5], ['b', 4], ['d', 4], ['e', 2], ['a', 2, 5] ]

Explanation:

'c' appears in lists starting with 4, 2, 5
'b' appears in only the list starting with 4
'd' appears in only the list starting with 4
...

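Obviously this is a toy example, but my real list is about 30 MB in a flat file.

I tried two nested for loops, but processing just 5% of the file took about 5 hours on my MacBook Pro (8 GB RAM).

Is there an efficient way to do this?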

2 answers:

Answer 0: (score: 3)

I also managed it with two nested loops:

from collections import defaultdict

x = [ [4, 'c', 'b', 'd'], [2, 'e', 'c', 'a'], [5, 'a', 'c'] ]

d = defaultdict(list)

for group in x:
    key = group[0]
    for item in group[1:]:
        d[item].append(key)


print(d)

# and to convert back to list:
x1 = [[key]+value for (key,value) in d.items()]
print(x1)

Output:

defaultdict(<class 'list'>, {'c': [4, 2, 5], 'b': [4], 'd': [4], 'e': [2], 'a': [2, 5]})
[['c', 4, 2, 5], ['b', 4], ['d', 4], ['e', 2], ['a', 2, 5]]

A note on efficiency:

Inside the outer loop I compute group[1:]. If group is large, merely copying the list could be expensive. If so, this loop may be better:

for group in x:
    it = iter(group)
    key = next(it)
    for item in it:
        d[item].append(key)

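The running time is O(n), where n is the total number of items across all the lists. I have not been able to measure whether this processing or reading the contents of the 30 MB file is the slower part.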
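Since the real data lives in a flat file, the same loop can be fed line by line instead of building the whole list of lists first. This is only a minimal sketch: the question does not show the file format, so it assumes one group per line, whitespace-separated fields, and an integer key as the first field. The name invert_groups is made up for illustration, and io.StringIO stands in for the real file.

```python
import io
from collections import defaultdict


def invert_groups(lines):
    """Map each item to the list of keys of the groups that contain it.

    ASSUMPTION: one group per line, whitespace-separated, with an
    integer key as the first field. Adjust the parsing to the real file.
    """
    index = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        it = iter(fields)
        key = int(next(it))  # first field is the key
        for item in it:      # remaining fields, visited without copying
            index[item].append(key)
    return index


# Stand-in for the real flat file; with a real file, pass the open file object.
sample = io.StringIO("4 c b d\n2 e c a\n5 a c\n")
index = invert_groups(sample)
x1 = [[item] + keys for item, keys in index.items()]
print(x1)
```

Iterating over an open file object yields one line at a time, so only the resulting index has to fit in memory, not the raw file contents plus a parsed copy.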

Answer 1: (score: 1)

Based on @quamrana's assumption about what you are actually trying to accomplish:

x = [ [4, 'c', 'b', 'd'], 
      [2, 'e', 'c', 'a'], 
      [5, 'a', 'c'] ]

letters = {i for y in x for i in y if isinstance(i, str)}
y = [[i] + [sub[0] for sub in x if i in sub] for i in letters]
print(y)  # e.g. [['e', 2], ['d', 4], ['a', 2, 5], ['b', 4], ['c', 4, 2, 5]] (set order can vary between runs)
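Note that the comprehension rescans every sublist once per distinct letter, so on a large input it does far more work than a single pass. As a rough sanity check, the sketch below compares it against a single-pass dictionary version on the toy data, ignoring output order. It assumes, as in the example, that no item repeats within a sublist; the names by_scan, by_pass, and same are illustrative only.

```python
from collections import defaultdict

x = [[4, 'c', 'b', 'd'],
     [2, 'e', 'c', 'a'],
     [5, 'a', 'c']]

# Comprehension approach: one full scan of x for every distinct letter.
letters = {i for y in x for i in y if isinstance(i, str)}
by_scan = [[i] + [sub[0] for sub in x if i in sub] for i in letters]

# Single-pass approach: each item is visited exactly once.
d = defaultdict(list)
for group in x:
    for item in group[1:]:
        d[item].append(group[0])
by_pass = [[k] + v for k, v in d.items()]

# The two results hold the same groups; only their order may differ.
same = sorted(by_scan) == sorted(by_pass)
print(same)
```

Sorting works here because every inner list starts with a distinct string, so the comparison never has to compare a string with an integer.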