Question

I have a list of folder names as a 1D array, i.e.:

folderList=['A1_001', 'A1_002', 'A1_003', 'A1_004', 
            'A2_001', 'A2_002', 'A2_003', 'A2_004',
            'A3_001', 'A3_002', 'A3_003', 'A3_004']

and I want to group the list by the first two characters, such as "A1", "A2" and "A3". I think this should be done with groupby, but my code doesn't work:

sectionName=[] #to get the first two characters of each element into a new list

for file in folderList:
    sectionName.append(file.split('_')[0])

for key, group in groupby(folderList,sectionName): 
    print key
    for record in group:
        print record

I get an error:

for key, group in groupby(folderList,sectionName):
TypeError: 'list' object is not callable

What I want to get is a result like this:

A1
['A1_001', 'A1_002', 'A1_003', 'A1_004']

A2
['A2_001', 'A2_002', 'A2_003', 'A2_004']

A3
['A3_001', 'A3_002', 'A3_003', 'A3_004']

I believe groupby expects a key function as its second argument, but so far I haven't managed to turn sectionName into one. Thanks in advance for any help.
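For reference, `groupby` calls its second argument once per element, so it has to be a callable rather than a precomputed list; that is why passing `sectionName` raises `TypeError: 'list' object is not callable`. If you still want to reuse that list, one possible sketch (Python 3, not taken from the answers below) pairs each folder with its prefix via `zip` and groups on the prefix with `operator.itemgetter(0)`. Like any `groupby` call, it assumes the input is already ordered by prefix.

```python
from itertools import groupby
from operator import itemgetter

folderList = ['A1_001', 'A1_002', 'A1_003', 'A1_004',
              'A2_001', 'A2_002', 'A2_003', 'A2_004',
              'A3_001', 'A3_002', 'A3_003', 'A3_004']

# Precomputed prefixes, built the same way as in the question
sectionName = [name.split('_')[0] for name in folderList]

# Pair each prefix with its folder name and let groupby key on the prefix (index 0).
# groupby only merges adjacent items, so this relies on folderList being ordered by prefix.
grouped = {}
for key, pairs in groupby(zip(sectionName, folderList), key=itemgetter(0)):
    grouped[key] = [folder for _, folder in pairs]

for key, folders in grouped.items():
    print(key)
    print(folders)
```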

Answer 1

In [40]: folderList=['A1_001', 'A1_002', 'A1_003', 'A1_004','A2_001', 'A2_002', 'A2_003', 'A2_004','A3_001', 'A3_002', 'A3_003', 'A3_004','B1_001','B1_002','B1_003','B2_001','B2_002','B2_003']

In [41]: for k, v in groupby(folderList, lambda x:x[:2]):
    ...:     print k, [x for x in v]
    ...:     
A1 ['A1_001', 'A1_002', 'A1_003', 'A1_004']
A2 ['A2_001', 'A2_002', 'A2_003', 'A2_004']
A3 ['A3_001', 'A3_002', 'A3_003', 'A3_004']
B1 ['B1_001', 'B1_002', 'B1_003']
B2 ['B2_001', 'B2_002', 'B2_003']

Or, more simply:

In [42]: result={}

In [43]: for v in folderList:
    ...:     result.setdefault(v[:2],[]).append(v)
    ...:     

In [44]: result
Out[44]: 
{'A1': ['A1_001', 'A1_002', 'A1_003', 'A1_004'],
 'A2': ['A2_001', 'A2_002', 'A2_003', 'A2_004'],
 'A3': ['A3_001', 'A3_002', 'A3_003', 'A3_004'],
 'B1': ['B1_001', 'B1_002', 'B1_003'],
 'B2': ['B2_001', 'B2_002', 'B2_003']}

Answer 2

For example:

import itertools

grouped = {prefix: list(folders)
           for prefix, folders in itertools.groupby(folderList, lambda x: x[:2])}

An alternative that does not require folderList to be sorted:

from collections import defaultdict
grouped = defaultdict(list)
for folder in folderList:
    grouped[folder[:2]].append(folder)
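The difference between the two snippets matters when the input is not sorted: `groupby` then yields the same prefix more than once, and the dict comprehension silently keeps only the last run for each key. A small Python 3 sketch with a hypothetical unsorted list named `unsorted`:

```python
import itertools
from collections import defaultdict

# Hypothetical input where the 'A1' entries are not adjacent
unsorted = ['A1_001', 'A2_001', 'A1_002', 'A2_002']

# groupby sees the runs A1, A2, A1, A2; later runs overwrite earlier ones in the dict
by_groupby = {prefix: list(folders)
              for prefix, folders in itertools.groupby(unsorted, lambda x: x[:2])}

# defaultdict appends every item to its bucket regardless of input order
by_defaultdict = defaultdict(list)
for folder in unsorted:
    by_defaultdict[folder[:2]].append(folder)

# Sorting by the same key first makes the groupby version agree with defaultdict
by_sorted_groupby = {prefix: list(folders)
                     for prefix, folders in itertools.groupby(
                         sorted(unsorted, key=lambda x: x[:2]), lambda x: x[:2])}

print(by_groupby)             # {'A1': ['A1_002'], 'A2': ['A2_002']}
print(dict(by_defaultdict))   # {'A1': ['A1_001', 'A1_002'], 'A2': ['A2_001', 'A2_002']}
print(by_sorted_groupby)      # {'A1': ['A1_001', 'A1_002'], 'A2': ['A2_001', 'A2_002']}
```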

Answer 3

A simple loop with a defaultdict will do it:

from collections import defaultdict

folderList=['A1_001', 'A1_002', 'A1_003', 'A1_004', 
            'A2_001', 'A2_002', 'A2_003', 'A2_004',
            'A3_001', 'A3_002', 'A3_003', 'A3_004']

sections = defaultdict(list)
for folder in folderList:
    sections[folder[:2]].append(folder)
print sections.values()

Prints:

[['A1_001', 'A1_002', 'A1_003', 'A1_004'], ['A3_001', 'A3_002', 'A3_003', 'A3_004'], ['A2_001', 'A2_002', 'A2_003', 'A2_004']]
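The order of the lists printed above (A1, A3, A2) comes from Python 2's unordered dicts. On Python 3.7 and later, dicts (including `defaultdict`) preserve insertion order, and iterating over the sorted keys gives a deterministic order on any version. A Python 3 sketch that also prints the prefix followed by its list, as in the desired output from the question:

```python
from collections import defaultdict

folderList = ['A1_001', 'A1_002', 'A1_003', 'A1_004',
              'A2_001', 'A2_002', 'A2_003', 'A2_004',
              'A3_001', 'A3_002', 'A3_003', 'A3_004']

sections = defaultdict(list)
for folder in folderList:
    sections[folder[:2]].append(folder)

# Iterate keys in sorted order so the output does not depend on dict ordering
for prefix in sorted(sections):
    print(prefix)
    print(sections[prefix])
```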

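The drawbacks of groupby are that it only groups consecutive elements with equal keys, so the input must already be sorted by that key, and that it yields iterators rather than lists. In your case it sounds like you want lists, so you would need the extra step of calling list on each group. The loop above is a simple way to get what you want.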
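To illustrate the iterator point: each group shares the underlying iterator with `groupby`, so it has to be consumed (for example with `list`) before `groupby` advances to the next key; once `groupby` has moved on, earlier groups come back empty. A small Python 3 sketch with a hypothetical four-item list `folders` and a hypothetical key function `prefix`:

```python
from itertools import groupby

folders = ['A1_001', 'A1_002', 'A2_001', 'A2_002']

def prefix(name):
    # Key function: the first two characters, e.g. 'A1'
    return name[:2]

# Wrong: list(groupby(...)) advances groupby past every group before any group is read,
# so each stored group is already exhausted.
late = [(k, list(g)) for k, g in list(groupby(folders, prefix))]

# Right: turn each group into a list while groupby is still positioned on it.
early = [(k, list(g)) for k, g in groupby(folders, prefix)]

print(late)   # [('A1', []), ('A2', [])]
print(early)  # [('A1', ['A1_001', 'A1_002']), ('A2', ['A2_001', 'A2_002'])]
```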

Answer 4

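from itertools import groupby

folderList.sort()  # groupby only merges adjacent items, so sort first

def sectionName(sec):
    # Key function: the part before the first underscore, e.g. 'A1'
    return sec.split('_', 1)[0]

for key, lst in groupby(folderList, sectionName):
    print(key)
    for record in lst:
        print(record)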
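`folderList.sort()` works here because the key is a prefix of each name, so a plain string sort already places equal keys next to each other. For input where that does not hold, the usual pattern is to sort with the same key function that is passed to `groupby`. A Python 3 sketch; `group_by` is a hypothetical helper name and the shuffled `folderList` is made-up input:

```python
from itertools import groupby

def group_by(items, key):
    """Return {key: [items...]} using sorted() + groupby() with one key function."""
    # Sorting by the same key guarantees that equal keys are adjacent;
    # sorted() is stable, so items keep their relative order within a group.
    ordered = sorted(items, key=key)
    return {k: list(g) for k, g in groupby(ordered, key=key)}

def sectionName(sec):
    return sec.split('_', 1)[0]

folderList = ['A2_002', 'A1_001', 'A3_001', 'A1_002', 'A2_001']
print(group_by(folderList, sectionName))
# {'A1': ['A1_001', 'A1_002'], 'A2': ['A2_002', 'A2_001'], 'A3': ['A3_001']}
```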

How to sort a 1D list based on part of the element names?
