I have a list of lists:
import numpy as np
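# The rows have different lengths, so dtype=object is required
# (NumPy 1.24+ raises a ValueError for ragged input without it).
mylist = np.array([[1, 9, 4, 4, 5, 4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
                   [12, 44, 3, 6],
                   [22, 3, 12, 44, 65, 8],
                   [144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]],
                  dtype=object)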
For each list, I want to find the 10 largest values and the indices of those values. How can I do this? Does numpy or pandas have a function for it?
top10_index =
top10_max =
Thanks a lot.
Answer 0 (score: 2)
This can be done in pure Python like this:
import heapq
mylist = [[1, 9, 4,4, 5,4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
[12, 44, 3, 6],
[22, 3, 12,44, 65, 8],
[144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]]
print([heapq.nlargest(10, L) for L in mylist])
# [[55, 22, 18, 13, 9, 6, 6, 5, 5, 4], [44, 12, 6, 3], [65, 44, 22, 12, 8, 3], [187, 144, 99, 88, 67, 66, 44, 17, 15, 13]]
print([heapq.nlargest(10, range(len(L)), key=L.__getitem__) for L in mylist])
# [[12, 8, 14, 7, 1, 6, 13, 4, 10, 2], [1, 0, 3, 2], [4, 3, 0, 2, 5, 1], [8, 0, 14, 13, 7, 12, 6, 11, 1, 2]]
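Since the question also asks about NumPy, the same result can be sketched with `np.argsort` applied to each list separately (the lists are ragged, so there is no single vectorised call). The helper name `top_n` is made up for this example, and the stable sort is an assumption chosen so that ties keep their original order, as in the `heapq` output above.

```python
import numpy as np

mylist = [[1, 9, 4, 4, 5, 4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
          [12, 44, 3, 6],
          [22, 3, 12, 44, 65, 8],
          [144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]]


def top_n(values, n=10):
    """Return (largest values, their indices) for one list, largest first."""
    arr = np.asarray(values)
    # Sorting the negated values ascending gives descending order;
    # kind="stable" keeps tied values in their original order.
    order = np.argsort(-arr, kind="stable")[:n]
    return arr[order].tolist(), order.tolist()


results = [top_n(row) for row in mylist]
top10_max = [vals for vals, _ in results]
top10_index = [idx for _, idx in results]

print(top10_max)
print(top10_index)
```

Lists shorter than 10 elements simply return all of their values in descending order.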
Answer 1 (score: 1)
In pandas, you can build a DataFrame from a list comprehension of tuples:
import pandas as pd
# One (list number, value) tuple per element
L = [(i, y) for i, x in enumerate(mylist) for y in x]
df = pd.DataFrame(L, columns=['no','val'])
Then record each value's original position within its list using GroupBy.cumcount:
df['counter'] = df.groupby('no').cumcount()
Then sort:
df1 = df.sort_values(['no','val'], ascending=[True, False])
print (df1)
    no  val  counter
12   0   55       12
8    0   22        8
14   0   18       14
7    0   13        7
1    0    9        1
6    0    6        6
13   0    6       13
4    0    5        4
10   0    5       10
2    0    4        2
3    0    4        3
5    0    4        5
9    0    3        9
11   0    2       11
0    0    1        0
16   1   44        1
15   1   12        0
18   1    6        3
17   1    3        2
23   2   65        4
22   2   44        3
19   2   22        0
21   2   12        2
24   2    8        5
20   2    3        1
33   3  187        8
25   3  144        0
39   3   99       14
38   3   88       13
32   3   67        7
37   3   66       12
31   3   44        6
36   3   17       11
26   3   15        1
27   3   13        2
28   3   12        3
30   3    9        5
35   3    6       10
34   3    2        9
29   3    1        4
Finally, keep the top N values of each group with Series.head and convert them to lists inside GroupBy.agg:
N = 3
top10_index = df1.groupby('no')['counter'].agg(lambda x: x.head(N).tolist()).tolist()
print (top10_index)
[[12, 8, 14], [1, 0, 3], [4, 3, 0], [8, 0, 14]]
top10_max = df1.groupby('no')['val'].agg(lambda x: x.head(N).tolist()).tolist()
print (top10_max)
[[55, 22, 18], [44, 12, 6], [65, 44, 22], [187, 144, 99]]
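As a shorter alternative, each list can be wrapped in its own `pd.Series` and passed through `Series.nlargest`, which returns the largest values together with their original positions as the index. This is only a sketch, not part of the answer above; it uses `N = 10` rather than the `N = 3` shown in the demo output.

```python
import pandas as pd

mylist = [[1, 9, 4, 4, 5, 4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
          [12, 44, 3, 6],
          [22, 3, 12, 44, 65, 8],
          [144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]]

N = 10
# One Series per list; the default RangeIndex is the position in the list.
top = [pd.Series(row).nlargest(N) for row in mylist]

top10_max = [s.tolist() for s in top]          # the largest values, descending
top10_index = [s.index.tolist() for s in top]  # where those values came from

print(top10_max)
print(top10_index)
```

When values are tied at the cut-off, `nlargest` keeps the first occurrences by default (`keep='first'`).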
Answer 2 (score: 0)
The lists are added to a DataFrame in a loop, and the computation is then done on that DataFrame.
import pandas as pd

# Collect one (list_no, idx, value) row per element
rows = []
for i, lst in enumerate(mylist):
    for k, val in enumerate(lst):
        rows.append((i, k, val))
# Build the frame once; DataFrame.append was removed in pandas 2.0
data = pd.DataFrame(rows, columns=['list_no', 'idx', 'value'])
# Note: this ranks values across all lists together, not within each list
data.sort_values('value', ascending=False, inplace=True)
top10_max = data['value'].head(10)
top10_index = list(zip(data['list_no'], data['idx']))[0:10]
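The code above ranks values across all four lists together, so it returns a single global top 10. If per-list results are wanted, as in the question, one possible sketch is to sort within each `list_no` and take the first 10 rows of every group with `GroupBy.head`. The intermediate names `ranked` and `top` are made up for this example.

```python
import pandas as pd

mylist = [[1, 9, 4, 4, 5, 4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
          [12, 44, 3, 6],
          [22, 3, 12, 44, 65, 8],
          [144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]]

rows = [(i, k, val) for i, lst in enumerate(mylist) for k, val in enumerate(lst)]
data = pd.DataFrame(rows, columns=["list_no", "idx", "value"])

# Sort by list number, then by value descending inside each list.
ranked = data.sort_values(["list_no", "value"], ascending=[True, False])
# Keep at most the first 10 rows of every list.
top = ranked.groupby("list_no").head(10)

top10_max = top.groupby("list_no")["value"].agg(list).tolist()
top10_index = top.groupby("list_no")["idx"].agg(list).tolist()

print(top10_max)
print(top10_index)
```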