Question

I have a list of lists:

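import numpy as np
# The rows have different lengths, so recent NumPy versions need dtype=object here.
mylist = np.array([[1, 9, 4, 4, 5, 4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
                   [12, 44, 3, 6],
                   [22, 3, 12, 44, 65, 8],
                   [144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]],
                  dtype=object)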

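For each list, I want to find the 10 largest numbers and the indices of those 10 numbers. How can I do this? Does numpy or pandas have a function for it?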

top10_index = 
top10_max =

Thanks a lot.

Answer 1

This can be done in pure Python like this:

import heapq

mylist = [[1, 9, 4,4, 5,4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
                   [12, 44, 3, 6],
                   [22, 3, 12,44, 65, 8],
                   [144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]]
print([heapq.nlargest(10, L) for L in mylist])
# [[55, 22, 18, 13, 9, 6, 6, 5, 5, 4], [44, 12, 6, 3], [65, 44, 22, 12, 8, 3], [187, 144, 99, 88, 67, 66, 44, 17, 15, 13]]
print([heapq.nlargest(10, range(len(L)), key=L.__getitem__) for L in mylist])
# [[12, 8, 14, 7, 1, 6, 13, 4, 10, 2], [1, 0, 3, 2], [4, 3, 0, 2, 5, 1], [8, 0, 14, 13, 7, 12, 6, 11, 1, 2]]
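
Since the question also asks about NumPy, here is a rough sketch of the same per-list ranking using np.argsort. It is an alternative to the heapq approach above, not part of it. The helper name top_n is made up for illustration, and the sketch assumes the values are plain numbers (so they can be negated).

```python
import numpy as np

def top_n(values, n=10):
    """Return (indices, values) of the n largest entries, largest first."""
    arr = np.asarray(values)
    # A stable sort on the negated values keeps equal values in index order.
    order = np.argsort(-arr, kind="stable")[:n]
    return order.tolist(), arr[order].tolist()

mylist = [[1, 9, 4, 4, 5, 4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
          [12, 44, 3, 6],
          [22, 3, 12, 44, 65, 8],
          [144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]]

# Each row is handled separately, so rows of different lengths are fine.
results = [top_n(row) for row in mylist]
top10_index = [idx for idx, _ in results]
top10_max = [vals for _, vals in results]
```

Lists shorter than n simply return all of their elements, sorted from largest to smallest.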

Answer 2

In pandas, you can build a DataFrame from a list comprehension of tuples:

import pandas as pd

L = [(i, y) for i, x in enumerate(mylist) for y in x]
df = pd.DataFrame(L, columns=['no','val'])

Then use GroupBy.cumcount to add a column holding each value's original position within its list:

df['counter'] = df.groupby('no').cumcount()
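
To see what cumcount does here, consider a small made-up frame (the name demo and its values are only for illustration). Rows are numbered from 0 within each group, which is why the counter column ends up holding each value's index in its original list.

```python
import pandas as pd

# Toy data: 'no' is the list number, 'val' is a value from that list.
demo = pd.DataFrame({'no': [0, 0, 0, 1, 1], 'val': [7, 3, 9, 5, 2]})

# cumcount restarts at 0 for every new group of 'no'.
demo['counter'] = demo.groupby('no').cumcount()
print(demo['counter'].tolist())
# [0, 1, 2, 0, 1]
```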

Then sort by list number, and by value in descending order within each list:

df1 = df.sort_values(['no','val'], ascending=[True, False])
print(df1)
    no  val  counter
12   0   55       12
8    0   22        8
14   0   18       14
7    0   13        7
1    0    9        1
6    0    6        6
13   0    6       13
4    0    5        4
10   0    5       10
2    0    4        2
3    0    4        3
5    0    4        5
9    0    3        9
11   0    2       11
0    0    1        0
16   1   44        1
15   1   12        0
18   1    6        3
17   1    3        2
23   2   65        4
22   2   44        3
19   2   22        0
21   2   12        2
24   2    8        5
20   2    3        1
33   3  187        8
25   3  144        0
39   3   99       14
38   3   88       13
32   3   67        7
37   3   66       12
31   3   44        6
36   3   17       11
26   3   15        1
27   3   13        2
28   3   12        3
30   3    9        5
35   3    6       10
34   3    2        9
29   3    1        4

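Finally, take the top values of each group with Series.head and convert them to lists inside GroupBy.agg (N is 3 in this example; use N = 10 for the top 10):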

N = 3
top10_index = df1.groupby('no')['counter'].agg(lambda x: x.head(N).tolist()).tolist()
print(top10_index)
[[12, 8, 14], [1, 0, 3], [4, 3, 0], [8, 0, 14]]

top10_max = df1.groupby('no')['val'].agg(lambda x: x.head(N).tolist()).tolist()
print(top10_max)
[[55, 22, 18], [44, 12, 6], [65, 44, 22], [187, 144, 99]]

Answer 3

The lists are loaded into a DataFrame in a loop, and the result is then computed from it.

import pandas as pd

# Collect (list_no, idx, value) rows first and build the DataFrame once;
# DataFrame.append was removed in pandas 2.0.
rows = []
for i, lst in enumerate(mylist):
    for k, val in enumerate(lst):
        rows.append((i, k, val))
data = pd.DataFrame(rows, columns=['list_no', 'idx', 'value'])

# Note: this ranks the values of all lists together, not per list.
data.sort_values('value', ascending=False, inplace=True)
top10_max = data['value'].head(10)
top10_index = list(zip(data['list_no'], data['idx']))[0:10]
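
The code above ranks the values of all lists together. If per-list results are wanted, as the question asks, one possible adaptation is to keep the first 10 rows of each list_no group after sorting. This is only a sketch: returning the results as dictionaries keyed by list number is an assumption, and with the default sort the order among equal values is not guaranteed.

```python
import pandas as pd

mylist = [[1, 9, 4, 4, 5, 4, 6, 13, 22, 3, 5, 2, 55, 6, 18],
          [12, 44, 3, 6],
          [22, 3, 12, 44, 65, 8],
          [144, 15, 13, 12, 1, 9, 44, 67, 187, 2, 6, 17, 66, 88, 99]]

rows = [(i, k, val) for i, lst in enumerate(mylist) for k, val in enumerate(lst)]
data = pd.DataFrame(rows, columns=['list_no', 'idx', 'value'])

# Sort every row by value, then keep the first 10 rows of each list.
top = data.sort_values('value', ascending=False).groupby('list_no').head(10)

# One entry per list number: the largest values and their original positions.
top10_max = {int(no): grp['value'].tolist() for no, grp in top.groupby('list_no')}
top10_index = {int(no): grp['idx'].tolist() for no, grp in top.groupby('list_no')}
```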

Python - Find the N largest numbers in lists of different lengths
