Question

I think this is an easy question for experienced numpy users.

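I have a score matrix. Row indices correspond to samples and column indices correspond to items. For example,

score_matrix = 
  [[ 1. ,  0.3,  0.4],
   [ 0.2,  0.6,  0.8],
   [ 0.1,  0.3,  0.5]]

I want to get the top-M item indices for each sample, along with the corresponding top-M scores.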

What is the best way to do this with numpy?

Answer 1

Here's an approach using np.argpartition -

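# kth=range(-M, 0) sorts the last M positions, which then hold the M largest values per row
idx = np.argpartition(a, range(-M, 0), axis=1)[:, :-M-1:-1]  # topM_ind
out = a[np.arange(a.shape[0])[:, None], idx]                 # topM_score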

Sample run -

In [343]: a
Out[343]: 
array([[ 1. ,  0.3,  0.4],
       [ 0.2,  0.6,  0.8],
       [ 0.1,  0.3,  0.5]])

In [344]: M = 2

In [345]: idx = np.argpartition(a,range(-M,0))[:,:-M-1:-1]

In [346]: idx
Out[346]: 
array([[0, 2],
       [2, 1],
       [2, 1]])

In [347]: a[np.arange(a.shape[0])[:,None],idx]
Out[347]: 
array([[ 1. ,  0.4],
       [ 0.8,  0.6],
       [ 0.5,  0.3]])
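
The same recipe can be wrapped in a small helper. This is only a sketch: the function name `top_m` and the wider test matrix are made up for illustration, `np.take_along_axis` (NumPy 1.15+) stands in for the fancy-indexing step, and `kth=np.arange(-M, 0)` is used so that the last M positions of each row are the ones guaranteed to end up sorted -

```python
import numpy as np

def top_m(a, M):
    """Return (indices, scores) of the M largest entries in each row, largest first.

    Hypothetical helper for illustration; assumes a is 2-D and 1 <= M <= a.shape[1].
    """
    # kth=np.arange(-M, 0) puts the M largest entries of each row into the
    # last M positions in ascending order; the slice reverses them.
    idx = np.argpartition(a, np.arange(-M, 0), axis=1)[:, :-M-1:-1]
    # Gather the scores that sit at those column indices, row by row.
    scores = np.take_along_axis(a, idx, axis=1)
    return idx, scores

a = np.array([[0.5, 0.1, 0.9, 0.3, 0.7],
              [0.2, 0.8, 0.4, 0.6, 0.0]])
idx, scores = top_m(a, 2)
print(idx)     # column indices of the two largest entries per row
print(scores)  # the matching scores
```

With more columns than M + 1, as here, the choice of `kth` matters: positions outside `kth` are left in no particular order.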

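Alternatively, possibly slower, but with slightly shorter code to get idx, use np.argsort -

idx = a.argsort(1)[:,:-M-1:-1]

Here's a post with some runtime tests that compare np.argsort and np.argpartition on a similar problem.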
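
To check on your own data that the two routes agree, and to time them, something like the following works as a rough sketch. The array shape, seed, `M`, and the function names are arbitrary choices for illustration. No timings are quoted here because they depend on the machine and the NumPy version -

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for repeatability only
a = rng.random((1000, 200))      # arbitrary shape; continuous values, so ties are unlikely
M = 5

def by_argsort(a, M):
    # Full sort of every row, then keep the last M columns in reverse order.
    return a.argsort(1)[:, :-M-1:-1]

def by_argpartition(a, M):
    # Only the last M positions of each row are put in sorted order.
    return np.argpartition(a, np.arange(-M, 0), axis=1)[:, :-M-1:-1]

# Both variants should pick the same columns in the same order.
same = np.array_equal(by_argsort(a, M), by_argpartition(a, M))
print(same)

# Rough timing; the absolute numbers depend on the machine.
print(timeit.timeit(lambda: by_argsort(a, M), number=20))
print(timeit.timeit(lambda: by_argpartition(a, M), number=20))
```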

Answer 2

I'd use argsort():

top2_ind = score_matrix.argsort()[:,::-1][:,:2]

That is, produce an array containing the indices that would sort score_matrix:

array([[1, 2, 0],
       [0, 1, 2],
       [0, 1, 2]])

Then reverse the columns with ::-1, and take the first two columns with :2:

array([[0, 2],
       [2, 1],
       [2, 1]])

Then do the same with a regular np.sort() to get the values:

top2_score = np.sort(score_matrix)[:,::-1][:,:2]

Following the same mechanics as above, this gives you:

array([[ 1. ,  0.4],
       [ 0.8,  0.6],
       [ 0.5,  0.3]])
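
Since the indices are already computed, the second sort can be avoided by using them to pick out the scores. A minimal sketch, assuming NumPy 1.15 or later for `np.take_along_axis`:

```python
import numpy as np

score_matrix = np.array([[1.0, 0.3, 0.4],
                         [0.2, 0.6, 0.8],
                         [0.1, 0.3, 0.5]])

# Indices of the two largest scores per row, largest first.
top2_ind = score_matrix.argsort()[:, ::-1][:, :2]

# Reuse the indices to gather the scores instead of sorting a second time.
top2_score = np.take_along_axis(score_matrix, top2_ind, axis=1)

print(top2_ind)
print(top2_score)
```

This also keeps indices and scores consistent by construction, because both come from the same argsort() call.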

Answer 3

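If you want both the values and their corresponding indices without disturbing the original order, the following simple approach may help. It can be computationally expensive on large data, though, since it uses a list to store (value, index) tuples.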

import numpy as np
values = np.array([0.01, 0.6, 0.4, 0.0, 0.1, 0.7, 0.12])  # a simple array
indices = np.arange(values.shape[0])  # original position of each remaining value
values_indices = []  # list of (value, original index) tuples
while values.shape[0] > 0:
    i = values.argmax()
    values_indices.append((float(values[i]), int(indices[i])))
    # remove the maximum value, and its original index, from the arrays:
    values = np.delete(values, i)
    indices = np.delete(indices, i)

The final output is a list of tuples:

values_indices
[(0.7, 5), (0.6, 1), (0.4, 2), (0.12, 6), (0.1, 4), (0.01, 0), (0.0, 3)]
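
A list of (value, original index) pairs, largest value first, can also be built without a Python loop by sorting the indices once. This is a sketch of an alternative rather than the loop-based approach; the names `order` and `pairs` are made up for illustration:

```python
import numpy as np

values = np.array([0.01, 0.6, 0.4, 0.0, 0.1, 0.7, 0.12])

# Indices that sort the array in descending order; values itself is not modified.
order = np.argsort(values)[::-1]

# Pair each value with its position in the original array,
# converting NumPy scalars to plain Python numbers.
pairs = [(float(values[i]), int(i)) for i in order]
print(pairs)
```

To keep only the top N pairs, slice `order[:N]` before building the list.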

Get N maximum values and indices along an axis in a NumPy array