Find the closest/most similar value (vector) in a matrix

Date: 2018-09-17 09:06:35

Tags: python numpy

Suppose I have the following numpy matrix (simplified):

import numpy as np

matrix = np.array([[1, 1],
                   [2, 2],
                   [5, 5],
                   [6, 6]])

Now I want to get the vector from the matrix that is closest to a "search" vector:

search_vec = np.array([3, 3])

This is what I am currently doing:

min_dist = None
result_vec = None
for ref_vec in matrix:
    # Euclidean distance; a norm is never negative, so no abs() is needed
    distance = np.linalg.norm(search_vec - ref_vec)
    print(ref_vec, distance)
    if min_dist is None or min_dist > distance:
        min_dist = distance
        result_vec = ref_vec

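This works, but is there a native numpy solution that does it more efficiently? My problem is that the larger the matrix, the slower the whole process gets. Are there other solutions that handle this more elegantly and efficiently?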

1 answer:

Answer 0 (score: 3)

Approach 1

We can use the Cython-powered kd-tree for quick nearest-neighbor lookup, which is efficient in both memory and performance -

In [276]: from scipy.spatial import cKDTree

In [277]: matrix[cKDTree(matrix).query(search_vec, k=1)[1]]
Out[277]: array([2, 2])
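
The one-liner above rebuilds the tree on every call. If the same matrix is searched repeatedly, it is probably worth building the tree once and querying it with several vectors at a time. The following is a minimal sketch of that idea; the second query vector and the variable names `tree`, `queries`, and `closest` are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

matrix = np.array([[1, 1],
                   [2, 2],
                   [5, 5],
                   [6, 6]])

# Build the tree once; this is the expensive step.
tree = cKDTree(matrix)

# Query several search vectors in one call (the second one is illustrative).
queries = np.array([[3, 3],
                    [5.4, 5.4]])
distances, indices = tree.query(queries, k=1)

# Fancy indexing maps each row index back to the matching matrix row.
closest = matrix[indices]
print(closest)
```

With `k=1` and a 2-D array of queries, `query` returns one distance and one row index per query vector.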

Approach 2

Using SciPy's cdist -

In [286]: from scipy.spatial.distance import cdist

In [287]: matrix[cdist(matrix, np.atleast_2d(search_vec)).argmin()]
Out[287]: array([2, 2])
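
The same call extends to several search vectors, since `cdist` returns the full pairwise distance table. A small sketch, assuming a second, made-up query vector; note that the table has one entry per (matrix row, query) pair, so memory grows with the product of both sizes.

```python
import numpy as np
from scipy.spatial.distance import cdist

matrix = np.array([[1, 1],
                   [2, 2],
                   [5, 5],
                   [6, 6]])

# Two search vectors; the second one is only here for illustration.
queries = np.array([[3, 3],
                    [5.4, 5.4]])

# dists[i, j] is the Euclidean distance from matrix[i] to queries[j].
dists = cdist(matrix, queries)

# argmin along axis 0 picks the nearest matrix row for each query column.
nearest_idx = dists.argmin(axis=0)
closest = matrix[nearest_idx]
print(closest)
```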

Approach 3

Using Scikit-learn's Nearest Neighbors -

from sklearn.neighbors import NearestNeighbors

nbrs = NearestNeighbors(n_neighbors=1).fit(matrix)
closest_vec = matrix[nbrs.kneighbors(np.atleast_2d(search_vec))[1][0,0]]

Approach 4

Using Scikit-learn's kdtree -

from sklearn.neighbors import KDTree
kdt = KDTree(matrix, metric='euclidean')
cv = matrix[kdt.query(np.atleast_2d(search_vec), k=1, return_distance=False)[0,0]]

Approach 5

Following the wiki contents of the eucl_dist package (disclaimer: I am its author), we can leverage matrix-multiplication -

M = matrix.dot(search_vec)
d = np.einsum('ij,ij->i',matrix,matrix) + np.inner(search_vec,search_vec) -2*M
closest_vec = matrix[d.argmin()]
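
The expression for `d` relies on the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so `d` holds squared distances. Because the square root is monotonic, the argmin is the same as for the true distances. The sketch below wraps the three lines in a helper (the name `closest_by_dot` is hypothetical, not part of eucl_dist) and checks the result against a direct computation.

```python
import numpy as np

def closest_by_dot(matrix, search_vec):
    """Return (row index of nearest vector, squared distances to all rows)."""
    # ||a||^2 for every row a of the matrix
    sq_norms = np.einsum('ij,ij->i', matrix, matrix)
    # a.b for every row a against the search vector b
    cross = matrix.dot(search_vec)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d = sq_norms + np.inner(search_vec, search_vec) - 2 * cross
    return d.argmin(), d

matrix = np.array([[1, 1],
                   [2, 2],
                   [5, 5],
                   [6, 6]])
search_vec = np.array([3, 3])

idx, d = closest_by_dot(matrix, search_vec)

# Direct squared distances, for comparison only.
direct = ((matrix - search_vec) ** 2).sum(axis=1)
print(matrix[idx], d, direct)
```

With floating-point inputs the expanded form can lose precision through cancellation and may produce tiny negative values, so clipping at zero before taking a square root is a reasonable precaution if actual distances are needed.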