
Question

I have a list of points in a numpy matrix:

A = [[x11,x12,x13],[x21,x22,x23] ]

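and a point o = [o1, o2, o3], from which I have to compute the distance to each point.

A - o subtracts o from every point. At the moment I square each component and add them up in a for loop. Is there a more intuitive way to do this?

P.S.: I am doing this calculation as part of a k-means clustering application. I have already computed the centroids, and now I have to compute the distance from every point to each centroid.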

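from sklearn import cluster  # assumed: cluster.KMeans looks like scikit-learn's

# input_data_per_minute and scale2 come from the asker's own code (not shown)
input_mat = input_data_per_minute.values[:,2:5]

scaled_input_mat = scale2(input_mat)

k_means = cluster.KMeans(n_clusters=5)

print('training start')
k_means.fit(scaled_input_mat)
print('training over')

out = k_means.cluster_centers_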

I have to compute the distance between input_mat and each cluster centroid.

Answer 1

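Numpy solution:

Numpy is great at broadcasting, so you can trick it into computing all the distances in one step. It will consume a lot of memory, though, depending on the number of points and cluster centers. In fact, it creates an intermediate array of shape number_of_points * number_of_cluster_centers * 3.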
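
To get a feel for that memory cost, here is a rough sketch that estimates the size of the intermediate array, assuming float64 coordinates (8 bytes each). The helper name `broadcast_bytes` and the example sizes are made up for illustration and are not part of the original answer.

```python
import numpy as np

def broadcast_bytes(n_points, n_centers, n_dims=3, dtype=np.float64):
    """Estimate the size in bytes of the (n_points, n_centers, n_dims) intermediate."""
    return n_points * n_centers * n_dims * np.dtype(dtype).itemsize

# Hypothetical sizes: one million points and five cluster centers
size = broadcast_bytes(1_000_000, 5)
print(size / 1e6, "MB")
```

For a handful of centers this is usually manageable; the cost grows linearly with both the number of points and the number of centers.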

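First you need to know a bit about broadcasting. I'll play it safe and define each dimension by hand.

I'll start by defining some points and centers for illustration: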

import numpy as np

points = np.array([[1,1,1],
                   [2,1,1],
                   [1,2,1],
                   [5,5,5]])

centers = np.array([[1.5, 1.5, 1],
                    [5,5,5]])

Now I prepare these arrays so that numpy broadcasting gives me the distance along each dimension:

distance_3d = points[:,None,:] - centers[None,:,:]

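Effectively, the first dimension is now the point "label", the second dimension is the center "label", and the third dimension is the coordinate. The subtraction gives the signed difference along each coordinate. The result has the shape: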

(number_of_points, number_of_cluster_centers, 3)
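
A small sketch of how the inserted `None` axes line up may help here. The arrays are zero-filled placeholders with the same shapes as the example data (4 points, 2 centers); the short names `p` and `c` are only for illustration.

```python
import numpy as np

points = np.zeros((4, 3))   # 4 points, 3 coordinates each
centers = np.zeros((2, 3))  # 2 centers, 3 coordinates each

p = points[:, None, :]   # new length-1 axis in the middle: shape (4, 1, 3)
c = centers[None, :, :]  # new length-1 axis in front:      shape (1, 2, 3)

# Broadcasting stretches each length-1 axis to match the other operand
print(p.shape, c.shape, (p - c).shape)
```

Each length-1 axis is stretched to the size of the matching axis in the other array, which is why every point ends up paired with every center.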

Now it is only a matter of applying the Euclidean distance formula:

# Square each distance
distance_3d_squared = distance_3d ** 2

# Sum the squared differences over the coordinate axis (the result will be 2D)
distance_sum = np.sum(distance_3d_squared, axis=2)

# And take the square root
distance = np.sqrt(distance_sum)

For my test data, the final result is:

#array([[ 0.70710678,  6.92820323],
#       [ 0.70710678,  6.40312424],
#       [ 0.70710678,  6.40312424],
#       [ 6.36396103,  0.        ]])

So the element distance[i, j] gives you the distance from point i to center j.
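
Since the question comes from a k-means setting, one likely next step is picking the closest center for each point. This goes slightly beyond the original answer; it is a sketch that reuses the result matrix printed above, with the illustrative names `nearest` and `nearest_distance`.

```python
import numpy as np

# Distance matrix from the example: rows are points, columns are centers
distance = np.array([[0.70710678, 6.92820323],
                     [0.70710678, 6.40312424],
                     [0.70710678, 6.40312424],
                     [6.36396103, 0.0]])

# Index of the closest center for each point (argmin across the center axis)
nearest = np.argmin(distance, axis=1)

# Distance from each point to its closest center
nearest_distance = distance.min(axis=1)

print(nearest)  # [0 0 0 1]
```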

Summary:

You can put all of this in a one-liner:

distance2 = np.sqrt(np.sum((points[:,None,:] - centers[None,:,:]) ** 2, axis=2))
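
As a possible variation on the one-liner, `np.linalg.norm` with an `axis` argument computes the same Euclidean norm over the coordinate axis. The variable names `diff` and `distance_norm` are chosen here for illustration and do not appear in the original answer.

```python
import numpy as np

points = np.array([[1, 1, 1],
                   [2, 1, 1],
                   [1, 2, 1],
                   [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1],
                    [5, 5, 5]])

# Per-coordinate differences, shape (4, 2, 3)
diff = points[:, None, :] - centers[None, :, :]

# Euclidean (2-)norm along the last axis, shape (4, 2)
distance_norm = np.linalg.norm(diff, axis=2)
print(distance_norm)
```

This still builds the same 3D intermediate array, so it does not change the memory behaviour; it only hides the square, sum, and square root behind one call.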

Scipy solution (faster and shorter):

Or, if you have scipy, use cdist:

from scipy.spatial.distance import cdist
distance3 = cdist(points, centers)

The result will always be the same, but cdist is the fastest option for large numbers of points and centers.
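
If scipy is not available and the full broadcast intermediate would be too large, one possible middle ground is to process the points in chunks. The function below is a sketch under that assumption; `chunked_distances` and `chunk_size` are hypothetical names, and this is not how cdist works internally.

```python
import numpy as np

def chunked_distances(points, centers, chunk_size=1024):
    """Euclidean distances of shape (n_points, n_centers), computed in row chunks."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    out = np.empty((points.shape[0], centers.shape[0]))
    for start in range(0, points.shape[0], chunk_size):
        stop = start + chunk_size
        # Only a (chunk, n_centers, n_dims) intermediate exists at any time
        diff = points[start:stop, None, :] - centers[None, :, :]
        out[start:stop] = np.sqrt(np.sum(diff ** 2, axis=2))
    return out

points = np.array([[1, 1, 1], [2, 1, 1], [1, 2, 1], [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1], [5, 5, 5]])

# A deliberately tiny chunk size so the last chunk is shorter than the others
result = chunked_distances(points, centers, chunk_size=3)
print(result)
```

The chunk size trades peak memory against the number of Python-level loop iterations, so a fairly large value is usually the sensible choice.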

Answer 2

You should be able to do something like this (assuming I read your question correctly ;)):

In [1]: import numpy as np

In [2]: a = np.array([[11,12,13],[21,22,23]])

In [3]: o = [1,2,3]

In [4]: a - o  # just showing
Out[4]: 
array([[10, 10, 10],
       [20, 20, 20]])

In [5]: a ** 2  # just showing
Out[5]: 
array([[121, 144, 169],
       [441, 484, 529]])

In [6]: b = (a ** 2) + (a - o)

In [7]: b
Out[7]: 
array([[131, 154, 179],
       [461, 504, 549]])

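Numpy is great because it operates element-wise on arrays! This means that more than 90% of the time you can process arrays without a for loop. Looping over the array with a Python for loop is also significantly slower.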
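
Note that the expression in this answer adds a ** 2 and a - o element-wise, which is not itself a distance. A sketch of what the question appears to ask for, using the same a and o and assuming the Euclidean distance is wanted, might look like this (the name `dist` is illustrative):

```python
import numpy as np

a = np.array([[11, 12, 13], [21, 22, 23]])
o = np.array([1, 2, 3])

# Square the per-coordinate differences, sum across each row, take the root
dist = np.sqrt(np.sum((a - o) ** 2, axis=1))
print(dist)  # one distance per row of a
```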

Pythonic way to calculate distance using a numpy matrix?
