Finding the optimal value of K

Time: 2018-06-19 11:30:09

Tags: numpy k-means

For k clusters, how do I compute mean_distances, i.e. the mean of the distances from each centroid to the points in its cluster?

Formula: (posted as an image, not available)

My code:

def mean_distances(k, X):
    """
    Arguments:

        k -- int, number of clusters
        X -- np.array, matrix of input features

    Returns:

        Array of shape (k, ), containing mean of sum distances
        from centroid to each point in the cluster for k clusters
    """

    ### START CODE HERE ###
    mod = KMeans(X, k)
    clusters, final_centrs = mod.final_centroids()
    dist = []
    for i in range(k):
        d = np.sum(np.linalg.norm((clusters[i] - final_centrs[i, :])**2)).mean()
        dist.append(d)
    return dist
    ### END CODE HERE ###

But it doesn't work correctly. (P.S. No sklearn, only numpy.)
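A quick check on a hypothetical two-point cluster (the data is made up for illustration) shows two separate problems in the distance line of the question's loop: np.linalg.norm applied to the element-wise squared differences returns one Frobenius norm for the whole cluster rather than one distance per point, and the trailing .mean() is applied to the scalar returned by np.sum, so it changes nothing. Taking the norm along axis=1 gives one Euclidean distance per point instead.

```python
import numpy as np

# Hypothetical toy cluster: two points and their centroid (the mean of the points)
points = np.array([[0.0, 0.0], [6.0, 8.0]])
centroid = points.mean(axis=0)  # [3., 4.]

# What the question's line computes: square each difference, take the norm of
# the whole matrix (a single number), then sum and "mean" that scalar (no-op)
question_value = np.sum(np.linalg.norm((points - centroid) ** 2)).mean()

# One Euclidean distance per point: norm along axis=1 works row by row
per_point = np.linalg.norm(points - centroid, axis=1)  # [5., 5.]
sum_of_distances = per_point.sum()                     # 10.0

print(question_value)    # sqrt(81 + 256 + 81 + 256) = sqrt(674), not a distance sum
print(sum_of_distances)  # 10.0
```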

1 answer:

Answer 0 (score: 0)

You are taking the mean of each element of the outer sum (i.e. of each inner sum) instead of the mean of the outer sum:

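import numpy as np

# KMeans is assumed to be the assignment's own numpy-only class providing
# final_centroids(); sklearn.cluster.KMeans has no such method, so it is not imported.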

def mean_distances(k, X):
    """
    Arguments:

        k -- int, number of clusters
        X -- np.array, matrix of input features

    Returns:

        Array of shape (k, ), containing mean of sum distances 
        from centroid to each point in the cluster for k clusters
    """

    mod = KMeans(X, k)
    clusters, final_centrs = mod.final_centroids()
    dist = []
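    for i in range(k):
        # Sum of Euclidean distances from centroid i to each point in cluster i
        # (clusters[i] assumed to be an (n_i, d) array); axis=1 gives one norm per point
        d = np.linalg.norm(clusters[i] - final_centrs[i, :], axis=1).sum()
        dist.append(d)
    # dist is a Python list, which has no .mean(); use np.mean instead
    return np.mean(dist)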
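The assignment's KMeans class is not shown, so here is a self-contained numpy-only sketch of the same computation, plus a loop over k in the spirit of the title. The names simple_kmeans and per_cluster_distance_sums are hypothetical, the k-means is a plain Lloyd's iteration rather than the course's implementation, and because the original formula image is missing, the quantity is assumed to be the sum of Euclidean distances per cluster, averaged over the k clusters.

```python
import numpy as np

def _assign(X, centroids):
    # Distance from every point to every centroid, shape (n, k); pick the closest
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def simple_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal Lloyd's k-means using only numpy."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct points drawn from X
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment consistent with the returned centroids
    return _assign(X, centroids), centroids

def per_cluster_distance_sums(X, labels, centroids):
    """Sum of Euclidean distances from each centroid to the points in its cluster."""
    return np.array([
        np.linalg.norm(X[labels == j] - centroids[j], axis=1).sum()
        for j in range(len(centroids))
    ])

# Synthetic data: three 2-D blobs (made up for illustration)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in ([0, 0], [5, 5], [0, 5])])

for k in range(1, 6):
    labels, centroids = simple_kmeans(X, k)
    sums = per_cluster_distance_sums(X, labels, centroids)
    # Mean over clusters (the quantity asked about) and the total, for comparison
    print(k, sums.mean(), sums.sum())
```

Keeping the per-cluster sums as an array makes it easy to report either the mean over clusters or the total; which one the exercise wants depends on the missing formula.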