Question

If we apply K-means and sequential K-means to the same dataset with the same initial settings, do we get the same results? Explain your reasoning.

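I personally think the answer is no. The result of sequential K-means depends on the order in which the data points are presented, and the two algorithms have different termination conditions.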

The pseudocode for both clustering algorithms is attached here.

K-means

Make initial guesses for the means m1, m2, ..., mk
Until there is no change in any mean
    Assign each data point to the cluster whose mean is the nearest.
    Calculate the mean of each cluster.
    For i from 1 to k
        Replace mi with the mean of all examples for cluster i.
    end_for
end_until
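As a rough, runnable reading of the K-means pseudocode above, here is a minimal Python sketch. It assumes Euclidean distance, breaks ties toward the lower-indexed mean, keeps an empty cluster's previous mean, and adds a `max_iter` safety cap; the pseudocode specifies none of these details. The names `kmeans` and `nearest` are hypothetical.

```python
import math

def nearest(p, means):
    # Index of the closest mean (Euclidean distance); ties go to the lowest index.
    return min(range(len(means)), key=lambda i: math.dist(p, means[i]))

def kmeans(points, means, max_iter=100):
    """Batch K-means: assign all points, then recompute every mean,
    until no mean changes."""
    means = [tuple(m) for m in means]
    for _ in range(max_iter):  # safety cap; not part of the pseudocode
        labels = [nearest(p, means) for p in points]
        new_means = []
        for i, m in enumerate(means):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Coordinate-wise average; assumption: an empty cluster keeps its old mean.
            new_means.append(tuple(sum(c) / len(members) for c in zip(*members))
                             if members else m)
        if new_means == means:  # "until there is no change in any mean"
            break
        means = new_means
    return means, [nearest(p, means) for p in points]

X = [(0, 0), (1, 1), (0.75, 0), (0.25, 1)]  # x1..x4 from the answer below
print(kmeans(X, [(0, 0.5), (1, 0.5)]))
# ([(0.125, 0.5), (0.875, 0.5)], [0, 1, 1, 0])
```

The whole dataset is reassigned before any mean moves, so the order in which the points are listed does not affect the result.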

Sequential K-means

Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*(x - mi)
    end_if
end_until
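A matching sketch of the sequential pseudocode follows, under the same assumptions (Euclidean distance, ties to the lower index). Because n_i is incremented before the update, m_i + (1/n_i)(x - m_i) keeps each mean equal to the average of the examples it has absorbed so far. The final means therefore depend on which examples each mean saw and in what order. A finite list stands in for "until interrupted", and `sequential_kmeans` is a hypothetical name.

```python
import math

def sequential_kmeans(stream, means):
    """Online K-means: each example moves only its nearest mean,
    using the running-mean update m_i <- m_i + (1/n_i) * (x - m_i)."""
    means = [list(m) for m in means]
    counts = [0] * len(means)
    for x in stream:  # stops when the stream ends instead of "until interrupted"
        # Nearest mean; ties go to the lowest index (an assumption).
        i = min(range(len(means)), key=lambda j: math.dist(x, means[j]))
        counts[i] += 1
        means[i] = [m + (xc - m) / counts[i] for m, xc in zip(means[i], x)]
    return [tuple(m) for m in means], counts

X = [(0, 0), (1, 1), (0.75, 0), (0.25, 1)]  # x1..x4 from the answer below
init = [(0, 0.5), (1, 0.5)]
print(sequential_kmeans(X, init))
# ([(0.375, 0.0), (0.625, 1.0)], [2, 2])
reordered = [X[2], X[0], X[1], X[3]]        # present x3, x1, x2, x4 instead
print(sequential_kmeans(reordered, init))   # m1 = (0.0, 0.0), m2 ≈ (2/3, 2/3), counts [1, 3]
```

Presenting the same four points as x3, x1, x2, x4 leaves m1 at (0, 0) with a single point and pulls the other three into m2. This illustrates the order dependence mentioned in the question.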

Answer 1

Correct, the results can differ.

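Points: x1 = (0, 0), x2 = (1, 1), x3 = (0.75, 0), x4 = (0.25, 1); initial means m1 = (0, 0.5), m2 = (1, 0.5). K-means assigns x1 and x4 to the m1 cluster and x2 and x3 to the m2 cluster. The new means are m1' = (0.125, 0.5) and m2' = (0.875, 0.5), and no point is reassigned, so the algorithm stops.

For sequential K-means, with the points presented in the order x1, x2, x3, x4: after x1 is assigned, m1 moves to (0, 0), and x2 then moves m2 to (1, 1). Now m1 is closest to x3, so m1 moves to (0.375, 0). Finally, m2 is closest to x4, so m2 moves to (0.625, 1). This is again a stable configuration (reassigning the points to these means changes nothing), but the partition is {x1, x3} and {x2, x4} instead of {x1, x4} and {x2, x3}.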
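The claim that both end states are stable can be checked mechanically. A configuration is a fixed point of batch K-means if one assignment-plus-update step returns the same means. The sketch below hardcodes the two pairs of means stated in the answer and applies one such step to each. The helper `batch_step` is a hypothetical name, and it assumes Euclidean distance and that no cluster becomes empty.

```python
import math

X = [(0, 0), (1, 1), (0.75, 0), (0.25, 1)]  # x1, x2, x3, x4

def batch_step(points, means):
    """One batch K-means iteration: assign each point to its nearest mean,
    then replace each mean by the average of its assigned points."""
    labels = [min(range(len(means)), key=lambda i: math.dist(p, means[i]))
              for p in points]
    new_means = []
    for i in range(len(means)):
        members = [p for p, lab in zip(points, labels) if lab == i]
        new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_means, labels

batch_end = [(0.125, 0.5), (0.875, 0.5)]  # K-means result from the answer
seq_end = [(0.375, 0.0), (0.625, 1.0)]    # sequential K-means result from the answer

for name, means in [("K-means", batch_end), ("sequential", seq_end)]:
    new_means, labels = batch_step(X, means)
    print(name, new_means == means, labels)
# K-means True [0, 1, 1, 0]
# sequential True [0, 1, 0, 1]
```

Both end states survive an extra batch step unchanged. The labels show that they are nonetheless different partitions, so the two algorithms can settle in different local optima from the same start.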
