KNN when using a precomputed affinity matrix in Scikit's spectral clustering?

Date: 2016-10-20 13:03:10

Tags: python machine-learning scikit-learn cluster-analysis unsupervised-learning

I have a similarity matrix that I have calculated between a large number of objects, and each object can have a non-zero similarity with any other object. I generated this matrix for another task, and would now like to cluster it for a new analysis.

It seems like scikit's spectral clustering method could be a good fit, because I can pass in a precomputed affinity matrix. I also know that spectral clustering typically uses some number of nearest neighbors when building the affinity matrix, and my similarity matrix does not have that same constraint.

If I pass in an affinity matrix in which any node can have edges to any number of other nodes, will scikit-learn limit each node to a certain number of nearest neighbors? If not, I guess I will have to make that change to my precomputed affinity matrix myself.
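If the matrix does need thinning, one way to do it is to keep only each object's k most similar neighbours and then symmetrize. The sketch below assumes a dense, symmetric, non-negative NumPy similarity matrix. The helper name `knn_sparsify`, the value of k, and the rule "keep an edge if either endpoint picked it" are all illustrative assumptions, not something scikit-learn does on its own. The synthetic two-group data only stands in for the asker's matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import SpectralClustering


def knn_sparsify(similarity, k):
    """Hypothetical helper: keep each row's k largest off-diagonal similarities.

    Assumes `similarity` is a dense, symmetric, non-negative (n, n) array.
    Returns a symmetric scipy.sparse CSR matrix.
    """
    similarity = np.asarray(similarity, dtype=float)
    n = similarity.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    scores = similarity.copy()
    np.fill_diagonal(scores, -np.inf)  # never pick an object as its own neighbour
    # Column indices of the k most similar objects in every row (unordered)
    nbrs = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    knn = sparse.csr_matrix((similarity[rows, cols], (rows, cols)), shape=(n, n))
    # A kNN relation is not symmetric: keep an edge if either endpoint selected it
    return knn.maximum(knn.T).tocsr()


# Synthetic stand-in: two groups of points, every pair has a non-zero similarity
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
dist_sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
similarity = np.exp(-dist_sq)

affinity = knn_sparsify(similarity, k=8)
sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = sc.fit_predict(affinity)
print(labels)
```

`SpectralClustering` accepts a scipy.sparse matrix when `affinity="precomputed"`, so the thinned graph can be passed directly. Taking the element-wise maximum of the matrix and its transpose is one common symmetrization; averaging the two is another.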

2 answers:

Answer 0 (score: 1)

You don't have to compute the affinities yourself to do spectral clustering; sklearn will do it for you.

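When you call sc = SpectralClustering(), the affinity parameter lets you choose how the affinity matrix is computed. The default is rbf, a kernel that does not use a specific number of nearest neighbors. If you choose affinity='nearest_neighbors' instead, you can set that number with the n_neighbors parameter.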
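A minimal sketch of the difference, using synthetic feature vectors from `make_blobs` (the data and the `gamma` and `n_neighbors` values are arbitrary assumptions). After fitting, the estimator's `affinity_matrix_` attribute shows what was built: a dense array in which every pair of points gets a weight for `rbf`, and a sparse k-nearest-neighbour graph for `nearest_neighbors`.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

# Feature vectors, one row per object (synthetic: 60 points around 3 centres)
X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# Default affinity="rbf": dense kernel matrix, every pair of points gets a weight
rbf = SpectralClustering(n_clusters=3, affinity="rbf", gamma=0.1, random_state=0).fit(X)

# affinity="nearest_neighbors": sparse kNN graph built with n_neighbors per point
knn = SpectralClustering(
    n_clusters=3, affinity="nearest_neighbors", n_neighbors=10, random_state=0
).fit(X)

print(type(rbf.affinity_matrix_).__name__, np.count_nonzero(rbf.affinity_matrix_))
print(type(knn.affinity_matrix_).__name__, knn.affinity_matrix_.nnz)
```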

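You can then call sc.fit_predict(your_matrix) to compute the clusters. With the default affinity, your_matrix must contain feature vectors (one row per object); to pass a similarity matrix you have already computed, create the estimator with affinity='precomputed'.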
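For the asker's case, the relevant option is `affinity="precomputed"`. As far as I can tell from scikit-learn's implementation, the input is then stored as `affinity_matrix_` without any nearest-neighbour sparsification. The sketch below uses a made-up, fully dense block-structured similarity matrix (group sizes, values, and noise are illustrative assumptions) and checks that every entry is still non-zero after fitting.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Synthetic stand-in for a precomputed similarity matrix: 3 groups of 10 objects,
# strong within-group similarity, weak (but non-zero) similarity everywhere else.
n = 30
groups = np.repeat([0, 1, 2], n // 3)
similarity = np.where(groups[:, None] == groups[None, :], 0.9, 0.05)
similarity = similarity + rng.uniform(0.0, 0.05, size=(n, n))
similarity = (similarity + similarity.T) / 2  # keep the matrix symmetric

sc = SpectralClustering(n_clusters=3, affinity="precomputed", random_state=0)
labels = sc.fit_predict(similarity)

# The fitted affinity is the input matrix itself: every entry is still non-zero,
# i.e. no nearest-neighbour sparsification was applied.
print(np.count_nonzero(sc.affinity_matrix_), "of", n * n, "entries are non-zero")
print(labels)
```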

Answer 1 (score: 1)

Spectral clustering does not require a sparse matrix.

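But if I'm not mistaken, finding the eigenvectors with the smallest non-zero eigenvalues is faster for a sparse matrix than for a dense one. The worst case may still be O(n^3); spectral clustering is one of the slowest methods you can find.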
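Here is a sketch of the sparse route, using SciPy. The random graph, its density, k, and the shift value are arbitrary assumptions, and this is not necessarily what scikit-learn does internally. The normalized graph Laplacian of a sparse affinity stays sparse, and `scipy.sparse.linalg.eigsh` in shift-invert mode returns only the few eigenvalues closest to `sigma`. By contrast, `numpy.linalg.eigvalsh` on the densified matrix computes the whole spectrum at O(n^3) cost. Both should agree on the smallest eigenvalues.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

# Hypothetical sparse affinity: random symmetric weighted graph, about 5% of pairs linked
n = 200
upper = sparse.triu(sparse.random(n, n, density=0.05, random_state=0), k=1)
A = (upper + upper.T).tocsr()

# Symmetric normalized graph Laplacian; stays a sparse matrix
L = laplacian(A, normed=True)

k = 4
# Sparse route: shift-invert ARPACK around sigma < 0 returns only the k eigenvalues
# closest to sigma, i.e. the k smallest ones of the positive semi-definite L.
vals_sparse = np.sort(eigsh(L.tocsc(), k=k, sigma=-0.01, return_eigenvectors=False))

# Dense route: full O(n^3) eigendecomposition of the same matrix, keep the k smallest
vals_dense = np.linalg.eigvalsh(L.toarray())[:k]

print(vals_sparse)
print(vals_dense)
```

A slightly negative `sigma` is used because L is positive semi-definite, so L - sigma*I is invertible even though L itself has a zero eigenvalue.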