Retrieve the number of observations in clusters (k) under a height (z)

Time: 2013-10-24 16:29:33

Tags: r cluster-analysis

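Given a dendrogram y that has k clusters under a height value z, I would like to know:

How many observations were used to form those k clusters?

Below is some reproducible code, along with images, to illustrate the problem: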

#Necessary packages to reproduce the code
library(ggplot2)
library(cluster)

#Example data
x = c(6.2, 2.3, 0, 1.54, 2.17, 6.11, 0.3, 1.39, 
  5.14, 12.52, 12.57, 7.13, 13.71, 11.42, 
  8.13, 8.86, 9.97, 10, 8.23, 12.4, 9.51, 
  20.56, 17.78, 14.91, 19.17, 17.48, 17.44, 
  21.32, 
  21.24)

y = c(7.89, 7.63, 5.29, 8.38, 8.37, 10.5, 21.5,
  16.65, 23.76, 1.77, 1.8, 10.49, 14.01, 
  10.36, 10.85, 15.02, 14.91, 14.94, 10.76,
  18.58, 23.12, 0, 13.59, 9.68, 17.32, 17.85,
  17.79, 4.13, 4.05)

df = data.frame(x, y)
obs = nrow(df) #number of data observations
obs
[1] 29

#Clustering
agnes=agnes(df, metric="euclidean", stand=FALSE, method="average")
k_number=sum(agnes$height < 1) #number of clusters under dendrogram's height value of 1
k_number
[1] 7 # k_number resulted in 7 groups/clusters

plot(agnes,which.plots=2)

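The red annotations were drawn outside R; they mark the 7 clusters grouped under a height of 1.

[Image: dendrogram with the 7 clusters under height 1 marked in red]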

ggplot(df,aes(x,y)) + xlim(0,22) + ylim(0,25) +
  geom_point() +
  geom_text(aes(label=row.names(df)),hjust=0.5, vjust=-1.5, cex=5)

[Image: scatter plot of the 29 observations, each labeled with its row number]

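So, there are 13 observations in the 7 clusters.

I would like to retrieve the number 13.

I have tried reading a lot of documentation, but since I am not very familiar with R or with clustering techniques, I am having a hard time finding it. Thanks.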

1 answer:

Answer 0 (score: 6):

This should do the trick:

# convert to hclust object and obtain cluster assignments for the observations
R> cl <- cutree(as.hclust(agnes), h=1)  
R> cl
 [1]  1  2  3  2  2  4  5  6  7  8  8  9 10 11 12 13 14 14 12 15 16 17 18 19 20
[26] 21 21 22 22
# find non-unique assignments
R> res <- table(cl) 
R> res[res > 1]
cl
 2  8 12 14 21 22 
 3  2  2  2  2  2 
R> sum(res[res > 1])
[1] 13
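
If the members of each multi-observation cluster are also of interest, the same kind of assignment vector can be split by cluster id. A minimal sketch follows; `cl_demo` is a made-up vector of assignments (an assumption for illustration, not the question's data) so that the snippet runs on its own:

```r
# Made-up cluster assignments for 7 observations (illustrative only)
cl_demo <- c(1, 2, 3, 2, 2, 4, 4)

# Group observation indices by cluster id
members <- split(seq_along(cl_demo), cl_demo)

# Keep only clusters that contain more than one observation
multi <- members[lengths(members) > 1]

multi                # clusters "2" (obs 2, 4, 5) and "4" (obs 6, 7)
sum(lengths(multi))  # 5 observations sit in multi-member clusters
```

With the real data, `cl_demo` would presumably be replaced by the `cl` vector returned by `cutree()` above.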

Update: with the cutoff at h = 2

R> cl <- cutree(as.hclust(agnes), h=2) 
R> cl
 [1]  1  2  3  2  2  4  5  6  7  8  8  4  9 10  4 11 11 11  4 12 13 14 15 16 17
[26] 17 17 18 18
R> res <- table(cl) 
R> res[res > 1]
cl
 2  4  8 11 17 18 
 3  4  2  3  3  2 
R> sum(res[res > 1])
[1] 17
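
Since the same three steps are repeated for each cutoff, they could be wrapped in a small function. The sketch below is one possible way to do that; `count_clustered_obs` is a hypothetical helper name, and the toy data and base R's `hclust()` are used here only so the example is self-contained (the original answer works on the `agnes` object instead):

```r
# Hypothetical helper (the name is illustrative, not from the original answer):
# count the observations that share a cluster with at least one other
# observation when the tree is cut at height h.
count_clustered_obs <- function(tree, h) {
  cl <- cutree(tree, h = h)  # cluster id for each observation
  sizes <- table(cl)         # number of observations per cluster
  sum(sizes[sizes > 1])      # drop singletons, add up the rest
}

# Toy data (an assumption): six points on a line, average linkage
toy <- c(0, 0.5, 0.6, 5, 5.4, 10)
toy_tree <- hclust(dist(toy), method = "average")

count_clustered_obs(toy_tree, h = 1)    # {0, 0.5, 0.6} and {5, 5.4} -> 5
count_clustered_obs(toy_tree, h = 0.2)  # only {0.5, 0.6} -> 2
```

Applied to the question's data as `count_clustered_obs(as.hclust(agnes), h = 1)`, this should reproduce the 13 shown above, and 17 for `h = 2`.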