Presenting new data to a fitted self-organizing map and assigning rows to clusters

Time: 2017-05-15 20:37:45

Tags: r

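I am using the code below, which fits a self-organizing map (SOM) and then clusters the resulting prototype vectors to define cluster boundaries: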

library(dplyr)
library(kohonen)

setwd('C:\\Users\\Bla\\Source\\Repos\\SomeExcitingRepo')

OrginalData <- read.table("IrisData.txt",
                   header = TRUE, sep = "\t")

# Keep only the four numeric measurement columns and standardize them
SubsetData <- subset(OrginalData, select = c("SepalLength", "SepalWidth", "PetalLength", "PetalWidth"))
TrainingMatrix <- as.matrix(scale(SubsetData))

GridDefinition <- somgrid(xdim = 4, ydim = 4, topo = "hexagonal")

SomModel <- kohonen::supersom(data = TrainingMatrix, grid = GridDefinition, rlen = 1000, alpha = c(0.05, 0.01),
             keep.data = TRUE)
# Cluster the SOM prototype (codebook) vectors hierarchically and cut the tree into 3 groups
groups = 3
iris.hc = cutree(hclust(dist(SomModel$codes[[1]])), groups)

plot(SomModel, type = "codes", bgcol = rainbow(groups)[iris.hc])
add.cluster.boundaries(SomModel, iris.hc)

The data is the iris data set, but that is only an example. The data set has the following format:

Uid SepalLength SepalWidth  PetalLength PetalWidth  Species
1   5.1 3.5 1.4 0.2 setosa

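Now let us assume this is an unseen data set. I would like to standardize it and present it to the SOM, then add columns to each row giving the SOM cluster number (1, 2 or 3, as in the example above) and the x and y coordinates of the winning node. For example: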

Uid SepalLength SepalWidth PetalLength PetalWidth Species Cluster X Y
1 5.1 3.5 1.4 0.2 setosa 3 3 4

1 Answer:

Answer 0 (score: 1)

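You can use unit.classif, which holds the index of the winning unit for each training row, to index both the cluster vector and the grid points: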

result <- OrginalData
result$Cluster <- iris.hc[SomModel$unit.classif]
result$X <- SomModel$grid$pts[SomModel$unit.classif,"x"]
result$Y <- SomModel$grid$pts[SomModel$unit.classif,"y"]

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Cluster   X         Y
1          5.1         3.5          1.4         0.2  setosa       1 1.5 2.5980762
2          4.9         3.0          1.4         0.2  setosa       1 1.0 3.4641016
3          4.7         3.2          1.3         0.2  setosa       1 1.0 3.4641016
4          4.6         3.1          1.5         0.2  setosa       1 1.0 3.4641016
5          5.0         3.6          1.4         0.2  setosa       1 1.0 1.7320508
6          5.4         3.9          1.7         0.4  setosa       1 1.5 0.8660254

You can also overlay the rows on the SOM plot, but it does not look that nice:

points(jitter(result$X), jitter(result$Y), col=result$Species)
legend(5,0, legend=unique(result$Species), col=unique(result$Species), pch=1)
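
Note that unit.classif only covers the rows the SOM was trained on. For rows the map has not seen, here is a minimal sketch, assuming kohonen version 3 or later, where map() accepts a matrix of new data and returns a unit.classif element. The 100/50 split of the built-in iris data, the seed, and the names train_idx, TrainData, NewData, NewMatrix, unit.hc and mapped are made up for illustration, and som() is used in place of supersom() for brevity. The new rows are standardized with the centers and scales of the training data, not their own.

```r
library(kohonen)

set.seed(42)  # SOM training is stochastic; the seed value is arbitrary

# Hypothetical split: 100 rows for training, the remaining 50 play the "unseen" data
train_idx <- sample(nrow(iris), 100)
TrainData <- iris[train_idx, ]
NewData   <- iris[-train_idx, ]

# scale() stores the centers and scales it used as attributes of the result
TrainingMatrix <- scale(as.matrix(TrainData[, 1:4]))

SomModel <- som(TrainingMatrix,
                grid = somgrid(xdim = 4, ydim = 4, topo = "hexagonal"),
                rlen = 1000, alpha = c(0.05, 0.01))

# Cluster the codebook vectors: one cluster label per SOM unit
groups <- 3
unit.hc <- cutree(hclust(dist(SomModel$codes[[1]])), groups)

# Standardize the new rows with the TRAINING centers and scales
NewMatrix <- scale(as.matrix(NewData[, 1:4]),
                   center = attr(TrainingMatrix, "scaled:center"),
                   scale  = attr(TrainingMatrix, "scaled:scale"))

# Find the winning unit for every new row
mapped <- map(SomModel, newdata = NewMatrix)

# Attach the cluster label and grid coordinates of the winning unit
result <- NewData
result$Cluster <- unit.hc[mapped$unit.classif]
result$X <- SomModel$grid$pts[mapped$unit.classif, "x"]
result$Y <- SomModel$grid$pts[mapped$unit.classif, "y"]
head(result)
```

On a hexagonal grid the X and Y values are the plotting coordinates of the units, so they are not whole numbers. If integer row and column positions are wanted instead, they would have to be derived from the unit index and the grid dimensions.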

