Doesn't split() preserve the structure of a data frame?

Time: 2019-01-19 18:29:02

Tags: r

I'm doing hierarchical clustering in R and need all the elements of each cluster.

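When I use split() on the data, I get a list of 3 plain numeric vectors (the first one is num [1:2628]), and none of the column information from the original data frame (dataA) carries over.

How do I make sure the pieces keep the structure of dataA?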

Edit: in my case:

clusterA <- hclust(dist(dataA), method = "single")  # single-linkage clustering of the rows
NumA <- 3
label <- cutree(clusterA, NumA)                     # cluster label (1..NumA) for each row

clusterXlist <- split(dataA, f = label)
str(clusterXlist[[1]])

With dataA as input, this gives:

> str(clusterXlist[[1]])
num [1:2628] 0.0529 -0.3909 -0.4465 0.1 0.8393 ...

Edit 2: here is dataA:

> str(dataA)
num [1:440, 1:6] 0.0529 -0.3909 -0.4465 0.1 0.8393 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:6] "Fresh" "Milk" "Grocery" "Frozen" ...
 - attr(*, "scaled:center")= Named num [1:6] 12000 5796 7951 3072 2881 ...
  ..- attr(*, "names")= chr [1:6] "Fresh" "Milk" "Grocery" "Frozen" ...
 - attr(*, "scaled:scale")= Named num [1:6] 12647 7380 9503 4855 4768 ...
  ..- attr(*, "names")= chr [1:6] "Fresh" "Milk" "Grocery" "Frozen" ...

clusterXlist[[1]] is obtained by splitting dataA, which looks like this:

> dput(head(dataA, n = 20))
structure(c(0.0528730042415329, -0.390857056063646, -0.44652098379972, 
0.0999975794271863, 0.839284119671916, -0.204572661537808, 0.00993903725191922, 
-0.349583518736614, -0.477357534676238, -0.473957607271904, -0.682697336282181, 
0.0905884780058897, 1.55872457204484, 0.728746944991474, 1.00042486502152, 
-0.138155475034538, -0.868191050016313, -0.484236457564077, 0.521904849881291, 
-0.333690834823332, 0.522972471408079, 0.543838613660349, 0.408073194590386, 
-0.623310408164662, -0.0523368792616442, 0.333686752405346, -0.351915064454946, 
-0.113851350576777, -0.291078065290861, 0.717677967619194, -0.053285340273111, 
-0.63306600713975, 0.883794139056095, 0.0557876760455718, 0.497093035238056, 
-0.634420951441845, 0.409157150032062, 0.0488774601048851, 0.0719115132405076, 
-0.447303143322465, -0.0410681453901357, 0.170124700204028, -0.0281250860936324, 
-0.3925300807586, -0.0792659545334748, -0.297298628211157, -0.10273182626616, 
0.15518230654465, -0.185125447641461, 1.15011422238562, 0.528531691780372, 
-0.360751187201331, 0.400469064432042, 0.739829765498898, 0.435615257968889, 
-0.434621330503326, 0.438772101699743, -0.528063904936618, 0.226000834240152, 
0.159180975270399, -0.588697039406295, -0.269829034507317, -0.137379339965946, 
0.68636300602308, 0.173661155768845, -0.495590877769126, -0.533904475256987, 
-0.288985833251248, -0.545233764836731, -0.394039245717966, 0.273564891153861, 
-0.340276616984998, -0.573659982327726, 0.00475174748902491, 
-0.572218072744849, -0.551001403168238, -0.605176006067741, -0.459955112363749, 
-0.178576756619561, -0.494972916519322, -0.0435191938188023, 
0.0863085949200282, 0.13308015693741, -0.498021323377842, -0.23165413161966, 
-0.227878848586867, 0.0542186891412866, 0.0921812574154842, -0.244448146341904, 
0.952945788892319, 0.649245242698738, -0.489212329634658, 0.209634507324604, 
0.802353943473126, 0.456496070080021, -0.40217108193415, 0.341140199633565, 
-0.526755422016323, -0.0240135648160378, -0.0762383134363428, 
-0.066263629344282, 0.0890496850231094, 2.24074190324533, 0.0933048443208461, 
1.29786952218849, -0.0261942126239276, -0.347458739603052, 0.369181005457445, 
-0.274766434933383, 0.203229792845712, 0.0777025935624781, -0.364479376793999, 
0.498608767430271, -0.327246732938803, 0.228051555415843, -0.394620088486301, 
-0.157749554245622, 1.04716972023017, 0.587257919466454, -0.36306099036142
), .Dim = c(20L, 6L), .Dimnames = list(NULL, c("Fresh", "Milk", 
"Grocery", "Frozen", "Detergents_Paper", "Delicassen")))

1 Answer:

Answer 0 (score: 0)

What you have is a matrix, not a data frame.

class(dataA)
# [1] "matrix"            (R >= 4.0.0 prints "matrix" "array")
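To see why split() flattened the matrix, here is a minimal sketch on a small made-up matrix m (hypothetical data, not from the question). In base R a matrix is handled by split.default(), which treats it as one long vector of cells and recycles the grouping factor down the columns. That is consistent with the first element in the question having length 2628, i.e. 438 rows x 6 columns.

```r
# Hypothetical 3 x 2 matrix with named columns (a stand-in for dataA)
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
g <- c(1, 2, 1)  # one cluster label per row, like cutree() output

s <- split(m, g)
# g is recycled over all 6 cells (column by column), so group 1
# collects rows 1 and 3 of *both* columns into a single vector
s[["1"]]              # 1 3 4 6
is.matrix(s[["1"]])   # FALSE: dim and dimnames are gone
```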

A quick and easy way to make split() work is to convert to a data frame first:

split(as.data.frame(dataA), label)
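A minimal sketch of this data-frame route on a small hypothetical matrix: each piece keeps its columns, and if later code needs matrices again, lapply() with as.matrix() coerces every piece back.

```r
# Hypothetical 3 x 2 matrix using two of the question's column names
m <- matrix(c(0.1, -0.4, 0.8, 1.2, 0.5, -0.3), nrow = 3,
            dimnames = list(NULL, c("Fresh", "Milk")))
g <- c(1, 2, 1)

# One data frame per cluster; column names are preserved
df_list <- split(as.data.frame(m), g)
str(df_list[["1"]])   # a data frame with 2 rows and columns Fresh, Milk

# Coerce each piece back to a matrix if downstream code expects one
mat_list <- lapply(df_list, as.matrix)
```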

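However, this can cause problems later if your computations expect matrices, and you may end up having to coerce each list element back to a matrix. I'd suggest splitting the data with lapply() instead, like this: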

clusterXlist <- lapply(
    unique(label),
    # for each cluster label i, take the matching rows of dataA (all columns)
    function(i) dataA[label == i, , drop = FALSE]
)

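This keeps the matrix structure intact in every list element; drop = FALSE makes sure that even a one-row cluster stays a matrix instead of collapsing to a vector.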
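A minimal sketch of the difference drop = FALSE makes, using a hypothetical matrix in which cluster 2 has only one row:

```r
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
g <- c(1, 2, 1)

m[g == 2, ]                 # default drop = TRUE: named vector c(a = 2, b = 5)
m[g == 2, , drop = FALSE]   # 1 x 2 matrix, column names kept

# Same pattern as above: one matrix per cluster
clusters <- lapply(unique(g), function(i) m[g == i, , drop = FALSE])
sapply(clusters, dim)       # dims per cluster: 2 x 2 and 1 x 2
```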

str(clusterXlist[[1]])
# num [1:18, 1:6] 0.0529 -0.3909 0.1 0.8393 -0.2046 ...
# - attr(*, "dimnames")=List of 2
#  ..$ : NULL
#  ..$ : chr [1:6] "Fresh" "Milk" "Grocery" "Frozen" ...
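Putting it together, here is an end-to-end sketch on simulated data; the random matrix dataSim is a hypothetical stand-in for the scaled dataA. It splits row indices rather than cells and then subsets whole rows, which is the same approach base R's split.data.frame() uses internally, so every cluster comes back as a matrix with all 6 named columns.

```r
set.seed(1)
# Hypothetical stand-in for dataA: 40 rows, 6 named, scaled columns
cols <- c("Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen")
dataSim <- scale(matrix(rnorm(40 * 6), ncol = 6, dimnames = list(NULL, cols)))

clusterA <- hclust(dist(dataSim), method = "single")
label <- cutree(clusterA, k = 3)

# Split the row indices by cluster, then take whole rows of the matrix
rowGroups <- split(seq_len(nrow(dataSim)), label)
clusterXlist <- lapply(rowGroups, function(idx) dataSim[idx, , drop = FALSE])

str(clusterXlist[[1]])   # a numeric matrix with the 6 column names
```

Because the list comes from split(), its elements are named "1", "2", "3" after the cluster labels.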