Question

After running kmeans() on the iris dataset for several values of k (k = 2, 3, 4, 5) using map(), I would like to interpret the resulting clusters using a predefined function.

Here is my attempt:

library(dplyr)
library(purrr)

cluster_assignment <- map(2:5, function(k){
  result <- kmeans(iris[-5] %>% scale(), centers = k)
  # return results to a list
  list(result$cluster, result$tot.withinss, result$centers, result$size)
})

# assign cluster results back to the iris dataset
a <- map_dfc(cluster_assignment, 1)
colnames(a) <- paste0("result_", 2:5, "_cl")
iris <- bind_cols(iris, a)

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species result_2_cl result_3_cl result_4_cl result_5_cl
1          5.1         3.5          1.4         0.2  setosa           2           2           3           3
2          4.9         3.0          1.4         0.2  setosa           2           1           3           2
3          4.7         3.2          1.3         0.2  setosa           2           1           3           2
4          4.6         3.1          1.5         0.2  setosa           2           1           3           2
5          5.0         3.6          1.4         0.2  setosa           2           2           3           3
6          5.4         3.9          1.7         0.4  setosa           2           2           3           5

Now I would like to apply the predefined function cluster_result2 to the newly assigned columns, i.e. "result_2_cl", "result_3_cl", "result_4_cl", "result_5_cl":

# predefined function
cluster_result2 <- function(x, ...){
  x %>%
    group_by_(...) %>%
    summarise(size = n(), mean_spl = mean(Sepal.Length))
}

# tried this method, but did not get the expected output
map(iris[, colnames(a)], ~ cluster_result2(iris, .x))

How can this be done with a tidyverse approach? I found a very similar method here, but could not get the expected output.

The expected output is one summary table (size and mean_spl per cluster) for each cluster column, stored in a nested list or data frame.

Thanks for your answers!

Answer 1

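We can use group_by_at instead of group_by_ (which is deprecated). Here, we need to loop over the column names of 'a' rather than over the columns of 'iris': the attempt in the question passes each column's values, not its name, as the grouping argument.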
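The answer does not show the modified function body, so the following is only a minimal sketch of what a group_by_at version of cluster_result2 might look like. The names iris_cl and cl_cols are hypothetical stand-ins for the question's modified iris and colnames(a), and the seed is arbitrary. Note that group_by_at is itself superseded in current dplyr releases.

```r
library(dplyr)
library(purrr)

# Rebuild the question's setup on a copy, so the built-in iris stays untouched.
set.seed(42)  # arbitrary seed, only to make the cluster labels repeatable
cl_cols <- paste0("result_", 2:5, "_cl")   # hypothetical stand-in for colnames(a)
iris_cl <- iris                            # hypothetical stand-in for the modified iris
iris_cl[cl_cols] <- map(2:5, ~ kmeans(scale(iris[-5]), centers = .x)$cluster)

# Assumed reconstruction: group by column names passed as strings.
cluster_result2 <- function(x, ...) {
  x %>%
    group_by_at(c(...)) %>%   # c(...) collects the names into a character vector
    summarise(size = n(), mean_spl = mean(Sepal.Length))
}

# Loop over the column *names*, not the columns themselves.
results <- map(cl_cols, ~ cluster_result2(iris_cl, .x))
```

Each element of results is a tibble with one row per cluster for the corresponding value of k.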

library(tidyverse)
map(colnames(a), ~ cluster_result2(iris, .x))

Or, without the ~ lambda, pass 'iris' as the 'x' argument by name:

map(colnames(a), cluster_result2, x = iris)
#[[1]]
# A tibble: 2 x 3
#  result_2_cl  size mean_spl
#        <int> <int>    <dbl>
#1           1    50     5.01
#2           2   100     6.26

#[[2]]
# A tibble: 3 x 3
#  result_3_cl  size mean_spl
#        <int> <int>    <dbl>
#1           1    47     6.78
#2           2    53     5.80
#3           3    50     5.01

#[[3]]
# A tibble: 4 x 3
#  result_4_cl  size mean_spl
#        <int> <int>    <dbl>
#1           1    50     6.14
#2           2    22     5.50
#3           3    29     7.00
#4           4    49     5.02

#[[4]]
# A tibble: 5 x 3
#  result_5_cl  size mean_spl
#        <int> <int>    <dbl>
#1           1    16     5.32
#2           2    29     7.00
#3           3    23     5.55
#4           4    34     4.86
#5           5    48     6.16

To check, apply the function to a single column on its own:

cluster_result2(iris, colnames(a)[4])
# A tibble: 5 x 3
#  result_5_cl  size mean_spl
#        <int> <int>    <dbl>
#1           1    16     5.32
#2           2    29     7.00
#3           3    23     5.55
#4           4    34     4.86
#5           5    48     6.16
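Since the question asks for the summaries to be kept in a nested list or data frame, here is one possible sketch of both. It assumes dplyr 1.0.0 or later and groups with across(all_of(...)) rather than group_by_at. The helper summarise_by and the names iris_cl, cl_cols, by_k_list, and by_k_tbl are hypothetical and do not come from the original answer.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(1)  # arbitrary seed, for repeatable cluster labels
cl_cols <- paste0("result_", 2:5, "_cl")
iris_cl <- iris
iris_cl[cl_cols] <- map(2:5, ~ kmeans(scale(iris[-5]), centers = .x)$cluster)

# Hypothetical helper: summarise by one grouping column given as a string.
summarise_by <- function(data, col) {
  data %>%
    group_by(across(all_of(col))) %>%
    summarise(size = n(), mean_spl = mean(Sepal.Length), .groups = "drop")
}

# Option 1: a list named after the cluster columns.
by_k_list <- map(set_names(cl_cols), ~ summarise_by(iris_cl, .x))

# Option 2: a nested tibble with one row per cluster column
# and the summary stored in a list-column.
by_k_tbl <- tibble(
  cluster_col = cl_cols,
  summary     = map(cl_cols, ~ summarise_by(iris_cl, .x))
)
```

With this layout, by_k_list[["result_3_cl"]] or by_k_tbl$summary[[2]] would retrieve the summary for k = 3.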

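Note: kmeans() picks its starting centers at random, so cluster labels and sizes can differ slightly from run to run.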
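If repeatable output is needed, one common approach is to fix the random seed before each kmeans() call. A small sketch follows; the seed value 123 and the names fit_a and fit_b are arbitrary choices for illustration.

```r
# Setting the same seed before each call makes both runs draw
# the same initial centers, so the fits come out the same.
set.seed(123)  # arbitrary seed
fit_a <- kmeans(scale(iris[-5]), centers = 3)

set.seed(123)
fit_b <- kmeans(scale(iris[-5]), centers = 3)

# Compare the cluster assignments of the two runs.
same_labels <- identical(fit_a$cluster, fit_b$cluster)
```

Here same_labels should be TRUE. Without the second set.seed() call, the two runs may label or split the clusters differently.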

Mapping specific columns to a function with two arguments
