Row-wise cor() on a subset of columns using dplyr::mutate()

Date: 2015-03-02 10:17:17

Tags: r rows subset correlation dplyr

set.seed(8)
df <- data.frame(
  A=sample(c(1:3), 10, replace=T), 
  B=sample(c(1:3), 10, replace=T),
  C=sample(c(1:3), 10, replace=T),
  D=sample(c(1:3), 10, replace=T),
  E=sample(c(1:3), 10, replace=T), 
  F=sample(c(1:3), 10, replace=T))

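I want to pass a subset of columns to dplyr's mutate() and do a row-wise calculation, e.g. cor(), to get the correlation between columns A-C and D-F within each row, but I can't figure out how. I found inspiration on SO here, here and here, but failed to produce working code. For example, I tried this: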

require(plyr)
require(dplyr)
df %>%
  rowwise() %>%
  mutate(c=cor(.[[1:3]],.[[4:6]]))

2 answers:

Answer 0 (score: 6)

You could try:

df %>% 
   rowwise() %>% 
   do(data.frame(., Cor=cor(unlist(.[1:3]), unlist(.[4:6]))))
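Since dplyr 1.0.0, do() is superseded. A minimal sketch of the same row-wise computation that stays inside mutate(), as the question asked, uses c_across(); this assumes dplyr >= 1.0.0 is installed, and the Cor column name simply mirrors the answer above. Rows where one of the two triples is constant have zero standard deviation, so cor() returns NA for them (with a warning).

```r
library(dplyr)  # c_across() requires dplyr >= 1.0.0

# Same toy data as in the question
set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE)
)

res <- df %>%
  rowwise() %>%                                        # treat each row as its own group
  mutate(Cor = cor(c_across(A:C), c_across(D:F))) %>%  # A..C vs D..F within that row
  ungroup()                                            # drop the row-wise grouping

res
```

The column ranges A:C and D:F are tidyselect expressions, so they can be swapped for any other selection of equal length.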

Answer 1 (score: 1)

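Here is another solution, from Fay (2017). It runs a correlation test on every pair of columns (shown on the built-in airquality data set), after which you can pick out the pairs you need.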

> library(dplyr)
> library(purrr)
> library(tidystringdist)
> comb <- tidy_comb_all(names(airquality))
> comb
# A tibble: 15 x 2
   V1      V2     
 * <chr>   <chr>  
 1 Ozone   Solar.R
 2 Ozone   Wind   
 3 Ozone   Temp   
 4 Ozone   Month  
 5 Ozone   Day    
 6 Solar.R Wind   
 7 Solar.R Temp   
 8 Solar.R Month  
 9 Solar.R Day    
10 Wind    Temp   
11 Wind    Month  
12 Wind    Day    
13 Temp    Month  
14 Temp    Day    
15 Month   Day    
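If tidystringdist is not available, base R's combn() yields the same 15 unordered pairs in the same order. This is a swapped-in technique, not part of Fay's post; the V1/V2 column names are chosen only to mirror the tibble above.

```r
# Base R alternative to tidy_comb_all(): every unordered pair of column names
pairs <- t(combn(names(airquality), 2))   # 15 x 2 character matrix, one pair per row
comb <- data.frame(V1 = pairs[, 1], V2 = pairs[, 2], stringsAsFactors = FALSE)
nrow(comb)   # 15, i.e. choose(6, 2)
```

Because it has the same V1/V2 columns, this data frame should be usable in the pmap() step that follows in place of the tidy_comb_all() result.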

This gives every combination of two columns.

> bulk_cor <-
+   comb %>%
+   pmap(~ cor.test(airquality[[.x]], airquality[[.y]])) %>%
+   map_df(broom::tidy) %>%
+   bind_cols(comb, .)
> bulk_cor
# A tibble: 15 x 10
   V1      V2      estimate statistic  p.value parameter conf.low conf.high method       alternative
   <chr>   <chr>      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <fct>        <fct>      
 1 Ozone   Solar.R  0.348      3.88   1.79e- 4       109   0.173     0.502  Pearson's p~ two.sided  
 2 Ozone   Wind    -0.602     -8.04   9.27e-13       114  -0.706    -0.471  Pearson's p~ two.sided  
 3 Ozone   Temp     0.698     10.4    2.93e-18       114   0.591     0.781  Pearson's p~ two.sided  
 4 Ozone   Month    0.165      1.78   7.76e- 2       114  -0.0183    0.337  Pearson's p~ two.sided  
 5 Ozone   Day     -0.0132    -0.141  8.88e- 1       114  -0.195     0.169  Pearson's p~ two.sided  
 6 Solar.R Wind    -0.0568    -0.683  4.96e- 1       144  -0.217     0.107  Pearson's p~ two.sided  
 7 Solar.R Temp     0.276      3.44   7.52e- 4       144   0.119     0.419  Pearson's p~ two.sided  
 8 Solar.R Month   -0.0753    -0.906  3.66e- 1       144  -0.235     0.0882 Pearson's p~ two.sided  
 9 Solar.R Day     -0.150     -1.82   7.02e- 2       144  -0.305     0.0125 Pearson's p~ two.sided  
10 Wind    Temp    -0.458     -6.33   2.64e- 9       151  -0.575    -0.323  Pearson's p~ two.sided  
11 Wind    Month   -0.178     -2.23   2.75e- 2       151  -0.328    -0.0202 Pearson's p~ two.sided  
12 Wind    Day      0.0272     0.334  7.39e- 1       151  -0.132     0.185  Pearson's p~ two.sided  
13 Temp    Month    0.421      5.70   6.03e- 8       151   0.281     0.543  Pearson's p~ two.sided  
14 Temp    Day     -0.131     -1.62   1.08e- 1       151  -0.283     0.0287 Pearson's p~ two.sided  
15 Month   Day     -0.00796   -0.0978 9.22e- 1       151  -0.166     0.151  Pearson's p~ two.sided  

Now you can use dplyr::filter to keep only the results you need.
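For example, a sketch that keeps only the significant correlations involving Ozone. To stay self-contained without tidystringdist, purrr or broom, it rebuilds a reduced bulk_cor (pair names, estimate and p-value only) with base R's combn() and cor.test(); the 0.05 cut-off is an arbitrary choice for illustration.

```r
library(dplyr)

# Reduced stand-in for bulk_cor: one row per pair of airquality columns
pairs <- t(combn(names(airquality), 2))
tests <- lapply(seq_len(nrow(pairs)), function(i)
  cor.test(airquality[[pairs[i, 1]]], airquality[[pairs[i, 2]]]))  # Pearson by default; NA pairs dropped

bulk_cor <- data.frame(
  V1       = pairs[, 1],
  V2       = pairs[, 2],
  estimate = vapply(tests, function(ct) unname(ct$estimate), numeric(1)),
  p.value  = vapply(tests, function(ct) ct$p.value, numeric(1)),
  stringsAsFactors = FALSE
)

# Keep pairs that involve Ozone and are significant at the 5% level
ozone_sig <- bulk_cor %>%
  filter(V1 == "Ozone" | V2 == "Ozone", p.value < 0.05)
ozone_sig
```

Going by the p-values in the table above, this should leave the Ozone/Solar.R, Ozone/Wind and Ozone/Temp rows.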

Bibliography

Fay, Colin. 2017. "A Crazy Little Thing Called purrr - Part 6: Doing Statistics." https://colinfay.me/purrr-statistics/