Question

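I use a nested data frame to nest certain groups and then run a t-test on the factor and values in the `data` column. In some cases, however, the `data` column ends up without both factor levels available. The t-test then cannot run, and the code fails with an error for the whole data frame. In the example below, groups a-d have both treatments available for comparison, but group e does not. How can I specify that the t-test should run only on rows where both treatments are available?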

library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
df <- data.frame(id = paste0('ID-', 1:100),
                 group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
                 treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
                 value = runif(100))

df_analysis <- df %>% 
  nest(-group) %>% 
  #How to ask to only run t test on rows that have both treatments in them? As written, it will give an error.
  mutate(p = map_dbl(data, ~t.test(value ~ treatment, data=.)$p.value))

Answer 1

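Since you are already using some tidyverse packages, you can use purrr's functions for capturing side effects. In this case you can use `possibly`, which returns a default value whenever an error occurs.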

Using your code:

library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
# tibble() replaces the deprecated data_frame()
df <- tibble(id = paste0('ID-', 1:100),
             group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
             treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
             value = runif(100))

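df_analysis <- df %>% 
  # tidyr >= 1.0.0 expects a named argument here; nest(-group) is deprecated
  nest(data = -group) %>% 
  # possibly() returns NA_real_ instead of raising an error when t.test() fails
  mutate(p = map_dbl(data, possibly(~t.test(value ~ treatment, data = .)$p.value, NA_real_)))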

# A tibble: 5 x 3
  group data                   p
  <chr> <list>             <dbl>
1 a     <tibble [20 x 3]>  0.610
2 b     <tibble [20 x 3]>  0.156
3 c     <tibble [20 x 3]>  0.840
4 d     <tibble [20 x 3]>  0.383
5 e     <tibble [20 x 3]> NA
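
To see what `possibly` does in isolation, here is a minimal sketch outside the nested data frame. The name `safe_log` and the `inputs` list are made up for illustration; `log` merely stands in for any function that can fail on some inputs.

```r
library(purrr)

# Wrap log() so that an error yields NA_real_ instead of stopping the whole map.
# safe_log is a hypothetical name used only for this illustration.
safe_log <- possibly(log, otherwise = NA_real_)

# The third element is a character string, so log() raises an error on it.
inputs <- list(1, 10, "a")

# map_dbl() completes for all three elements; the failing one becomes NA.
result <- map_dbl(inputs, safe_log)
result
```

Note that `possibly` swallows every error, not only the missing-treatment case, so an unrelated failure would also show up as `NA`.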

Answer 2

Wrap `t.test(...)` in an `ifelse()` that checks whether the number of unique items in `treatment` is `== 2`.

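df %>% 
  # tidyr >= 1.0.0 expects a named argument here; nest(-group) is deprecated
  nest(data = -group) %>% 
  # run the test only when both treatments are present, otherwise return NA
  mutate(p = map_dbl(data, ~ifelse(length(unique(.x$treatment)) == 2,
                                   t.test(value ~ treatment, data = .x)$p.value,
                                   NA)))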

# A tibble: 5 x 3
  # group data                        p
  # <fct> <list>                  <dbl>
# 1 a     <data.frame [20 x 3]>  0.790 
# 2 b     <data.frame [20 x 3]>  0.0300
# 3 c     <data.frame [20 x 3]>  0.712 
# 4 d     <data.frame [20 x 3]>  0.662 
# 5 e     <data.frame [20 x 3]> NA
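
The same explicit check can also be moved into a small helper function, which keeps the `mutate()` call short and lets the condition be a little stricter. The following is only a sketch under assumptions: `t_test_p` is a hypothetical name, and the extra requirement of at least two values per treatment is an addition that is not part of the answer above.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: returns the t-test p-value, or NA_real_ when the group
# does not contain exactly two treatments with at least two values each.
t_test_p <- function(d) {
  counts <- table(d$treatment)
  if (length(counts) != 2 || any(counts < 2)) {
    return(NA_real_)
  }
  t.test(value ~ treatment, data = d)$p.value
}

# Same example data as in the question.
set.seed(1)
df <- tibble(id = paste0('ID-', 1:100),
             group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
             treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
             value = runif(100))

res <- df %>%
  nest(data = -group) %>%
  mutate(p = map_dbl(data, t_test_p))

res
```

Unlike `possibly`, this version returns `NA` only for the condition it checks, so any other error from `t.test` would still surface.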
