Question

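Edit: now with reproducible code/data.

I am trying to run chi-squared tests on several variables in my data frame.

Using the npk dataset:

A single variable, N, produces the correct result.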

library(dplyr)

npk %>%
  group_by(yield, N) %>%
  select(yield, N) %>% 
  table() %>% 
  print() %>% 
  chisq.test()

As you can see, the output of table() is in a form that chisq.test() can use.

        N
  yield  0 1
    44.2 1 0
    45.5 1 0
    46.8 1 0
    48.8 1 1
    49.5 1 0
    49.8 0 1
    51.5 1 0
    52   0 1
    53.2 1 0
    55   1 0
    55.5 1 0
    55.8 0 1
    56   2 0
    57   0 1
    57.2 0 1
    58.5 0 1
    59   0 1
    59.8 0 1
    62   0 1
    62.8 1 1
    69.5 0 1

    Pearson's Chi-squared test

  data:  .
  X-squared = 20, df = 20, p-value = 0.4579

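When I try to run several tests in a loop, something about how the specific variable is referenced changes the output of my table, and the chi-squared test cannot run.

Create the list for the loop to run over: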

test_ordinal_variables <- noquote(names(npk[2:4]))
test_ordinal_variables

Loop with the faulty code (1:1 for clarity; the error repeats if 1:3 is used):

for (i in 1:1){
  npk %>%
    group_by(yield, test_ordinal_variables[i]) %>%
    select(yield, test_ordinal_variables[i]) %>%
    table() %>% 
    print() %>% 
    chisq.test()
}

The output clearly shows a table that chisq.test() cannot interpret:

Adding missing grouping variables: `test_ordinal_variables[i]`
, , N = 0

                         yield
test_ordinal_variables[i] 44.2 45.5 46.8 48.8 49.5 49.8 51.5 52 53.2 55 55.5 55.8 56 57 57.2 58.5 59 59.8 62
                        N    1    1    1    1    1    0    1  0    1  1    1    0  2  0    0    0  0    0  0
                         yield
test_ordinal_variables[i] 62.8 69.5
                        N    1    0

, , N = 1

                         yield
test_ordinal_variables[i] 44.2 45.5 46.8 48.8 49.5 49.8 51.5 52 53.2 55 55.5 55.8 56 57 57.2 58.5 59 59.8 62
                        N    0    0    0    1    0    1    0  1    0  0    0    1  0  1    1    1  1    1  1
                         yield
test_ordinal_variables[i] 62.8 69.5
                        N    1    1

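For some reason, test_ordinal_variables[i] is not being evaluated the way I expect inside the loop. You can see the message says it is "Adding missing grouping variables", but if it simply evaluated the expression instead of adding a variable, I think it would work.

On its own, this evaluates as I expect: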

> test_ordinal_variables[1]
[1] N

So why doesn't it do that inside the loop?

Answer 1

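Since you are passing dynamically referenced variables into the dplyr chain, consider the underscore counterparts group_by_() and select_(). And because yield is not passed dynamically, convert it with as.symbol() so that it is treated as a column.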

for (i in names(npk[2:4])){      
    npk %>%
      group_by_(as.symbol("yield"), i) %>%
      select_(as.symbol("yield"), i) %>%
      table() %>% 
      print() %>% 
      chisq.test() %>% 
      print()    
}
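
For context on why the original version produced a three-way table, here is a minimal sketch, assuming a reasonably recent dplyr. The name `test_vars` is illustrative, and a plain character vector is used rather than `noquote()`. `group_by()` treats `test_vars[i]` as an expression to compute rather than as a column name, so it adds a new constant column labelled with the expression text:

```r
library(dplyr)

test_vars <- names(npk[2:4])   # illustrative name; holds "N", "P", "K"
i <- 1

# group_by() computes any argument that is not a bare column name
# (as mutate() would) and stores it under the text of the expression.
grouped <- npk %>% group_by(yield, test_vars[i])

names(grouped)                    # now includes a column named "test_vars[i]"
unique(grouped[["test_vars[i]"]]) # every row holds the string "N"
group_vars(grouped)               # "yield" "test_vars[i]"
```

That extra grouping column is what select() then reports with "Adding missing grouping variables", and it becomes the third dimension of the table.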

Output of the loop:

      N
yield  0 1
  44.2 1 0
  45.5 1 0
  46.8 1 0
  48.8 1 1
  49.5 1 0
  49.8 0 1
  51.5 1 0
  52   0 1
  53.2 1 0
  55   1 0
  55.5 1 0
  55.8 0 1
  56   2 0
  57   0 1
  57.2 0 1
  58.5 0 1
  59   0 1
  59.8 0 1
  62   0 1
  62.8 1 1
  69.5 0 1

    Pearson's Chi-squared test

data:  .
X-squared = 20, df = 20, p-value = 0.4579

      P
yield  0 1
  44.2 0 1
  45.5 1 0
  46.8 1 0
  48.8 0 2
  49.5 0 1
  49.8 1 0
  51.5 1 0
  52   0 1
  53.2 0 1
  55   1 0
  55.5 1 0
  55.8 0 1
  56   1 1
  57   1 0
  57.2 1 0
  58.5 0 1
  59   0 1
  59.8 1 0
  62   1 0
  62.8 0 2
  69.5 1 0

    Pearson's Chi-squared test

data:  .
X-squared = 22, df = 20, p-value = 0.3405

      K
yield  0 1
  44.2 1 0
  45.5 0 1
  46.8 1 0
  48.8 0 2
  49.5 0 1
  49.8 0 1
  51.5 1 0
  52   1 0
  53.2 0 1
  55   0 1
  55.5 0 1
  55.8 0 1
  56   2 0
  57   0 1
  57.2 0 1
  58.5 0 1
  59   1 0
  59.8 1 0
  62   1 0
  62.8 2 0
  69.5 1 0

    Pearson's Chi-squared test

data:  .
X-squared = 24, df = 20, p-value = 0.2424

Warning messages:
1: In chisq.test(.) : Chi-squared approximation may be incorrect
2: In chisq.test(.) : Chi-squared approximation may be incorrect
3: In chisq.test(.) : Chi-squared approximation may be incorrect
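
Note that group_by_() and select_() have been deprecated since dplyr 0.7.0. A rough equivalent for current dplyr versions, offered as a sketch rather than as the method above, could use `all_of()` to select a column by a name held in a string. The names `v` and `results` are illustrative, and the group_by() step is dropped on the assumption that table() does not need it:

```r
library(dplyr)

results <- list()   # illustrative container for the test objects

for (v in names(npk)[2:4]) {
  # all_of() tells select() that v holds a column name as a string
  tab <- npk %>%
    select(yield, all_of(v)) %>%
    table()

  print(tab)
  results[[v]] <- chisq.test(tab)
  print(results[[v]])
}
```

The same warnings should appear here. chisq.test() issues them when expected cell counts fall below 5, which is the case when most yield values occur only once or twice. Without dplyr, `table(npk$yield, npk[[v]])` builds the same counts, though without the dimension names.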

R variable evaluated differently depending on context: in a loop or not
