A better chi-squared test for Perl?

Time: 2014-01-18 13:24:03

Tags: perl statistics

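Suppose I roll a 6-sided die 60 times and get 16, 5, 9, 7, 6, 17 rolls for the numbers 1 through 6, respectively. The numbers 1 and 6 show up too often, and there's only about a 1.8% chance of that being random. If I use Statistics::ChiSquare, it prints: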

There's a >1% chance, and a <5% chance, that this data is random.

So not only is it a terrible interface (I can't get at those numbers directly), but the rounding error is significant.

Even worse, what if I'm rolling two six-sided dice? The odds of getting any particular sum are:

Sum  Frequency  Relative Frequency
2    1          1/36
3    2          2/36
4    3          3/36
5    4          4/36
6    5          5/36
7    6          6/36
8    5          5/36
9    4          4/36
10   3          3/36
11   2          2/36
12   1          1/36

Statistics::ChiSquare used to have a chisquare_nonuniform() function, but it has been removed.

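So the numbers are poorly rounded and I can't use it for a non-uniform distribution. Given a list of actual frequencies and a list of expected frequencies, what's the best way to calculate a chi-squared test in Perl? The various modules I've found on CPAN are no help, so I'm guessing I've missed something obvious.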

1 Answer:

Answer 0: (score: 15)

Implementing this yourself is so simple that I didn't want to upload Yet Another Statistics Module just for this.
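For reference, the quantity involved is Pearson's chi-squared statistic, compared against a chi-squared distribution with one fewer degree of freedom than there are categories. Worked by hand for die counts of 16, 5, 9, 7, 6, 17 against an expected 10 per face (the symbols O_i, E_i and k are notation introduced here for illustration, not taken from the post):

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
       = \frac{6^2 + 5^2 + 1^2 + 3^2 + 4^2 + 7^2}{10}
       = \frac{136}{10} = 13.6,
\qquad \text{df} = k - 1 = 5
```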

use strict;
use warnings;
use v5.10;  # say and the // operator need Perl 5.10 or later

use Carp qw< croak >;
use List::Util qw< sum >;
use Statistics::Distributions qw< chisqrprob >;

# Takes array refs of observed and expected counts and returns the
# upper-tail probability of the resulting chi-squared statistic.
sub chi_squared_test {
  my %args = @_;
  my $observed = delete $args{observed} // croak q(Argument "observed" required);
  my $expected = delete $args{expected} // croak q(Argument "expected" required);
  @$observed == @$expected or croak q(Input arrays must have same length);

  # Pearson's statistic: sum of (observed - expected)^2 / expected.
  my $chi_squared = sum map {
    ($observed->[$_] - $expected->[$_])**2 / $expected->[$_];
  } 0 .. $#$observed;
  my $degrees_of_freedom = @$observed - 1;
  my $probability = chisqrprob($degrees_of_freedom, $chi_squared);
  return $probability;
}

say chi_squared_test
  observed => [16, 5, 9, 7, 6, 17],
  expected => [(10) x 6];

Output: 0.018360
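The same approach should carry over to the non-uniform two-dice case from the question, since nothing in the calculation requires the expected counts to be equal. Below is a minimal sketch using only core modules: it scales the 1/36 to 6/36 relative frequencies by the number of rolls to get expected counts, then computes the statistic. The observed counts are made-up illustration data, and the variable names are not from the original answer.

```perl
use strict;
use warnings;
use List::Util qw< sum >;

# Ways to roll each sum (2..12) with two fair six-sided dice, out of 36.
my @ways = (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1);

# Hypothetical observed counts for sums 2..12 (made-up data, 72 rolls).
my @observed = (3, 3, 7, 8, 9, 13, 11, 7, 6, 3, 2);
my $rolls    = sum @observed;

# Expected count for each sum = total rolls * relative frequency.
my @expected = map { $rolls * $_ / 36 } @ways;

# Pearson's statistic: sum of (observed - expected)^2 / expected.
my $chi_squared = sum map {
  ($observed[$_] - $expected[$_])**2 / $expected[$_]
} 0 .. $#observed;

# One fewer degree of freedom than there are categories.
my $degrees_of_freedom = @observed - 1;

printf "chi-squared = %.3f on %d degrees of freedom\n",
  $chi_squared, $degrees_of_freedom;
```

To turn this into a probability, pass the two values to chisqrprob($degrees_of_freedom, $chi_squared) from Statistics::Distributions, or hand \@observed and \@expected straight to chi_squared_test as defined above.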