Question

I have two .dat files (random1.dat and random2.dat), both generated from a random uniform distribution (with different seeds):

http://www.filedropper.com/random1_1 (random1.dat)
http://www.filedropper.com/random2 (random2.dat)

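I would like to use R to run a chi-squared test and find out whether these two distributions are statistically the same. To do that I tried: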

x1 <- read.table("random1.dat")
x2 <- read.table("random2.dat")
chisq.test(x1,x2)

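But I get the error message:

'x' and 'y' must have the same length

The thing is that both files have 1000 rows, so I don't understand it. Another question: if I want to automate (iterate) this procedure 100 times using 100 different files, can I do something like this: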

DO i=1,100
x1 <- read.table("random'(i)'.dat")
x2 <- read.table("fixedfile.dat")
chisq.test(x1,x2)
save results from the chisq analysis
END DO

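Thanks a lot for your help.

Added:

@eipi10,

I tried the first method you gave here, and it works with the data you generated. But when I try it with my own data (I put a 2-column matrix with 1000 rows, holding two uniform distributions generated with different seeds, in one file), something does not work:

I read the data with dat = read.table("random2col.dat");
I use the command: csq = lapply(dat[,-1], function(x) chisq.test(cbind(dat[,1],x))) and a warning message is shown;
Finally I use: unlist(lapply(csq, function(x) x$p.value)) but the output looks like: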

[...] 1 1 1 1 1 1 1 1 1 1 1 1 1 [963] 1 1 1 1 1 ..... 1 1 1 1 [1000] 1

Answer 1

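I don't think you need a loop. You can use lapply instead. Also, you are passing x1 and x2 as separate columns of data. When you do that, chisq.test computes a contingency table from the two columns, which makes no sense for columns of real numbers. Instead, you need to give chisq.test a single matrix or data frame whose columns are x1 and x2. But even then, chisq.test expects count data, which is not what you have here (although the "expected" frequencies need not be integers). In any case, here is some code that gets the test to run the way you were hoping: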

# Simulate data: 5 columns of data, each from the uniform distribution
dat = data.frame(replicate(5, runif(20)))

# Chi-Square test of each column against column 1.
# Note use of cbind to combine the two columns into a single matrix,
# rather than entering each column as separate arguments.
csq = lapply(dat[,-1], function(x) chisq.test(cbind(dat[,1],x)))

# Look at Chi-square stats and p-Values for each test
sapply(csq, function(x) x$statistic)
sapply(csq, function(x) x$p.value)
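
Since chisq.test expects counts, one way to get there from continuous values is to bin both samples on a common set of breaks and test the resulting table of counts. The following is only a minimal sketch under assumptions that are not part of the original post: x1 and x2 are simulated stand-ins for the values in random1.dat and random2.dat, and the choice of 10 equal-width bins on [0, 1] is arbitrary.

```r
set.seed(1)            # arbitrary seed, only so the sketch is reproducible
x1 <- runif(1000)      # stand-in for the values in random1.dat (assumption)
x2 <- runif(1000)      # stand-in for the values in random2.dat (assumption)

# Assumption: 10 equal-width bins on [0, 1]; the number of bins is a free choice.
breaks <- seq(0, 1, length.out = 11)

# Count how many values of each sample fall into each bin.
counts <- rbind(
  sample1 = as.vector(table(cut(x1, breaks, include.lowest = TRUE))),
  sample2 = as.vector(table(cut(x2, breaks, include.lowest = TRUE)))
)

# Chi-square test on the 2 x 10 table of counts: do both samples
# spread over the bins in the same proportions?
res <- chisq.test(counts)
res$statistic
res$p.value
```

The result depends on the binning, so a different number of bins can give a different p-value.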

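On the other hand, if you intend to treat your data as two streams of values that are then converted into a contingency table, here is an example: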

# Simulate data of 5 factor variables, each with 10 different levels
dat = data.frame(replicate(5, sample(c(1:10), 1000, replace=TRUE)))

# Chi-Square test of each column against column 1. Here the two columns of data are 
# entered as separate arguments, so that chisq.test will convert them to a two-way 
# contingency table before doing the test.
csq = lapply(dat[,-1], function(x) chisq.test(dat[,1],x))

# Look at Chi-square stats and p-Values for each test
sapply(csq, function(x) x$statistic)
sapply(csq, function(x) x$p.value)
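
For the case of many files compared against one fixed file, the same lapply pattern can run over file names instead of columns. This is a rough sketch, not a definitive recipe: the file names follow the ones in the question, but the files here are hypothetical stand-ins written to a temporary directory, they hold integer levels as in the example above, and only 3 files are used instead of 100. Each file is reduced to a plain vector with [, 1] because read.table returns a data frame, and passing data frames directly is a likely cause of the "same length" error.

```r
# Hypothetical setup: write small stand-in files so the sketch is self-contained.
# With real data, skip this part and point data_dir at your own folder.
data_dir <- tempdir()
set.seed(42)           # arbitrary seed, only for reproducibility
write.table(sample(1:10, 1000, replace = TRUE),
            file.path(data_dir, "fixedfile.dat"),
            row.names = FALSE, col.names = FALSE)
for (i in 1:3) {
  write.table(sample(1:10, 1000, replace = TRUE),
              file.path(data_dir, paste0("random", i, ".dat")),
              row.names = FALSE, col.names = FALSE)
}

# Read the fixed file once and keep its single column as a vector.
fixed <- read.table(file.path(data_dir, "fixedfile.dat"))[, 1]
files <- file.path(data_dir, paste0("random", 1:3, ".dat"))

# One chi-square test per file, each against the fixed file.
results <- lapply(files, function(f) {
  x <- read.table(f)[, 1]
  chisq.test(x, fixed)   # two vectors are cross-tabulated before the test
})
names(results) <- basename(files)

# Collect the p-values in a named vector, one entry per file.
pvals <- sapply(results, function(r) r$p.value)
pvals
```

To keep the results, the vector could be written out with write.csv or the whole list stored with saveRDS.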

R chi-squared statistic for two different distributions
