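This question is similar to my previous one (Loop for applying mask and creating a vector of means).
I want to read in 45 data frames, each 259,200 x 21, and apply 2 masks to every column. Each of my masks is 259,200 x 1.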
pdt <- (theData * mask1)*mask2
Error in Ops.data.frame(theData, mask1) :
* only defined for equally-sized data frames
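How can I apply the masks to each column?
I then want a data frame holding the mean of each masked column, which should give me a 45 x 21 data frame. Here is the full code: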
dataDir <-"C:\\dir\\"
patternC <-"pattern_"
filesSizeC = sort(list.files(dataDir,patternC))
#(filesSizeC)
for (i in 1:length(filesSizeC)) {
theData<-read.table(paste(dataDir,filesSizeC[i],sep=""),header=F,sep="\t")
theData
pdt <- (theData*mask1)*mask2
pdt[pdt == 0] <- NA #all zeros become NA's
if (i>1) {
theMeanValues <- c(theMeanValues, mean(pdt$V1:pdt$V21, na.rm=T))
} else {
theMeanValues <- c(mean(pdt$V1:pdt$V21, na.rm=T))
}
}
Thanks
Edit - 13-8-13
OK, so I have been able to apply both masks, which I did like this:
pdt <- theData * rep(mask_1, ncol(theData))
pdt <- pdt * rep(mask_2, ncol(pdt))
pdt[pdt == 0] <- NA #all zeros become NA's
This now gives me:
> summary(pdt)
V1 V2 V3 V4
Min. : 20261945 Min. : 21312164 Min. : 22243882 Min. : 23064587
1st Qu.: 91201092 1st Qu.: 95889488 1st Qu.:100047585 1st Qu.:103709299
Median :205769790 Median :216624073 Median :226261360 Median :234756158
Mean :231083595 Mean :242654479 Mean :252906061 Mean :261926034
3rd Qu.:345700883 3rd Qu.:363602884 3rd Qu.:379489788 3rd Qu.:393487592
Max. :741504636 Max. :776855896 Max. :808103971 Max. :835543870
NA's :259065 NA's :259065 NA's :259065 NA's :259065
...
V21
Min. : 27844725
1st Qu.:124843018
Median :284331924
Mean :314292645
3rd Qu.:475087713
Max. :993931538
NA's :259065
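I want to get the mean of each column, ignoring the NAs.
In this simpler example, I want a loop that produces a 1 x 21 data frame containing the mean of each column.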
mat1 <- matrix(rnorm(10), nrow=5, ncol=21)
mat1 <- data.frame(mat1)
mat1
X1 X2 X3 X4 ......
1 0.56660450 0.1690268 0.56660450 0.1690268
2 0.01571945 1.1650268 0.01571945 1.1650268
3 0.38305734 -0.0442040 0.38305734 -0.0442040
4 -0.04513712 -0.1003684 -0.04513712 -0.1003684
5 0.03435191 -0.2834446 0.03435191 -0.2834446
for (i in 1:length(mat1)) {
if (i>1) {
theMeanValues <- c(themeanvalues, mean(mat1$[i]), na.rm=T)
} else {
theMeanValues <- c(mean(mat1$[i]), na.rm=T)
}
}
The code does not work. I think I need to change the syntax of mean(mat1$[i]), but I am not sure what to change it to.
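Answer 0: (score: 2)
You are not using the correct syntax to select a column of the matrix, and the parentheses are not in the right place. Using a loop is slow and cumbersome. Use the colMeans() function instead.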
> mat1 <- matrix(rnorm(21 * 1e6), ncol = 21)
> mat1 <- data.frame(mat1)
>
> system.time({
+ for (i in seq_len(ncol(mat1))) {
+ if (i>1) {
+ theMeanValues <- c(theMeanValues, mean(mat1[, i], na.rm = TRUE))
+ } else {
+ theMeanValues <- mean(mat1[, i], na.rm = TRUE)
+ }
+ }
+ })
user system elapsed
0.53 0.05 0.58
> system.time({
+ theMeanValues2 <- colMeans(mat1, na.rm = TRUE)
+ })
user system elapsed
0.16 0.09 0.25
> names(theMeanValues2) <- NULL
> all.equal(theMeanValues, theMeanValues2)
[1] TRUE
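
To tie this back to the masking step in the question, here is a minimal sketch of how the two masks and colMeans() might be combined to produce one row of means per input file. It is an illustration under assumptions, not the asker's actual pipeline: masked_col_means is a hypothetical helper name, the three small random data frames stand in for the 45 files, and the masks are assumed to be plain 0/1 vectors (or one-column data frames) with one entry per row.

```r
# Hypothetical helper: apply two row masks to every column, then average.
masked_col_means <- function(df, mask1, mask2) {
  # Combine both masks into one numeric vector with one entry per row.
  mask <- as.numeric(unlist(mask1)) * as.numeric(unlist(mask2))
  stopifnot(length(mask) == nrow(df))
  # A vector of length nrow(df) is recycled down each column in turn,
  # so every column is multiplied by the same mask.
  pdt <- as.matrix(df) * mask
  pdt[pdt == 0] <- NA  # zeros (masked-out cells) become NA, as in the question
  colMeans(pdt, na.rm = TRUE)
}

# Small stand-in for the 45 files: 3 data frames of 6 rows x 4 columns.
set.seed(1)
frames <- replicate(3, data.frame(matrix(runif(24, 1, 10), ncol = 4)),
                    simplify = FALSE)
mask1 <- c(1, 1, 0, 1, 1, 0)
mask2 <- c(1, 0, 1, 1, 1, 1)

# One row of column means per data frame, bound into a 3 x 4 matrix.
theMeanValues <- t(sapply(frames, masked_col_means,
                          mask1 = mask1, mask2 = mask2))
dim(theMeanValues)
```

With the real data, the list of data frames could presumably be built with something like lapply(paste0(dataDir, filesSizeC), read.table, header = FALSE, sep = "\t"), which would give a 45 x 21 result. Converting to a matrix before multiplying avoids the Ops.data.frame size check that raised the original error.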