Computing the mean of a big.matrix

Time: 2015-11-16 17:42:39

Tags: r r-bigmemory

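I'm using the bigmemory and biganalytics packages and am specifically trying to compute the mean of a big.matrix object. The biganalytics documentation (e.g. ?biganalytics) indicates that mean() should be available for big.matrix objects, but it fails: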

library(biganalytics)  # also attaches bigmemory, which provides big.matrix()
x <- big.matrix(5, 2, type="integer", init=0,
                dimnames=list(NULL, c("alpha", "beta")))
x
# An object of class "big.matrix"
# Slot "address":
# <pointer: 0x00000000069a5200>
x[,1] <- 1:5
x[,]
#      alpha beta
# [1,]     1    0
# [2,]     2    0
# [3,]     3    0
# [4,]     4    0
# [5,]     5    0
mean(x)
# [1] NA
# Warning message:
# In mean.default(x) : argument is not numeric or logical: returning NA

Some things do work, though:

colmean(x)
# alpha  beta 
#     3     0 
sum(x)
# [1] 15
mean(x[])
# [1] 1.5
mean(colmean(x))
# [1] 1.5
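
As a side note on why the last two results agree: every column has the same number of rows, so the unweighted mean of the column means equals the mean over all cells. The sketch below only uses calls that already appear in this post and assumes the matrix has no NA values.

```r
library(biganalytics)  # also attaches bigmemory

x <- big.matrix(5, 2, type = "integer", init = 0)
x[, 1] <- 1:5

# All columns have nrow(x) entries, so averaging the column means
# gives the same value as averaging every cell (assuming no NAs).
m_cols <- mean(colmean(x))
m_all  <- mean(x[, ])  # copies the whole matrix into RAM; fine for a small x

stopifnot(isTRUE(all.equal(m_cols, m_all)))
```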

Without mean(), mean(colmean(x)) seems to be the next best thing:

# try it on something bigger
x = big.matrix(nrow=10000, ncol=10000, type="integer")
x[] <- c(1:(10000*10000))
mean(colmean(x))
# [1] 5e+07
mean(x[])
# [1] 5e+07
system.time(mean(colmean(x)))
#    user  system elapsed 
#    0.19    0.00    0.19 
system.time(mean(x[]))
#   user  system elapsed 
#   0.28    0.11    0.39 

Presumably a working mean() could be faster still, especially for rectangular matrices with a large number of columns.
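
In the meantime, one way to avoid pulling the whole matrix into RAM with x[] is to accumulate sums over blocks of columns. This is only a rough sketch: chunked_mean and chunk_cols are hypothetical names of my own, not part of bigmemory or biganalytics, and I have not benchmarked it against mean(colmean(x)).

```r
library(biganalytics)  # also attaches bigmemory

# Hypothetical helper, not part of bigmemory/biganalytics:
# mean of all cells, reading chunk_cols columns at a time.
chunked_mean <- function(bm, chunk_cols = 1000L) {
  total <- 0  # double accumulator, so integer data cannot overflow
  for (start in seq(1L, ncol(bm), by = chunk_cols)) {
    end <- min(start + chunk_cols - 1L, ncol(bm))
    # Only this block of columns is copied into an ordinary R object
    total <- total + sum(as.numeric(bm[, start:end]))
  }
  total / (as.numeric(nrow(bm)) * ncol(bm))
}

small <- big.matrix(5, 2, type = "integer", init = 0)
small[, 1] <- 1:5
res <- chunked_mean(small, chunk_cols = 1L)
```

Like mean(x[]), this assumes there are no NA values; the block size trades memory use against the number of reads.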

Any ideas why mean() isn't working for me?

1 Answer:

Answer 0: (score: 0)

OK - reinstalling biganalytics seems to have solved the problem. I now have:

library("biganalytics")
x = big.matrix(10000,10000, type="integer")
# Fill row i with i*10000 + (1..10000); every value still fits in a 32-bit integer
for (i in 1L:10000L) { j <- 1L:10000L; x[i, ] <- i * 10000L + j }
mean(x)
# [1] 50010001
mean(x[,])
# [1] 50010001
mean(colmean(x))
# [1] 50010001
system.time(replicate(100, mean(x)))
#   user  system elapsed 
#  20.16    0.02   20.23 
system.time(replicate(100, mean(colmean(x))))
#   user  system elapsed 
#  20.08    0.00   20.24 
system.time(replicate(100, mean(x[,])))
#   user  system elapsed 
#  31.62   12.88   44.74 

All is well. My sessionInfo() is now:

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biganalytics_1.1.12 biglm_0.9-1         DBI_0.3.1           foreach_1.4.2       bigmemory_4.5.8     bigmemory.sri_0.1.3

loaded via a namespace (and not attached):
[1] codetools_0.2-8 iterators_1.0.7 Rcpp_0.11.2
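
If the mean.default warning ever comes back, a quick check before reinstalling might look like the sketch below. It assumes that biganalytics provides mean() for big.matrix as an S4 method, which I have not confirmed for every version; existsMethod() and showMethods() come from the methods package.

```r
library(biganalytics)  # also attaches bigmemory

# Assumption: biganalytics registers mean() for big.matrix as an S4 method.
# FALSE here would match the symptom of falling through to mean.default().
has_method <- methods::existsMethod("mean", "big.matrix")
print(has_method)

# List the S4 methods currently registered for mean(), if any
methods::showMethods("mean")

# Installed version, to compare against the sessionInfo() output
ver <- packageVersion("biganalytics")
print(ver)
```

If has_method is FALSE, reinstalling with install.packages("biganalytics") and restarting R is what worked for me.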