R: random binary data frame with fixed column sums

Asked: 2015-05-18 17:14:53

Tags: r loops random binary

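I am trying to build a data frame made up entirely of 1s and 0s. It should be built at random, except that each column needs to sum to a specified value.

I would know how to do this for a single data frame, but it needs to be built into a function, where it will run as an iterative process up to 1000 times.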

2 answers:

Answer 0 (score: 3)

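One efficient approach is to shuffle, for each column, a vector containing the right number of 1s and 0s. You can define the following function to generate a matrix with a given number of rows and a given number of 1s in each column: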

# Each column is a random permutation of (nrow - x) zeros and x ones.
build.mat <- function(nrow, csums) {
  sapply(csums, function(x) sample(rep(c(0, 1), c(nrow - x, x))))
}
set.seed(144)
build.mat(5, 0:5)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    0    0    0    0    1    1
# [2,]    0    0    0    1    0    1
# [3,]    0    0    0    0    1    1
# [4,]    0    1    1    1    1    1
# [5,]    0    0    1    1    1    1
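
As a quick sanity check on the function above, the following minimal sketch verifies the column sums and converts the result to a data frame, which is what the question asks for. The values in `target` and the names `m` and `df` are illustrative assumptions, and `build.mat` is repeated only so the block runs on its own. Exact draws depend on the seed and on the R version (the default sampling algorithm changed in R 3.6.0), so printed matrices may differ from those shown in this thread.

```r
# Same definition as in the answer, repeated so this block is self-contained.
build.mat <- function(nrow, csums) {
  sapply(csums, function(x) sample(rep(c(0, 1), c(nrow - x, x))))
}

target <- c(0, 2, 5, 3)   # illustrative column sums (assumed values)
m <- build.mat(5, target)

# Every entry is 0 or 1, and each column sums to its target.
stopifnot(all(m %in% c(0, 1)))
stopifnot(all(colSums(m) == target))

# A matrix converts directly to a data frame (columns are named V1, V2, ...).
df <- as.data.frame(m)
str(df)
```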

To build a list of matrices, you can use lapply over the desired column sums for each matrix:

cslist <- list(1:3, c(4, 2))
set.seed(144)
lapply(cslist, build.mat, nrow=5)
# [[1]]
#      [,1] [,2] [,3]
# [1,]    0    1    1
# [2,]    0    0    0
# [3,]    0    0    0
# [4,]    0    1    1
# [5,]    1    0    1
# 
# [[2]]
#      [,1] [,2]
# [1,]    0    0
# [2,]    1    0
# [3,]    1    1
# [4,]    1    0
# [5,]    1    1
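
For the iterative use case in the question (up to 1000 repetitions), one possible sketch wraps the same idea in `replicate`. The wrapper name `simulate_frames` and its arguments are hypothetical, introduced only for illustration, and `build.mat` is repeated so the block runs on its own.

```r
# Same definition as in the answer, repeated so this block is self-contained.
build.mat <- function(nrow, csums) {
  sapply(csums, function(x) sample(rep(c(0, 1), c(nrow - x, x))))
}

# Hypothetical wrapper: n_iter independent data frames that all share
# the same target column sums.
simulate_frames <- function(n_iter, nrow, csums) {
  replicate(n_iter, as.data.frame(build.mat(nrow, csums)), simplify = FALSE)
}

set.seed(144)
csums  <- c(1, 3, 2)   # illustrative column sums (assumed values)
frames <- simulate_frames(1000, nrow = 5, csums = csums)

length(frames)
# Check that every iteration honours the requested column sums.
all(vapply(frames, function(d) all(colSums(d) == csums), logical(1)))
```

Setting `simplify = FALSE` keeps the result as a plain list, so each element stays a separate data frame rather than being collapsed into an array.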

Answer 1 (score: 2)

If there are many more zeros than ones, or vice versa, @akrun's approach may be faster:

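build_01_mat <- function(n, n1s) {
  nc        <- length(n1s)
  # Fill with whichever value is more common, then overwrite the rarer one.
  zerofirst <- sum(n1s) < n * nc / 2

  # Number of cells to overwrite in each column.
  tochange  <- if (zerofirst) n1s else n - n1s

  mat       <- matrix(if (zerofirst) 0L else 1L, n, nc)

  # Index with a two-column (row, column) matrix: random rows per column.
  mat[cbind(
    unlist(c(sapply((1:nc)[tochange > 0], function(col) sample(1:n, tochange[col])))),
    rep(1:nc, tochange)
  )] <- if (zerofirst) 1L else 0L
  mat
}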

set.seed(1)
build_01_mat(5,c(1,3,0))
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    0    1    0
# [4,]    0    1    0
# [5,]    0    0    0
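
The step that does the work in this function is assignment through a two-column index matrix, where each row of the index names one (row, column) cell. A minimal sketch of that mechanism alone, with hand-picked indices (assumed values, not random draws):

```r
mat <- matrix(0L, nrow = 5, ncol = 3)

# Each row of idx is one (row, column) pair to overwrite.
idx <- cbind(c(2, 2, 3, 4),   # row indices
             c(1, 2, 2, 2))   # column indices
mat[idx] <- 1L

colSums(mat)   # column sums are 1, 3, 0
```

Only the cells listed in `idx` are touched, which is why this approach does less work than shuffling whole columns when one value is rare.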

Some benchmarks:

library(rbenchmark)

# similar numbers of zeros and ones
benchmark(
  permute=build.mat(1e7,1e7/2),
  replace=build_01_mat(1e7,1e7/2),replications=10)[1:5]
#      test replications elapsed relative user.self
# 1 permute           10    7.68    1.126      6.59
# 2 replace           10    6.82    1.000      6.27

# many more zeros than ones
benchmark(
  permute=build.mat(1e6,rep(10,20)),
  replace=build_01_mat(1e6,rep(10,20)),replications=10)[1:5]
#      test replications elapsed relative user.self
# 1 permute           10   10.28    3.779      8.51
# 2 replace           10    2.72    1.000      2.23

# many more ones than zeros
benchmark(
  permute=build.mat(1e6,1e6-rep(10,20)),
  replace=build_01_mat(1e6,1e6-rep(10,20)),replications=10)[1:5]
#      test replications elapsed relative user.self
# 1 permute           10   10.94    4.341      9.28
# 2 replace           10    2.52    1.000      2.09
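
If rbenchmark is not installed, base R's `system.time` gives a rough comparison. The sketch below is an assumption-laden stand-in for the benchmarks above: it uses smaller sizes, repeats both function definitions so it runs on its own, and checks that both functions hit the same column sums. Timings vary by machine, so none are shown.

```r
# Both definitions repeated from the answers so this block is self-contained.
build.mat <- function(nrow, csums) {
  sapply(csums, function(x) sample(rep(c(0, 1), c(nrow - x, x))))
}

build_01_mat <- function(n, n1s) {
  nc        <- length(n1s)
  zerofirst <- sum(n1s) < n * nc / 2
  tochange  <- if (zerofirst) n1s else n - n1s
  mat       <- matrix(if (zerofirst) 0L else 1L, n, nc)
  mat[cbind(
    unlist(c(sapply((1:nc)[tochange > 0], function(col) sample(1:n, tochange[col])))),
    rep(1:nc, tochange)
  )] <- if (zerofirst) 1L else 0L
  mat
}

n    <- 1e5            # assumed size, smaller than in the benchmarks above
ones <- rep(10, 20)    # many more zeros than ones

t_permute <- system.time(a <- build.mat(n, ones))["elapsed"]
t_replace <- system.time(b <- build_01_mat(n, ones))["elapsed"]

# Both approaches must produce the requested column sums.
stopifnot(all(colSums(a) == ones), all(colSums(b) == ones))

print(c(permute = unname(t_permute), replace = unname(t_replace)))
```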