在R中减少嵌套for循环到单循环

时间:2017-08-31 15:53:58

标签: r performance loops

这种嵌套for循环可能需要相当长的时间才能运行,具体取决于对规格,烫发和K的输入。' pop'只是一个存储所有值的数组。 Perms是一个很大的值,比如10,000。

K <- 1 

N <- 100 

Hstar <- 10 

perms <- 10000 

specs <- 1:N 

pop <- array(dim = c(c(perms, N), K))

haps <- as.character(1:Hstar)

probs <- rep(1/Hstar, Hstar) 

for(j in 1:perms){
    for(i in 1:K){ 
        if(i == 1){
            pop[j, specs, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
    }
        else{
            pop[j ,, 1] <- sample(haps[s1], size = N, replace = TRUE, prob = probs[s1])
            pop[j ,, 2] <- sample(haps[s2], size = N, replace = TRUE, prob = probs[s2])

        }
    }
}

HAC.mat <- array(dim = c(c(perms, N), K))

for(k in specs){
    for(j in 1:perms){
        for(i in 1:K){ 
            ind.index <- sample(specs, size = k, replace = FALSE) 
            hap.plot <- pop[sample(1:nrow(pop), size = 1, replace = TRUE), ind.index, sample(1:K, size = 1, replace = TRUE)] 
            HAC.mat[j, k, i] <- length(unique(hap.plot))  
       }
   }
}


means <- apply(HAC.mat, MARGIN = 2, mean)
lower <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.025))
upper <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.975))

par(mfrow = c(1, 2))

plot(specs, means, type = "n", xlab = "Specimens sampled", ylab = "Unique haplotypes", ylim = c(1, Hstar))
polygon(x = c(specs, rev(specs)), y = c(lower, rev(upper)), col = "gray")
lines(specs, means, lwd = 2)
HAC.bar <- barplot(N*probs, xlab = "Unique haplotypes", ylab = "Specimens sampled", names.arg = 1:Hstar)

为了让循环运行得更快,我想将上面的循环压缩成一个循环并且具有单个索引(i)从1:(specs * perms)运行并使用带有floor和ceiling函数的模运算来获得完成的工作。我不太确定如何最好地实现这一点。

1 个答案:

答案 0 :(得分:0)

让我们使用RcppArmadillo。 但首先,我需要为您的代码更改两件事:

  • 使用pop作为整数数组而不是字符更容易(也更快)。使用uniquematch
  • 可以轻松制作对应表
  • 我需要置换pop的前两个维度,以便访问更加连续。

生成pop的新代码:

K <- 1 
N <- 100 
Hstar <- 10 
perms <- 10000
specs <- 1:N 
pop <- array(dim = c(N, perms, K))
haps <- 1:Hstar
probs <- rep(1/Hstar, Hstar) 

for(j in 1:perms){
  for(i in 1:K){ 
    if(i == 1){
      pop[, j, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
    }
    else{
      pop[, j, 1] <- sample(haps[s1], size = N, replace = TRUE, prob = probs[s1])
      pop[, j, 2] <- sample(haps[s2], size = N, replace = TRUE, prob = probs[s2])
    }
  }
}

生成HAC.mat的RcppArmadillo代码:

// [[Rcpp::depends(RcppArmadillo)]]
#define ARMA_DONT_PRINT_OPENMP_WARNING
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <set>
using namespace Rcpp;


int sample_one(int n) {
  return n * unif_rand();
} 

int sample_n_distinct(const IntegerVector& x, 
                      int k,
                      const int * pop_ptr) {

  IntegerVector ind_index = RcppArmadillo::sample(x, k, false); 
  std::set<int> distinct_container;

  for (int i = 0; i < k; i++) {
    distinct_container.insert(pop_ptr[ind_index[i]]);
  }

  return distinct_container.size();
}

// [[Rcpp::export]]
arma::Cube<int> fillCube(const arma::Cube<int>& pop,
                         const IntegerVector& specs,
                         int perms,
                         int K) {

  int N = specs.size();
  arma::Cube<int> res(perms, N, K);

  IntegerVector specs_C = specs - 1;
  const int * pop_ptr;
  int i, j, k;

  for (i = 0; i < K; i++) {
    for (k = 0; k < N; k++) {
      for (j = 0; j < perms; j++) {
        pop_ptr = &(pop(0, sample_one(perms), sample_one(K)));
        res(j, k, i) = sample_n_distinct(specs_C, k + 1, pop_ptr);
      }
    }
  }

  return res;
}

在R:

Rcpp::sourceCpp('cube-sample.cpp')
HAC.mat <- fillCube(pop, specs, perms, K)

这是我电脑上版本的10倍。