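Optimizing a simulation with expand.grid (and dplyr?)

Date: 2017-08-30 15:29:01

Tags: r

I have this basic function 'follow_up' (simpler than my real function...) that extracts some information from a linear regression. Its two arguments are the number of individuals (n) and the number of repeated measurements per individual (rp).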
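The original listing of 'follow_up' is not available here. The sketch below is a hypothetical reconstruction based on the 'follow_up_df' variant shown in the second answer. It assumes the three returned components are the R-squared, the intercept and the slope, and the names R2, intercept and slope are illustrative, not the asker's.

```r
# Hypothetical reconstruction of follow_up(); the asker's original code is not shown.
follow_up <- function(n = 10, rp = 5) {
  # n individuals, each measured rp times; X and Y are independent draws
  donnees <- data.frame(
    Id = rep(seq_len(n), each = rp),
    X  = rnorm(n * rp),
    Y  = rnorm(n * rp, mean = 4, sd = 2)
  )
  sfit <- summary(lm(Y ~ X, data = donnees))
  # Assumed return value: R-squared, intercept estimate, slope estimate
  c(R2        = sfit$r.squared,
    intercept = unname(sfit$coefficients[1, "Estimate"]),
    slope     = unname(sfit$coefficients[2, "Estimate"]))
}

set.seed(1)
out <- follow_up(n = 5, rp = 2)
print(out)
```

Returning a named numeric vector keeps the function cheap to call many times and makes the averaged results easy to label later.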

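I would like to run a large simulation and obtain the following:

For each combination of n and rp (both stored in the first two columns produced by expand.grid), I want to run about 1,000 iterations, call 'follow_up' on each iteration, and store in additional columns of the data frame the mean of each of the three components returned by 'follow_up' (i.e., the mean R2 and the mean coefficients).

Because my real function is more complex, and because n and rp take many more values, I would like to optimize my code (e.g., avoid rbind or loops if possible). Thanks for your help.

2 Answers:

Answer 0: (score: 1)

You could do: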

set.seed(1)
follow_up_vectorized <- Vectorize(follow_up)
sims <- replicate(1e3, follow_up_vectorized(p$n, p$rp))
res <- apply(sims, c(1, 2), mean)

#             [,1]         [,2]        [,3]        [,4]        [,5]         [,6]
# [1,]  4.00783364  3.991355959 4.011558264 3.983996744  3.99937381  4.009033518
# [2,] -0.03425608 -0.004379941 0.005743333 0.005036114 -0.01332833 -0.007702833

But I wouldn't call this "optimized" without knowing where the performance bottleneck of your actual code is.
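As an illustration of why the bottleneck matters: if profiling showed that most of the time is spent inside lm() and summary(), and if the real model were still a single-predictor regression (an assumption, since the question says the real function is more complex), the same three quantities could be computed in closed form. A minimal sketch with made-up data:

```r
set.seed(1)
x <- rnorm(20)
y <- rnorm(20, mean = 4, sd = 2)

# Reference values: R-squared, intercept and slope via lm()
sfit <- summary(lm(y ~ x))
ref <- unname(c(sfit$r.squared, sfit$coefficients[, "Estimate"]))

# Closed-form equivalents, valid only for one predictor plus an intercept
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
r2        <- cor(x, y)^2
fast <- c(r2, intercept, slope)

# Both routes should agree up to floating-point tolerance
print(all.equal(ref, fast))
```

This skips the model-frame and QR machinery of lm(), but it does not generalize to several predictors, so it is only worth considering once timings confirm that the model fit dominates.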

Edit
At CPak's request, here is the output as new columns:

cbind(p, t(res))

#    n rp        1            2
# 1  5  2 4.007834 -0.034256082
# 2 10  2 3.991356 -0.004379941
# 3  5  4 4.011558  0.005743333
# 4 10  4 3.983997  0.005036114
# 5  5  6 3.999374 -0.013328326
# 6 10  6 4.009034 -0.007702833
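The columns above end up labelled 1 and 2 because the simulated vectors carry no names. A variant that gives labelled columns is sketched below. It assumes a hypothetical 'follow_up' that returns a named vector of three components (R2, intercept, slope), and it uses 200 iterations instead of 1,000 only to keep the example quick.

```r
# Hypothetical follow_up() returning a named vector (an assumption, see above)
follow_up <- function(n = 10, rp = 5) {
  donnees <- data.frame(X = rnorm(n * rp), Y = rnorm(n * rp, mean = 4, sd = 2))
  sfit <- summary(lm(Y ~ X, data = donnees))
  c(R2        = sfit$r.squared,
    intercept = unname(sfit$coefficients[1, "Estimate"]),
    slope     = unname(sfit$coefficients[2, "Estimate"]))
}

p <- expand.grid(n = c(5, 10), rp = c(2, 4, 6))
n_iter <- 200  # the question asks for about 1,000

set.seed(1)
# For each row of p: simulate n_iter times, then average each component
means <- t(mapply(
  function(n, rp) rowMeans(replicate(n_iter, follow_up(n, rp))),
  p$n, p$rp
))

res <- cbind(p, means)
print(res)
```

replicate() returns a 3 x n_iter matrix whose row names come from the named vector, so rowMeans() keeps the labels and cbind() turns them into column names.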

Answer 1: (score: 1)

Your data

n=c(5,10)
rp=c(2,4,6)
p=expand.grid(n=n,rp=rp)

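Modify the function to return a data frame

follow_up_df <- function(n=10,rp=5){
                  donnees <- data.frame(Id=rep(1:n, rep(rp,n)),X=rnorm(n*rp),Y=rnorm(n*rp,4,2))
                  sfit <- summary(lm(Y~X, donnees))
                  # Note: summary.lm has no R.chisq component (the R-squared is sfit$r.squared),
                  # so sfit$R.chisq is NULL and c() drops it: output holds only intercept and slope
                  output <- c(sfit$R.chisq, sfit$coeff[1], sfit$coeff[2])
                  df <- data.frame(X1=output[1], X2=output[2])   # X1 = intercept, X2 = slope
                  return(df)
                }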

tidyverse solution

CP <- function() {
          library(tidyverse)
          totiter <- 1000

          # Copy p 1000 times
          p1 = p[rep(seq_len(nrow(p)), totiter ), ] %>%                     
                 mutate(ID = seq_len(totiter*nrow(p)))                      # unique ID to join

          # Calculate mean of N iterations
          ans <- map_df(1:nrow(p1), ~follow_up_df(p1$n[.x], p1$rp[.x])) %>% # follow_up_df rowwise (returns a data frame)
                    mutate(ID = seq_len(totiter*nrow(p))) %>%               # unique ID to join
                    left_join(., p1, by="ID") %>%                           # join with p1
                    group_by(n, rp) %>%     
                    summarise(X1 = mean(X1), X2 = mean(X2)) %>%             # mean per n,rp pair
                    ungroup()
          ans                                                               # return the summary visibly
      }

Output

set.seed(1)

      n    rp       X1           X2
1     5     2 4.007834 -0.034256082
2     5     4 4.011558  0.005743333
3     5     6 3.999374 -0.013328326
4    10     2 3.991356 -0.004379941
5    10     4 3.983997  0.005036114
6    10     6 4.009034 -0.007702833

The other solution

Aurele <- function() {
              set.seed(1)
              follow_up_vectorized <- Vectorize(follow_up)
              sims <- replicate(1e3, follow_up_vectorized(p$n, p$rp))
              res <- apply(sims, c(1, 2), mean)
          }

Performance

library(microbenchmark)
microbenchmark(CP(), Aurele(), times=5L)

    expr      min       lq     mean   median       uq      max neval
    CP() 25.02497 25.58269 25.83376 25.92396 26.26672 26.37044     5
Aurele() 21.31826 21.44110 21.73005 21.79842 21.85301 22.23944     5

Conclusion

Aurele's solution is faster!