Question

If that is the case, why do we need sapply at all?

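library(microbenchmark)  # provides microbenchmark()

# Two element shapes; both contain a field `a`
x <- list(a=1, b=1)
y <- list(a=1)
JSON <- rep(list(x,y),10000)
# Compare four ways of extracting `a` from every element
microbenchmark(sapply(JSON, function(x) x$a),
               unlist(lapply(JSON, function(x) x$a)),
               sapply(JSON, "[[", "a"),
               unlist(lapply(JSON, "[[", "a"))
               )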

Unit: milliseconds
                                  expr      min       lq   median       uq      max neval
         sapply(JSON, function(x) x$a) 25.22623 28.55634 29.71373 31.76492 88.26514   100
 unlist(lapply(JSON, function(x) x$a)) 17.85278 20.25889 21.61575 22.67390 78.54801   100
               sapply(JSON, "[[", "a") 18.85529 20.06115 21.53790 23.42480 38.56610   100
       unlist(lapply(JSON, "[[", "a")) 11.33859 11.69198 12.25329 13.37008 27.81361   100

Answer 1

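In addition to running lapply, sapply runs simplify2array to try to fit the output into an array. To find out whether that is possible, the function has to check that all the individual outputs have the same length. It does this with a costly unique(lapply(..., length)), which accounts for most of the time difference you are seeing: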

b <- lapply(JSON, "[[", "a")

microbenchmark(lapply(JSON, "[[", "a"),
               unlist(b),
               unique(lapply(b, length)),
               sapply(JSON, "[[", "a"),
               sapply(JSON, "[[", "a", simplify = FALSE),
               unlist(lapply(JSON, "[[", "a"))
)

# Unit: microseconds
#                                       expr       min        lq   median        uq       max neval
#                    lapply(JSON, "[[", "a") 14809.151 15384.358 15774.26 16905.226 24944.863   100
#                                  unlist(b)   920.047  1043.719  1158.62  1223.091  8056.231   100
#                  unique(lapply(b, length)) 10778.065 11060.452 11456.11 12581.414 19717.740   100
#                    sapply(JSON, "[[", "a") 24827.206 25685.535 26656.88 30519.556 93195.751   100
#  sapply(JSON, "[[", "a", simplify = FALSE) 14283.541 14922.780 15526.42 16654.058 26865.022   100
#            unlist(lapply(JSON, "[[", "a")) 15334.026 16133.146 16607.12 18476.182 30080.544   100
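The same decomposition can be written out by hand. The following is only a sketch of the idea, not the actual source of sapply (which also deals with names through USE.NAMES); the smaller list size of 100 repetitions is an assumption made to keep it quick to run.

```r
# Smaller stand-in for the JSON list from the question (assumption: 100 repeats)
x <- list(a = 1, b = 1)
y <- list(a = 1)
JSON <- rep(list(x, y), 100)

# Step 1: the work that sapply shares with lapply
b <- lapply(JSON, "[[", "a")

# Step 2: the length check described above; a single distinct length
# means the results can be simplified
common_len <- unique(lapply(b, length))

# Step 3: the simplification itself
simplified <- simplify2array(b)

# All routes end in the same numeric vector
same_as_sapply <- identical(simplified, sapply(JSON, "[[", "a"))
same_as_unlist <- identical(simplified, unlist(b))
```

When the result is known to be one scalar per element, as here, unlist(lapply(...)) skips steps 2 and 3 and produces the same vector.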

Answer 2

As droopy and Roland pointed out, sapply is a wrapper around lapply designed for convenience. The simplify2array that sapply uses is slower than unlist:

> microbenchmark(unlist(as.list(1:1000)), simplify2array(as.list(1:1000)), times=1000)
Unit: microseconds
                            expr     min       lq  median       uq      max neval
         unlist(as.list(1:1000))  99.734 109.0230 113.912 118.3120 21343.92  1000
 simplify2array(as.list(1:1000)) 892.712 931.0895 947.957 976.3125 22241.52  1000

In addition, when the result is a matrix, sapply is slower than other base functions, for example:

a <- list(c(1,2,3,4), c(1,2,3,4), c(1,2,3,4))
microbenchmark(t(do.call(rbind, lapply(a, function(x)x))), sapply(a, function(x)x))
Unit: microseconds
                                        expr    min     lq median     uq     max neval
 t(do.call(rbind, lapply(a, function(x) x))) 29.823 30.801 32.512 33.734  94.845   100
                    sapply(a, function(x) x) 57.201 58.179 59.156 60.134 111.956   100

But sapply is much easier to use, especially in the second case.
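To see that the two expressions in the matrix benchmark really are interchangeable, their results can be compared directly. This is a small check using the same list a as above; the variable names m_sapply and m_rbind are made up for the illustration.

```r
a <- list(c(1, 2, 3, 4), c(1, 2, 3, 4), c(1, 2, 3, 4))

# sapply binds each result as a column, giving a 4 x 3 matrix
m_sapply <- sapply(a, function(x) x)

# rbind stacks the results as rows, so the transpose is needed
# to get the same column-wise layout
m_rbind <- t(do.call(rbind, lapply(a, function(x) x)))

same_shape  <- identical(dim(m_sapply), dim(m_rbind))
same_values <- isTRUE(all.equal(m_sapply, m_rbind))
```

The sapply version says what it means in one call, while the faster version needs the reader to work out why the transpose is there.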

Why is `unlist(lapply)` faster than `sapply`?

2 answers: