Question

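For a long time I have been using sfLapply for many of my parallel R scripts. Recently, as I have delved deeper into parallel computing, I have switched to sfClusterApplyLB, which can save a lot of time when the individual instances do not take the same amount of time to run. Whereas sfLapply waits for every instance in a batch to finish before loading a new batch (which can leave instances idle), with sfClusterApplyLB an instance that finishes its task is immediately assigned one of the remaining elements in the list, which can save quite a bit of time when run times differ. This makes me wonder: why would we ever NOT want to load balance our runs when using snowfall? The only drawback I have found so far is that, when the parallelized script contains an error, sfClusterApplyLB still cycles through the entire list before reporting the error, while sfLapply stops after trying the first batch. What else am I missing? Are there any other costs or downsides to load balancing? Below is example code that shows the difference between the two.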

rm(list = ls()) #remove all past worksheet variables
working_dir="D:/temp/"
setwd(working_dir)
n_spp=16
spp_nmS=paste0("sp_",c(1:n_spp))
spp_nm=spp_nmS[1]
sp_parallel_run=function(sp_nm){
  sink(file(paste0(working_dir,sp_nm,"_log.txt"), open="wt")) # send this worker's output to a per-species log file
  cat('\n', 'Started on ', date(), '\n') 
  ptm0 <- proc.time()
  jnk=round(runif(1)*8000000) #this is just a redundant script that takes an arbitrary amount of time to run
  jnk1=runif(jnk)
  for (i in seq_along(jnk1)){
    jnk1[i]=jnk1[i]*runif(1) # index jnk1, not the scalar jnk (jnk[i] is NA for i > 1)
  }
  ptm1=proc.time() - ptm0
  jnk=as.numeric(ptm1[3])
  cat('\n','It took ', jnk, "seconds to model", sp_nm)

  #stop sinks
  sink.reset <- function(){
    for(i in seq_len(sink.number())){
      sink(NULL)
    }
  }
  sink.reset()
}
require(snowfall)
cpucores=as.integer(Sys.getenv('NUMBER_OF_PROCESSORS')) # this environment variable is set on Windows only

sfInit(parallel=TRUE, cpus=cpucores)
sfExportAll() 
system.time((sfLapply(spp_nmS,fun=sp_parallel_run)))
sfRemoveAll()
sfStop()

sfInit(parallel=TRUE, cpus=cpucores)
sfExportAll() 
system.time(sfClusterApplyLB(spp_nmS,fun=sp_parallel_run)) 
sfRemoveAll()
sfStop()

Answer 1

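The sfLapply function is useful because it splits the input values into one group of tasks per available worker, which is what the mclapply function calls prescheduling. When the individual tasks are short, this can give much better performance than sfClusterApplyLB.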
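To make the idea of prescheduling concrete, here is a minimal sketch that uses only `splitIndices` from the base `parallel` package. It assumes sfLapply chunks its input roughly the way snow's parLapply does (one contiguous chunk per worker); `n_tasks` and `n_workers` are made-up values for illustration, and no cluster is started.

```r
# Sketch: how many dispatches each scheduling style needs.
# n_tasks and n_workers are hypothetical values, not taken from the question.
library(parallel)

n_tasks   <- 16
n_workers <- 4

# Prescheduling: one chunk of task indices per worker.
chunks <- splitIndices(n_tasks, n_workers)
print(chunks)

# Each worker receives its whole chunk in a single message ...
dispatches_prescheduled <- length(chunks)

# ... whereas load balancing hands out one task at a time.
dispatches_load_balanced <- n_tasks

cat("prescheduled dispatches: ", dispatches_prescheduled, "\n")
cat("load-balanced dispatches:", dispatches_load_balanced, "\n")
```

With few dispatches, the communication cost is paid once per worker rather than once per task, but a worker that gets a chunk of slow tasks cannot hand any of them to an idle neighbour.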

Here is an extreme example that demonstrates the benefit of prescheduling:

> system.time(sfLapply(1:100000, sqrt))
   user  system elapsed
  0.148   0.004   0.170
> system.time(sfClusterApplyLB(1:100000, sqrt))
   user  system elapsed
 19.317   1.852  21.222
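As a rough back-of-the-envelope reading of those timings, the extra elapsed time can be divided by the number of tasks to estimate the per-task dispatch cost. The numbers are simply copied from the output above and will differ on other machines and cluster types, so treat the result as an order of magnitude only.

```r
# Elapsed times (seconds) copied from the output above.
elapsed_prescheduled  <- 0.170   # sfLapply
elapsed_load_balanced <- 21.222  # sfClusterApplyLB
n_tasks <- 100000

# Extra wall-clock time per task when tasks are dispatched one at a time.
overhead_per_task <- (elapsed_load_balanced - elapsed_prescheduled) / n_tasks
cat(sprintf("approx. extra cost per task: %.3f ms\n", overhead_per_task * 1000))
```

On that machine the extra cost works out to roughly 0.2 ms per task, so load balancing is likely to pay off only when individual tasks run much longer than that and vary noticeably in duration.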

Why not use load balancing when doing parallel computing with snowfall?
