Is there a faster alternative to for loops in R?

Time: 2020-07-24 17:47:12

Tags: r loops

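I am trying to compute transition probabilities (without the Markov assumption), which requires evaluating this nested integral: [image: nested integral]

Note that in my case the integrals can be replaced by sums. Here is the toy example code I use to compute it: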

# simulate some data
set.seed(99)
data<-data.frame(time=seq(0,7,0.1),
             S_D=seq(1,0.95,length.out = 71),
             lam12=sample(c(0,0.1,0.12,0.15,0.17),size = 71,replace = TRUE),
             lam23=sample(c(0,0.05,0.1,0.08,0.12),size = 71,replace = TRUE),
             lam24=sample(c(0,0.02,0.05,0.06,0.08),size = 71,replace = TRUE))

prob_123<-c() # initializing a NULL vector
end<-nrow(data)

for (j in 2: end)
{
  # j indicates u in the expression
  # k indicates v in the expression
  prob_123k<-0
  for (k in (j+1):end)
  {
    if (k==(j+1)){  
      prob_123k<-prob_123k+data$S_D[j-1]*data$lam12[j]*data$lam12[k-j]
    }
    if (k>(j+1)){ 
  
      prob_123k<-prob_123k+data$S_D[j-1]*data$lam12[j]*prod(1-(data$lam12[1:(k-j-1)]+data$lam24[1:(k-j-1)]))*(data$lam12[k-j])
   }
  }
  prob_123[j-1]<-prob_123k
}

sum(prob_123)  # result = 5.631623

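In the code, S_D corresponds to the expression exp{-(\Lambda_{12}(u)+\Lambda_{13}(u)+\Lambda_{14}(u))}, and prod(1-(...)) corresponds to the expression exp{-(\Lambda_{23}(v-u)+\Lambda_{24}(v-u))}. My real dataset is much larger than this one, and the nested for loops take a long time to run. Can anyone suggest a faster alternative? Many thanks.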
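For reference, this is the double sum read directly off the loop above. It is a reconstruction from the code rather than from the integral itself, with n = nrow(data) and an empty product taken as 1:

```latex
\[
\sum_{j=2}^{n-1} \; \sum_{k=j+1}^{n}
  S_D[j-1] \, \lambda_{12}[j] \, \lambda_{12}[k-j]
  \prod_{m=1}^{k-j-1} \bigl( 1 - \lambda_{12}[m] - \lambda_{24}[m] \bigr)
\]
```

As written, the R loop also runs the iteration j = n, because (j+1):end counts down to c(n+1, n) instead of being empty. That iteration contributes one more term, S_D[n-1] * lam12[n] * lam12[1]. Note also that the product depends only on the lag k - j, not on j and k separately.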

1 answer:

Answer 0 (score: 0):

You can speed up the loop by combining cumprod and sum.

First, wrap the for loops in a function f:
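As a small illustration of why cumprod helps, here is a sketch on made-up numbers (the vector x is hypothetical and not taken from the question). It compares recomputing every prefix product with prod against a single call to cumprod:

```r
# Toy factors; the values are made up for illustration only.
x <- c(0.9, 0.8, 0.95, 0.7)

# Recomputing each prefix product from scratch: O(n^2) multiplications overall.
slow <- vapply(seq_along(x), function(m) prod(x[1:m]), numeric(1))

# cumprod() returns every prefix product in one O(n) pass.
fast <- cumprod(x)

# Both hold the same prefix products, up to floating-point rounding.
stopifnot(isTRUE(all.equal(slow, fast)))
```

The nested loop in the question calls prod inside the inner loop, so it repeats this kind of work for every pair (j, k).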

f <- function(data) {
  end <- nrow(data)
  prob_123 <- vector("numeric", end) # preallocate a numeric vector of zeros
  for (j in 2:end) {
    # j indicates u in the expression
    # k indicates v in the expression
    prob_123k <- 0
    for (k in (j + 1):end) {
      
      if (k == (j + 1)) {  
        prob_123k <- prob_123k + data$S_D[j - 1] * data$lam12[j] * data$lam12[k - j]
      }
      if (k > (j + 1)) { 
        prob_123k <- prob_123k + data$S_D[j - 1] * data$lam12[j] * data$lam12[k - j] *
          prod(1 - (data$lam12[1:(k - j - 1)] + data$lam24[1:(k - j - 1)]))
      }
    }
    prob_123[j - 1] <- prob_123k
  }
  
  sum(prob_123)    
}

f(data)
5.631623

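Notice that the running product is recomputed unnecessarily: it can be computed once and then indexed appropriately. The function can be rewritten as follows: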

ff <- function(data) {
  end <- nrow(data)
  prob_123 <- vector("numeric", end) # preallocate a numeric vector of zeros
  p <- cumprod(1 - (data$lam12 + data$lam24))
  for (j in 2:end) {
    # j indicates u in the expression
    # k indicates v in the expression
    if (j + 1 <= end) {
      prob_123k <- data$S_D[j - 1] * data$lam12[j] * data$lam12[(j + 1):end - j] * 
        c(p[1], p[1:(end - j - 1)])
    }
    prob_123[j - 1] <- sum(prob_123k) + # Equivalent to the k > j + 1 part
      data$S_D[j - 1] * data$lam12[j] * data$lam12[j + 1 - j] # Equivalent to k = j + 1 part
  }
  
  sum(prob_123)    
}

identical(f(data), ff(data))
TRUE
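One caveat, read off the code rather than stated in the answer: in ff the first element of the inner vector is multiplied by p[1] and the k = j + 1 term is then added again, and prob_123k from the previous iteration is reused when j equals end. The two functions can therefore differ unless lam12[1] * p[1] is zero, which is presumably the case for this toy data. Below is a sketch of a variant that should match the loop for other inputs as well; the name ff2 and the shifted vector q are illustrative and not from the answer.

```r
# ff2 is a hypothetical name; this is a sketch, not the answerer's code.
ff2 <- function(data) {
  n <- nrow(data)
  # q[m] is the product of the first m - 1 factors, so q[1] = 1 (empty product).
  q <- c(1, cumprod(1 - (data$lam12 + data$lam24)))
  prob_123 <- numeric(n)
  for (j in 2:n) {
    lags <- seq_len(n - j)  # k - j for k = j + 1, ..., n; empty when j == n
    prob_123[j - 1] <- data$S_D[j - 1] * data$lam12[j] *
      sum(data$lam12[lags] * q[lags])
  }
  # The original (j + 1):end counts down when j == end and adds one extra
  # term; it is kept here only so that the result reproduces f.
  sum(prob_123) + data$S_D[n - 1] * data$lam12[n] * data$lam12[1]
}
```

Because the summation order differs from the loop, all.equal(f(data), ff2(data)) is a safer check than identical().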

We can then use the microbenchmark package to see whether there is an improvement:

library(microbenchmark)
microbenchmark(f(data), ff(data))
Unit: microseconds
     expr      min       lq     mean   median        uq       max neval
  f(data) 8853.501 9020.902 10118.14 9410.351 10416.201 14392.401   100
 ff(data)  344.701  356.401   373.86  367.301   384.701   466.401   100

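On average, ff is roughly 27 times faster (a mean of about 374 versus 10118 microseconds).

I'm sure a code jockey could optimise this function further by eliminating the for loop over j.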
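One way the j loop might be removed, sketched under the assumption that the goal is to reproduce f exactly: the inner sum depends on j only through its upper limit end - j, so a single cumsum over the lags yields every inner sum at once. The name transition_sum, the argument keep_last_term, and the helper vectors q and inner are illustrative and not from the thread.

```r
# transition_sum is a hypothetical name; a loop-free sketch, not a tested library routine.
transition_sum <- function(data, keep_last_term = TRUE) {
  n <- nrow(data)
  stopifnot(n >= 3)
  # q[m] = product of (1 - lam12[i] - lam24[i]) for i < m, with q[1] = 1.
  q <- c(1, cumprod(1 - (data$lam12 + data$lam24))[-n])
  # inner[m] = sum over lags 1..m of lam12[lag] * q[lag].
  inner <- cumsum(data$lam12 * q)
  # For each j, the k loop covers lags 1..(n - j), so its sum is inner[n - j].
  j <- 2:(n - 1)
  total <- sum(data$S_D[j - 1] * data$lam12[j] * inner[n - j])
  if (keep_last_term) {
    # The original (j + 1):end counts down when j == end, which adds this term.
    total <- total + data$S_D[n - 1] * data$lam12[n] * data$lam12[1]
  }
  total
}
```

On the toy data, transition_sum(data) should agree with f(data) up to floating-point rounding. Setting keep_last_term = FALSE drops the term that arises only because (j + 1):end counts down at j == end; whether that term belongs in the sum depends on the intended limits of the integral.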
