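I am trying to compute a transition probability (without the Markov assumption), which requires evaluating a nested integral.
Note that in my case the integrals can be replaced by sums. Here is the toy example code I use to compute it: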
# simulate some data
set.seed(99)
data <- data.frame(time = seq(0, 7, 0.1),
                   S_D = seq(1, 0.95, length.out = 71),
                   lam12 = sample(c(0, 0.1, 0.12, 0.15, 0.17), size = 71, replace = TRUE),
                   lam23 = sample(c(0, 0.05, 0.1, 0.08, 0.12), size = 71, replace = TRUE),
                   lam24 = sample(c(0, 0.02, 0.05, 0.06, 0.08), size = 71, replace = TRUE))
prob_123 <- c() # initializing a NULL vector
end <- nrow(data)
for (j in 2:end) {
  # j indicates u in the expression
  # k indicates v in the expression
  prob_123k <- 0
  for (k in (j + 1):end) {
    if (k == (j + 1)) {
      prob_123k <- prob_123k + data$S_D[j - 1] * data$lam12[j] * data$lam12[k - j]
    }
    if (k > (j + 1)) {
      prob_123k <- prob_123k + data$S_D[j - 1] * data$lam12[j] *
        prod(1 - (data$lam12[1:(k - j - 1)] + data$lam24[1:(k - j - 1)])) * (data$lam12[k - j])
    }
  }
  prob_123[j - 1] <- prob_123k
}
sum(prob_123) # result = 5.631623
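In the code, S_D corresponds to the expression exp{-(\Lambda_{12}(u)+\Lambda_{13}(u)+\Lambda_{14}(u))}, and prod(1-(...)) corresponds to the expression exp{-(\Lambda_{23}(v-u)+\Lambda_{24}(v-u))}. My real dataset is much larger than this one, and the nested for loops take a very long time to run. Can anyone suggest a faster alternative? Many thanks.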
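The correspondence between prod(1 - ...) and exp{-\Lambda} mentioned above is the usual discrete approximation: for small hazard increments h, the product of (1 - h) is close to exp(-sum(h)). A minimal sketch of that relationship follows; the values in h are hypothetical and not taken from the toy data, where the increments are larger and the approximation is correspondingly coarser.

```r
# Hypothetical small hazard increments (an assumption, not the question's data)
h <- c(0.010, 0.020, 0.015, 0.005)

surv_prod <- cumprod(1 - h)   # product form: prod(1 - h[1:m]) for each m
surv_exp  <- exp(-cumsum(h))  # exponential form: exp(-Lambda) at each step

# The two forms agree to first order in h; the gap widens as increments grow
print(cbind(surv_prod, surv_exp, gap = surv_exp - surv_prod))
```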
Answer 0 (score: 0)
You can speed up the loop by combining cumprod and sum.
First, wrap the for loops in a function f:
f <- function(data) {
  end <- nrow(data)
  prob_123 <- vector("numeric", end) # zero-filled numeric vector of length end
  for (j in 2:end) {
    # j indicates u in the expression
    # k indicates v in the expression
    prob_123k <- 0
    for (k in (j + 1):end) {
      if (k == (j + 1)) {
        prob_123k <- prob_123k + data$S_D[j - 1] * data$lam12[j] * data$lam12[k - j]
      }
      if (k > (j + 1)) {
        prob_123k <- prob_123k + data$S_D[j - 1] * data$lam12[j] * data$lam12[k - j] *
          prod(1 - (data$lam12[1:(k - j - 1)] + data$lam24[1:(k - j - 1)]))
      }
    }
    prob_123[j - 1] <- prob_123k
  }
  sum(prob_123)
}
f(data)
5.631623
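Notice that the running product is recomputed unnecessarily on every iteration: it can be computed once with cumprod and then indexed as needed. Rewriting the function accordingly: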
ff <- function(data) {
  end <- nrow(data)
  prob_123 <- vector("numeric", end) # zero-filled numeric vector of length end
  p <- cumprod(1 - (data$lam12 + data$lam24)) # running product, computed once
  for (j in 2:end) {
    # j indicates u in the expression
    # k indicates v in the expression
    prob_123k <- 0 # reset so that no value carries over from the previous j
    if (j + 1 <= end) {
      # Leading 0: the k = j + 1 term has no product factor and is added below
      prob_123k <- data$S_D[j - 1] * data$lam12[j] * data$lam12[(j + 1):end - j] *
        c(0, p[seq_len(end - j - 1)])
    }
    prob_123[j - 1] <- sum(prob_123k) + # Equivalent to the k > j + 1 part
      data$S_D[j - 1] * data$lam12[j] * data$lam12[j + 1 - j] # Equivalent to k = j + 1 part
  }
  sum(prob_123)
}
identical(f(data), ff(data))
TRUE
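The idea behind the rewrite can be checked in isolation: one cumprod call yields every running product that the nested loop would otherwise rebuild with repeated prod calls. A small sketch, where the vector x is a hypothetical stand-in for 1 - (lam12 + lam24):

```r
# Hypothetical per-step factors (an assumption, not the toy data)
x <- c(0.9, 0.8, 0.95, 0.7)

# Rebuild the running product from scratch for every m, as the nested loop does
slow <- sapply(seq_along(x), function(m) prod(x[1:m]))

# Compute all running products in a single pass
fast <- cumprod(x)

# Compare with a tolerance, since this is floating-point arithmetic
all.equal(slow, fast)
```

The first approach does work that grows quadratically with the length of x, while cumprod does it in a single linear pass.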
We can then use the microbenchmark package to see whether this is an improvement:
library(microbenchmark)
microbenchmark(f(data), ff(data))
Unit: microseconds
expr min lq mean median uq max neval
f(data) 8853.501 9020.902 10118.14 9410.351 10416.201 14392.401 100
ff(data) 344.701 356.401 373.86 367.301 384.701 466.401 100
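The function ff is roughly 27 times faster on average (a mean of about 10118 versus 374 microseconds in the run above).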
I'm sure a code jockey could optimize this further by eliminating the for loop over j?
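One possible way to drop the loop over j, offered as a sketch rather than a tested replacement: the inner sum over k depends on j only through how many terms it has, so a single cumsum over lam12 times the shifted running product gives every inner sum at once. The names fff and f_ref are hypothetical; f_ref is a compact copy of the original double loop, included only so that the result can be checked.

```r
# Toy data, simulated as in the question
set.seed(99)
data <- data.frame(time = seq(0, 7, 0.1),
                   S_D = seq(1, 0.95, length.out = 71),
                   lam12 = sample(c(0, 0.1, 0.12, 0.15, 0.17), size = 71, replace = TRUE),
                   lam23 = sample(c(0, 0.05, 0.1, 0.08, 0.12), size = 71, replace = TRUE),
                   lam24 = sample(c(0, 0.02, 0.05, 0.06, 0.08), size = 71, replace = TRUE))

# Reference: the original nested loop, accumulated into a single total
f_ref <- function(data) {
  end <- nrow(data)
  total <- 0
  for (j in 2:end) {
    for (k in (j + 1):end) {
      if (k == j + 1) {
        total <- total + data$S_D[j - 1] * data$lam12[j] * data$lam12[k - j]
      } else if (k > j + 1) {
        m <- k - j - 1
        total <- total + data$S_D[j - 1] * data$lam12[j] * data$lam12[k - j] *
          prod(1 - (data$lam12[1:m] + data$lam24[1:m]))
      }
    }
  }
  total
}

# Loop-free sketch (assumes nrow(data) >= 2)
fff <- function(data) {
  end <- nrow(data)
  # w[i] is the product attached to lam12[i]: w[1] = 1 (empty product, the
  # k = j + 1 term) and w[i] = prod(1 - (lam12 + lam24)[1:(i - 1)]) otherwise
  w <- c(1, cumprod(1 - (data$lam12 + data$lam24)))[seq_len(end)]
  # inner[n] is the inner sum over k when it has n terms
  inner <- cumsum(data$lam12 * w)
  j <- 2:end
  # Row j has end - j terms; pmax keeps the single k = j + 1 term that the
  # original loop also evaluates when j == end
  n <- pmax(end - j, 1)
  sum(data$S_D[j - 1] * data$lam12[j] * inner[n])
}

# The summation order differs from the loop, so compare with a tolerance
all.equal(f_ref(data), fff(data))
```

Because the additions happen in a different order than in the loop, all.equal is a safer check here than identical.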