Incidence matrix from event list

Posted: 2013-09-18 14:11:45

Tags: r

I have a huge event list in the following format:

> dput(head(events))
structure(list(action = c("110:0.49,258:0.49", "110:0.49,258:0.49", 
"110:0.49,258:0.49", "114:1.0,299:1.0", "114:1.0,299:1.0", "110:0.49"
), response = c("113=5-110=266-111=30-258=248-99=18-264=15", "113=5-110=278-111=30-258=260-99=18-264=15", 
"113=5-110=284-111=30-258=266-99=18-264=15", "114=34-299=34-108=134-110=12-246=67", 
"114=34-299=34-108=134-110=18-246=67", "114=34-113=6-299=34-108=146-110=24-246=73"
)), .Names = c("action", "response"), row.names = c(NA, 6L), class = "data.frame")

Both action and response are maps from keys like 110 and 114 to values like 0.49 and 5.

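What I want is a matrix whose (i,j) entry is sum(action[i] * response[j]) over all events, where action[i] is the value of key i (similar to {{1}}). Additionally, I need the vectors sum(action[i]) and sum(response[j]).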
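To make the target concrete, here is a minimal sketch of what a single event contributes to that matrix, using the first row of the sample above. The helper name parse_map is made up for this illustration and assumes every entry is a well-formed key/value pair.

```r
# Hypothetical helper (not part of any package): parse a string such as
# "110:0.49,258:0.49" into a named numeric vector.
parse_map <- function(s, sep, kv) {
  parts <- strsplit(strsplit(s, sep, fixed = TRUE)[[1]], kv, fixed = TRUE)
  vals <- as.numeric(vapply(parts, `[[`, "", 2))
  names(vals) <- vapply(parts, `[[`, "", 1)
  vals
}

# First event from the sample data
a <- parse_map("110:0.49,258:0.49", ",", ":")
r <- parse_map("113=5-110=266-111=30-258=248-99=18-264=15", "-", "=")

# Contribution of this one event: entry (i, j) is action[i] * response[j].
# The desired matrix is the sum of these contributions over all events.
m <- outer(a, r)
```

Rows of m are labelled by action keys and columns by response keys; the full result is the element-wise sum of such matrices once the keys are aligned across events.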

I can do this using something like this:

    library(data.table)

    # split actions: one row per (event, action key)
    l <- strsplit(events$action, ",")
    ll <- sapply(l, length)
    l <- unlist(l)
    l1 <- strsplit(l, ":")
    rm(l)
    # stringsAsFactors = FALSE keeps response as character for strsplit() below
    df1 <- data.frame(response = events$response[rep(1:nrow(events), ll)],
                      action = as.factor(sapply(l1, "[[", 1)),
                      action.weight = as.numeric(sapply(l1, "[[", 2)),
                      stringsAsFactors = FALSE)

    # split responses: one row per (event, action key, response key)
    l <- strsplit(df1$response, "-")
    ll <- sapply(l, length)
    l <- unlist(l)
    l1 <- strsplit(l, "=")
    rm(l)
    rows <- rep(1:nrow(df1), ll)
    df2 <- data.frame(action = df1$action[rows],
                      action.weight = df1$action.weight[rows],
                      response = as.factor(sapply(l1, "[[", 1)),
                      response.weight = as.numeric(sapply(l1, "[[", 2)))
    df2$weight <- df2$action.weight * df2$response.weight
    df2$action.weight <- NULL
    df2$response.weight <- NULL

    # summarise by action/response
    dt1 <- as.data.table(df2)
    setkeyv(dt1, c("action", "response"))
    dt2 <- dt1[, sum(weight), by = "action,response"]

I think this should do more or less what I need.

However, the intermediate objects (df1, df2, etc.) are too large for my RAM. I wonder if there is a more efficient way to accomplish what I need.
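One possible direction, offered only as a sketch and not tested on the real data: build one sparse events-by-keys matrix for each column and let a cross product do the summation, so the per-event action/response pairs are never materialised as a long table. This assumes the Matrix package (a recommended package shipped with R) is available; to_sparse is a made-up helper name, and the two toy rows are shortened from the sample above.

```r
library(Matrix)

# Hypothetical helper: turn a character vector of "key<kv>value" maps into a
# sparse matrix with one row per event and one column per distinct key.
to_sparse <- function(x, sep, kv) {
  items <- strsplit(x, sep, fixed = TRUE)
  row <- rep(seq_along(items), lengths(items))   # event index of each entry
  pairs <- strsplit(unlist(items), kv, fixed = TRUE)
  key <- factor(vapply(pairs, `[[`, "", 1))
  val <- as.numeric(vapply(pairs, `[[`, "", 2))
  sparseMatrix(i = row, j = as.integer(key), x = val,
               dims = c(length(x), nlevels(key)),
               dimnames = list(NULL, levels(key)))
}

# Toy data: two rows shortened from the sample in the question
events <- data.frame(
  action = c("110:0.49,258:0.49", "114:1.0,299:1.0"),
  response = c("113=5-110=266", "114=34-110=12"),
  stringsAsFactors = FALSE)

A <- to_sparse(events$action, ",", ":")
R <- to_sparse(events$response, "-", "=")

# t(A) %*% R: entry (i, j) is the sum over events of action[i] * response[j]
m <- crossprod(A, R)
action.sum <- colSums(A)     # sum(action[i]) per key
response.sum <- colSums(R)   # sum(response[j]) per key
```

The long vectors built here grow with the total number of action entries plus the total number of response entries, instead of with the number of action/response pairs as df2 does. Whether that is enough to fit in RAM depends on the data; if not, the same computation could presumably be run on chunks of rows and the resulting matrices added, provided the key levels are fixed up front.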

PS. In fact, the key sets for action and response are the same, but there appears to be no reason to rely on that.

0 answers:

No answers