Some SQL databases support a WITH CUBE modifier for the GROUP BY operator. I don't have this feature available.
Basically, if I have a dataset like this:
+------+-----------+---------+---------+
| sum | source_id | type_id | variety |
+------+-----------+---------+---------+
| 491 | 1 | 1 | 1 |
| 2008 | 1 | 2 | 1 |
| 33 | 1 | 3 | 1 |
| 483 | 1 | 4 | 1 |
| 482 | 1 | 5 | 1 |
| 343 | 1 | 6 | 1 |
| 4979 | 4 | 5 | 1 |
| 303 | 5 | 1 | 1 |
| 443 | 5 | 1 | 2 |
| 1295 | 5 | 2 | 1 |
...
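I'd like to import it into a data frame in R and generate the combined sums for every sub-combination of (source_id, type_id, variety). That is: where source_id = 1; where source_id = 1 and type_id = 1; where source_id = 1 and variety = 1; where type_id = 1 and variety = 1; where type_id = 1; where source_id = 2; and so on.
How can I best accomplish this?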
Answer 0: (score: 4)
You can use ddply and feed it a list of the possible combinations, like this:
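library(plyr)
# Grouping columns to combine
facs <- c("source_id", "type_id", "variety")
# All non-empty subsets of the grouping columns (sizes 1 to 3)
combs <- unlist(
  lapply(1:3, function(j) combn(facs, j, simplify = FALSE)),
  recursive = FALSE)
# One summary data frame per subset
datlist <- lapply(combs, function(j) ddply(Data, j, summarize, Sum = sum(Sum)))
# Stack the summaries; columns absent from a subset are filled with NA
rbind.fill(datlist)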
Tested with:
Data <- data.frame(
  Sum = rpois(10, 5),
  source_id = rep(1:2, each = 5),
  type_id = rep(1:5, each = 2),
  variety = rep(1:2, 5)
)
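To make the list of combinations concrete, here is a minimal base-R sketch that builds the same kind of list and prints one label per grouping. The `labels` vector is only an illustration and is not part of the answer above.

```r
# Grouping columns from the question
facs <- c("source_id", "type_id", "variety")

# For each subset size j = 1, 2, 3, list every subset of that size;
# do.call(c, ...) flattens the three per-size lists into one list
combs <- do.call(c, lapply(seq_along(facs), function(j) {
  combn(facs, j, simplify = FALSE)
}))

length(combs)  # 3 + 3 + 1 = 7 groupings

# One comma-separated label per grouping (illustrative only)
labels <- vapply(combs, paste, character(1), collapse = ", ")
print(labels)
```

Each element of `combs` is a character vector that can be passed as the grouping argument of a single ddply call.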
Answer 1: (score: 2)
This should do it:
# generate dummy data
df = data.frame(
  Sum = rnorm(10),
  source_id = sample(10, 5, replace = TRUE),
  type_id = sample(10, 5, replace = TRUE),
  variety = sample(10, 5, replace = TRUE)
)
index = names(df)[-1]
temp = expand.grid(0:1, 0:1, 0:1)[-1, ]
require(plyr)
cubedf = adply(temp, 1, function(x)
  ddply(df, index[x == 1], summarize, SUM = sum(Sum)))
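The 0/1 grid used above acts as a mask over the grouping columns: each row selects the columns whose entry is 1. The following base-R sketch, with the illustrative names `mask` and `groupings`, shows which columns each row picks.

```r
index <- c("source_id", "type_id", "variety")

# Every 0/1 pattern over three columns; the first row (0, 0, 0) is dropped
mask <- expand.grid(0:1, 0:1, 0:1)[-1, ]

# Translate each row of the mask into the column names it selects
groupings <- apply(mask, 1, function(x) paste(index[x == 1], collapse = ", "))
print(unname(groupings))
```

The dropped all-zero row would correspond to grouping by nothing, that is, the grand total that SQL's WITH CUBE also reports.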
Edit: an alternative solution (using code borrowed from Joris)
library(plyr)
# list factor variables
index = names(df)[-1]
# generate all combinations of factor variables
combs = unlist(llply(1:3, combn, x = index, simplify = FALSE), recursive = FALSE)
# calculate sum for each combination
cubedf = ldply(combs, function(var)
  ddply(df, var, summarize, SUM = sum(Sum)))
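Answer 2: (score: 1)
Joris's answer is right. But I must admit that at first glance it was not intuitive to me. Before reading his answer, I would have solved this with multiple ddply() steps. Like this: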
Data <- data.frame(
  Sum = rpois(10, 5),
  source_id = rep(1:2, each = 5),
  type_id = rep(1:5, each = 2),
  variety = rep(1:2, 5)
)
require(plyr)
myStuff1 <- ddply(Data, c("source_id"), function(df) sum(df$Sum))
myStuff2 <- ddply(Data, c("source_id", "type_id"), function(df) sum(df$Sum))
myStuff3 <- ddply(Data, c("source_id", "type_id", "variety"), function(df) sum(df$Sum))
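The three calls above cover three of the seven possible groupings. As a rough sketch of the full set, the following uses base R's aggregate() instead of ddply (a swapped-in technique, not what the answers use). The data values and the helper `cube_one` are made up for illustration.

```r
# Small deterministic data set (values made up for illustration)
Data <- data.frame(
  Sum       = 1:10,
  source_id = rep(1:2, each = 5),
  type_id   = rep(1:5, each = 2),
  variety   = rep(1:2, 5)
)
facs <- c("source_id", "type_id", "variety")

# All non-empty subsets of the grouping columns
combs <- do.call(c, lapply(seq_along(facs), function(j) {
  combn(facs, j, simplify = FALSE)
}))

# Hypothetical helper: sum over one grouping, then mark the columns
# that were not part of the grouping with NA
cube_one <- function(vars) {
  out <- aggregate(Data["Sum"], by = Data[vars], FUN = sum)
  for (v in setdiff(facs, vars)) out[[v]] <- NA
  out[c(facs, "Sum")]
}

# Stack the per-grouping summaries into one data frame
cube <- do.call(rbind, lapply(combs, cube_one))
print(head(cube))
```

A row with NA in a grouping column is a subtotal taken over all values of that column, similar to how WITH CUBE marks its subtotal rows with NULL.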