R, data.table: conditionally summing columns

Asked: 2016-03-31 16:16:51

Tags: r data.table

I have a data table similar to this one (except that it has 150 columns and about 5 million rows):

library(data.table)
set.seed(1)
dt <- data.table(ID=1:10, Status=c(rep("OUT",2),rep("IN",2),"ON",rep("OUT",2),rep("IN",2),"ON"), 
             t1=round(rnorm(10),1), t2=round(rnorm(10),1), t3=round(rnorm(10),1), 
             t4=round(rnorm(10),1), t5=round(rnorm(10),1), t6=round(rnorm(10),1),
             t7=round(rnorm(10),1),t8=round(rnorm(10),1))

Output:

    ID Status   t1   t2   t3   t4   t5   t6   t7   t8
 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5
 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7
 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6
 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9
 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3
 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3
 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4
 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0
 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1
10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6

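Using data.table, I want to add a new column named Total (using :=) that contains the following:

For each row,

if Status = OUT, sum columns t1:t4 and t8

if Status = IN, sum columns t5, t6, t8

if Status = ON, sum columns t1:t3 and t6:t8

The final output should look like this: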

    ID Status   t1   t2   t3   t4   t5   t6   t7   t8  Total
 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5   3.7
 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7   0.6
 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6   1.6
 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9  -1.4
 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3   1.4
 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3  -1.0
 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4  -0.5
 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0  -0.2
 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1   0.6
10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6   2.2

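I am fairly new to data.table (currently using version 1.9.6) and would like a solution that uses efficient data.table syntax.

1 answer:

Answer 0 (score: 5)

I think doing it one status at a time, as suggested in the comments, is perfectly fine, but you can also create a lookup table: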

cond = data.table(Status = c("OUT", "IN", "ON"),
                  cols = Map(paste0, 't', list(c(1:4, 8), c(5,6,8), c(1:3, 6:8))))
#   Status              cols
#1:    OUT    t1,t2,t3,t4,t8
#2:     IN          t5,t6,t8
#3:     ON t1,t2,t3,t6,t7,t8
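
The `cols` column in the lookup table above is a list column: each cell holds a character vector of column names, which is why the join that follows extracts it with `cols[[1]]`. A minimal sketch for inspecting that structure, assuming the same `cond` definition as above (the inspection calls are illustrative additions, not part of the original answer):

```r
library(data.table)

# Same lookup table as in the answer: one row per Status,
# with the columns to sum stored as a list column.
cond <- data.table(Status = c("OUT", "IN", "ON"),
                   cols = Map(paste0, "t", list(c(1:4, 8), c(5, 6, 8), c(1:3, 6:8))))

# The column is a list, so each cell can hold a vector of any length.
is.list(cond$cols)

# Extracting one cell yields a plain character vector of column names.
in_cols <- cond[Status == "IN", cols[[1]]]
in_cols
# [1] "t5" "t6" "t8"

# Number of columns summed for each Status, in the row order of cond.
n_cols <- unname(lengths(cond$cols))
```

With 150 columns, a check such as `all(unlist(cond$cols) %in% names(dt))` before the join may help catch mistyped column names early.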

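# For each Status in cond (by = .EACHI), add up that Status's columns for the matching rows of dt
dt[cond, Total := Reduce(`+`, .SD[, cols[[1]], with = FALSE]), on = 'Status', by = .EACHI]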
#    ID Status   t1   t2   t3   t4   t5   t6   t7   t8 Total
# 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5   3.7
# 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7   0.6
# 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6   1.6
# 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9  -1.4
# 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3   1.4
# 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3  -1.0
# 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4  -0.5
# 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0  -0.2
# 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1   0.6
#10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6   2.2
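
The one-status-at-a-time approach mentioned at the start of this answer could look roughly like the sketch below. The comments on the question are not reproduced here, so this is an assumption about what they suggested rather than their exact code. It rebuilds the example data with a loop so that the block is self-contained, and it uses `rowSums` with `.SDcols`, which is one of several ways to write the per-status sum.

```r
library(data.table)

# Rebuild the example data; the loop draws t1..t8 in the same order
# as the original data.table() call, so the values match.
set.seed(1)
dt <- data.table(ID = 1:10,
                 Status = rep(c("OUT", "OUT", "IN", "IN", "ON"), 2))
for (j in 1:8) set(dt, j = paste0("t", j), value = round(rnorm(10), 1))

# One update by reference per Status: filter rows in i,
# then sum only the columns named in .SDcols.
dt[Status == "OUT", Total := rowSums(.SD), .SDcols = c("t1", "t2", "t3", "t4", "t8")]
dt[Status == "IN",  Total := rowSums(.SD), .SDcols = c("t5", "t6", "t8")]
dt[Status == "ON",  Total := rowSums(.SD), .SDcols = c("t1", "t2", "t3", "t6", "t7", "t8")]
```

With only three statuses this is arguably easier to read than the join; the lookup table becomes more attractive as the number of statuses grows.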