How to match values between rows when aggregating?
I have this data:
library(data.table)
dat<-data.table(group=rep(1,7),code=c("A11",rep("A12",3),"A10","A9","A8"),
in.out=c(rep("In",4),rep("Out",3)),type=c("car","train","car",rep("train",3),"car"))
group code in.out type
1 A11 In car
1 A12 In train
1 A12 In car
1 A12 In train
1 A10 Out train
1 A9 Out train
1 A8 Out car
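For every observation, I want to match the type of the rows where in.out == 'Out' against the type of the rows where in.out == 'In', at each code level.
For example, for the observation with code A8, the type (car) matches the type of code A11. For code A10, on the other hand, the type (train) does not match A11. Ideally, I need to create a list of match flags (0,1), like this: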
group code in.out type match
1 A11 In car
1 A12 In train
1 A12 In car
1 A12 In train
1 A10 Out train 0,1
1 A9 Out train 0,1
1 A8 Out car 1,1
I have been trying something like:
dat[ , match := +(type[in.out=='Out'] %in% type[in.out=='In']),by=.(code)]
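But the result is not quite right. What am I missing?
Answer 0 (score: 0)
The OP has asked how to match values between rows when aggregating.
The general answer is: by joining and subsequent aggregation.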
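As a side note on why grouping by code alone is unlikely to work for this data, here is a quick check. It is only a sketch, and the names counts and any_mixed are illustrative, not from the question. It suggests that no code group contains both "In" and "Out" rows, so inside each group one side of the %in% comparison is always empty:

```r
library(data.table)

# Same data as in the question
dat <- data.table(
  group  = rep(1, 7),
  code   = c("A11", rep("A12", 3), "A10", "A9", "A8"),
  in.out = c(rep("In", 4), rep("Out", 3)),
  type   = c("car", "train", "car", rep("train", 3), "car")
)

# Count "In" and "Out" rows within each code group
counts <- dat[, .(n_in = sum(in.out == "In"), n_out = sum(in.out == "Out")), by = code]
print(counts)

# TRUE only if some code group holds both kinds of rows
any_mixed <- counts[, any(n_in > 0 & n_out > 0)]
print(any_mixed)
```

Because the "In" and "Out" rows live in different code groups, the comparison has to reach across groups, which is what a join does.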
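If I understand correctly, the OP wants to find all matches between the "Out" rows and the "In" rows that have the same type. The code levels of the "In" rows are then numbered contiguously, and it is checked whether a matching level has been found.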
# create numeric observation levels
dat[, lvl := as.integer(stringr::str_replace(code, "A", ""))]
# order rows for convenience (not required but helps to understand)
setorder(dat, group, lvl)
# store "Out" rows
dt_out <- dat[in.out == "Out"]
# store "In" rows in separate data.table and number levels contiguously
dt_in <- dat[in.out == "In"][, lvl.rank := frank(lvl, ties.method = "dense"), by = group]
   group code in.out  type lvl lvl.rank
1:     1  A11     In   car  11        1
2:     1  A12     In train  12        2
3:     1  A12     In   car  12        2
4:     1  A12     In train  12        2
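To see in isolation what the dense ranking does, here is a minimal sketch on a plain vector that mirrors the lvl values of the "In" rows. The vector is typed in by hand for illustration:

```r
library(data.table)

# Numeric levels of the "In" rows: A11 once, A12 three times
lvl <- c(11L, 12L, 12L, 12L)

# Dense ranking gives tied values the same rank and leaves no gaps
lvl.rank <- frank(lvl, ties.method = "dense")
print(lvl.rank)
```

With ties.method = "dense" the ranks run 1, 2, ... over the distinct levels, so they can be used as positions in the list of flags.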
Now, we can join the two sub-tables and aggregate at the same time while joining:
tmp <- dt_in[dt_out, on = .(group, type), by = .EACHI,
toString(as.integer(sort(lvl.rank) == seq_len(.N)))]
   group  type   V1
1:     1   car 1, 1
2:     1 train 0, 1
3:     1 train 0, 1
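V1 contains the flags indicating whether a match was found at the first "In" level, the second "In" level, and so on. The result is used to update dt_out: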
dt_out[, match := tmp$V1][]
   group code in.out  type lvl match
1:     1   A8    Out   car   8  1, 1
2:     1   A9    Out train   9  0, 1
3:     1  A10    Out train  10  0, 1
Finally, the result is combined with the full dataset dat as requested:
dt_out[dat, on = .(group, code, in.out, type, lvl)]
   group code in.out  type lvl match
1:     1   A8    Out   car   8  1, 1
2:     1   A9    Out train   9  0, 1
3:     1  A10    Out train  10  0, 1
4:     1  A11     In   car  11  <NA>
5:     1  A12     In train  12  <NA>
6:     1  A12     In   car  12  <NA>
7:     1  A12     In train  12  <NA>
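The flag expression sort(lvl.rank) == seq_len(.N) depends on how many "In" rows happen to match. As a possible variant, and not the method used above, each "In" level can be tested explicitly for membership in the matched ranks. This is a sketch under assumptions: it handles a single group only, n_lvl and flags are illustrative names, and base sub() stands in for stringr::str_replace to keep the example self-contained:

```r
library(data.table)

# Same data as in the question
dat <- data.table(
  group  = rep(1, 7),
  code   = c("A11", rep("A12", 3), "A10", "A9", "A8"),
  in.out = c(rep("In", 4), rep("Out", 3)),
  type   = c("car", "train", "car", rep("train", 3), "car")
)
dat[, lvl := as.integer(sub("A", "", code))]

dt_in <- dat[in.out == "In"]
dt_in[, lvl.rank := frank(lvl, ties.method = "dense"), by = group]
dt_out <- dat[in.out == "Out"]

# Number of distinct "In" levels (assumption: only one group is present)
n_lvl <- uniqueN(dt_in$lvl.rank)

# For each "Out" row, flag every "In" level that has a row of the same type
flags <- dt_in[dt_out, on = .(group, type), by = .EACHI,
               .(match = toString(as.integer(seq_len(n_lvl) %in% lvl.rank)))]
print(flags)
```

The rows of flags follow the row order of dt_out, so the match column can be assigned back to dt_out in the same way as tmp$V1 above. With several groups, n_lvl would need to be computed per group.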
There is a shortcut version which only returns the matching "In" levels without creating flags. Perhaps this helps to better understand the idea:
dt_in <- dat[in.out == "In"]
dt_out <- dat[in.out == "Out"]
dt_out[, matches := dt_in[dt_out, on = .(group, type), by = .EACHI, toString(x.code)]$V1]
dt_out[dat, on = .(group, code, in.out, type)]
group code in.out type matches
1: 1 A11 In car <NA>
2: 1 A12 In train <NA>
3: 1 A12 In car <NA>
4: 1 A12 In train <NA>
5: 1 A10 Out train A12, A12
6: 1 A9 Out train A12, A12
7: 1 A8 Out car A11, A12