Flagging data by date within groups in R

Time: 2020-05-20 04:27:12

Tags: r data.table

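Within each ID group, I only want to flag the years that have n years of (past) data AND also have data for the following year. So 2020, for example, will always get 0, because there is no 2021 in the data.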

library(data.table)  # needed for as.data.table() below

ID <- c(rep("A5", 15), rep("B2", 15))
product <- rep(rep(c("prod1","prod2","prod3", "prod55", "prod4", "prod9", "prod83"),3),2)
# start <- c(rep("01.01.2016", 3), rep("01.01.2015", 3), rep("01.01.2014",3),
#            rep("01.01.2013",3), rep("01.01.2012",3))
start <- rep(c(rep(2016, 3), rep(2017, 3), rep(2018 ,3),
           rep(2019,3), rep(2020,3)),2)
prodID <- rep(c(3,1,2,3,1,2,3,1,2,3,2,1,3,1,2),2)
mydata <- cbind(ID, product[1:15], start, prodID)
mydata <- as.data.table(mydata)

So for n = 3 the result would look something like this:

    ID     V2 start result
 1: A5  prod1  2016      0
 2: A5  prod2  2016      0
 3: A5  prod3  2016      0
 4: A5 prod55  2017      0
 5: A5  prod4  2017      0
 6: A5  prod9  2017      0
 7: A5 prod83  2018      1
 8: A5  prod1  2018      1
 9: A5  prod2  2018      1
10: A5  prod3  2019      1
11: A5 prod55  2019      1
12: A5  prod4  2019      1
13: A5  prod9  2020      0
14: A5 prod83  2020      0
15: A5  prod1  2020      0
16: B2  prod1  2016      0
17: B2  prod2  2016      0
18: B2  prod3  2016      0
19: B2 prod55  2017      0
20: B2  prod4  2017      0
21: B2  prod9  2017      0
22: B2 prod83  2018      1
23: B2  prod1  2018      1
24: B2  prod2  2018      1
25: B2  prod3  2019      1
26: B2 prod55  2019      1
27: B2  prod4  2019      1
28: B2  prod9  2020      0
29: B2 prod83  2020      0
30: B2  prod1  2020      0

1 Answer:

Answer 0 (score: 1)

We can use between from data.table:

library(data.table)
n = 3

mydata[, result := +(between(start, min(start) + n - 1, max(start) - 1)), ID]

This returns

mydata
#    ID     V2 start result
# 1: A5  prod1  2016      0
# 2: A5  prod2  2016      0
# 3: A5  prod3  2016      0
# 4: A5 prod55  2017      0
# 5: A5  prod4  2017      0
# 6: A5  prod9  2017      0
# 7: A5 prod83  2018      1
# 8: A5  prod1  2018      1
# 9: A5  prod2  2018      1
#10: A5  prod3  2019      1
#11: A5 prod55  2019      1
#12: A5  prod4  2019      1
#13: A5  prod9  2020      0
#14: A5 prod83  2020      0
#15: A5  prod1  2020      0
#16: B2  prod1  2016      0
#17: B2  prod2  2016      0
#18: B2  prod3  2016      0
#19: B2 prod55  2017      0
#20: B2  prod4  2017      0
#21: B2  prod9  2017      0
#22: B2 prod83  2018      1
#23: B2  prod1  2018      1
#24: B2  prod2  2018      1
#25: B2  prod3  2019      1
#26: B2 prod55  2019      1
#27: B2  prod4  2019      1
#28: B2  prod9  2020      0
#29: B2 prod83  2020      0
#30: B2  prod1  2020      0
#    ID     V2 start result
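
The call above flags rows by their position between each ID's earliest and latest year, which matches the requirement when every ID has consecutive years, as in the sample data. If an ID could skip a year, one possible variant is to test for the required years explicitly. The following is only a sketch under that assumption; the table toy and the helper names has_past and has_next are made up for illustration and are not part of the original question.

```r
library(data.table)

# Hypothetical toy data: ID "X" has no rows for 2019
toy <- data.table(
  ID    = rep("X", 6),
  start = c(2015, 2016, 2017, 2018, 2020, 2021)
)
n <- 3

toy[, result := {
  yrs <- unique(start)
  # TRUE when the n - 1 preceding years all occur for this ID
  has_past <- vapply(
    start,
    function(y) all((y - seq_len(n - 1)) %in% yrs),
    logical(1)
  )
  # TRUE when the following year occurs for this ID
  has_next <- (start + 1) %in% yrs
  as.integer(has_past & has_next)
}, by = ID]

toy$result
```

Here only 2017 is flagged: 2018 has no following year in the data, and 2020 lacks the two preceding years. The min/max range approach would also have flagged 2018 and 2020.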

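between returns a logical value (TRUE / FALSE) indicating whether each value lies between two bounds, with both bounds included by default. An equivalent way to write it is: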

mydata[, result := +(start >= min(start) + n - 1 & start <= max(start) - 1), ID]

The unary + converts the logical values (TRUE / FALSE) to integers (1 / 0).
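
To see both points on a plain vector, outside of any grouping, here is a small sketch. The vector yrs and the names lower, upper, in_range, by_hand and flag are illustrative only.

```r
library(data.table)

yrs <- c(2016, 2017, 2018, 2019, 2020)  # illustrative years for one ID
n <- 3

lower <- min(yrs) + n - 1  # first year with n years of data: 2018
upper <- max(yrs) - 1      # last year that still has a following year: 2019

in_range <- between(yrs, lower, upper)     # bounds are inclusive by default
by_hand  <- yrs >= lower & yrs <= upper    # the same test written out

identical(in_range, by_hand)  # TRUE

flag <- +in_range  # unary plus turns the logical vector into integers
flag               # 0 0 1 1 0
```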

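Data

Don't use cbind when creating the data: it builds a character matrix, so start ends up as text and the arithmetic on it fails. Use data.frame or data.table directly: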

mydata <- data.table(ID, product[1:15], start)
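
The difference is easy to check on a two-row example. This is a minimal sketch; the objects m, from_cbind and direct are hypothetical names used only here.

```r
library(data.table)

ID    <- c("A5", "B2")
start <- c(2016, 2017)

# cbind() on a character and a numeric vector yields a character matrix
m <- cbind(ID, start)
from_cbind <- as.data.table(m)

# data.table() keeps each column's own type
direct <- data.table(ID, start)

class(from_cbind$start)  # "character"
class(direct$start)      # "numeric"
```

If the table has already been built through cbind, converting the column with mydata[, start := as.numeric(start)] before computing the flag should also work.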