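How can I check whether an ID appears on fewer than 5 consecutive days, and also compute the day difference between records of the same ID? I can't work out the logic of this problem and don't know where to start.
(The data below is only a sample. My real data set is very large, so the solution needs to be efficient.)
Sample data: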
# dates are day/month/year (1/3/2013 is 1 March 2013)
sample <- data.frame(
  id   = c("A","B","C","D","A","C","D","A","C","D","A","D","A","C"),
  date = c("1/3/2013","1/3/2013","1/3/2013","1/3/2013","2/3/2013","2/3/2013",
           "2/3/2013","3/3/2013","3/3/2013","3/3/2013","4/3/2013","4/3/2013",
           "5/3/2013","5/3/2013")
)
Expected output:
output <- data.frame(
  id   = c("A","A","A","A","A","B","C","C","C","C","D","D","D","D","D","D","D"),
  date = c("1/3/2013","2/3/2013","3/3/2013","4/3/2013","5/3/2013",
           "1/3/2013",
           "1/3/2013","2/3/2013","3/3/2013","5/3/2013",
           "1/3/2013","2/3/2013","3/3/2013","4/3/2013","5/3/2013","6/3/2013","7/3/2013"),
  num  = c(0,1,2,3,4, 0, 0,1,2,4, 0,1,2,3,4,5,6)
)
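Calculation logic:
Compute the date differences within each ID. For example, 1/3 to 2/3 is a 1-day difference, so the 2/3 row gets idu = 1. 2/3 to 3/3 is another 1-day difference, so 1 is added and the 3/3 row gets idu = 2. 3/3 to 5/3 is a 2-day difference, so 2 is added and the 5/3 row gets idu = 4.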
Date | idu
1/3 | 0
2/3 | 1
3/3 | 2
5/3 | 4
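A minimal base-R sketch of this logic, assuming the dates are day/month/year and that idu is the number of days since the earliest date of the same ID (a running sum of the day gaps equals exactly that). It uses only as.Date() and ave(); the helper name d is illustrative. The expected output also lists D on 6/3 and 7/3, which are not in the sample data, so this sketch produces only the 14 sample rows.

```r
# Sample data from the question; dates are day/month/year
sample <- data.frame(
  id   = c("A","B","C","D","A","C","D","A","C","D","A","D","A","C"),
  date = c("1/3/2013","1/3/2013","1/3/2013","1/3/2013","2/3/2013","2/3/2013",
           "2/3/2013","3/3/2013","3/3/2013","3/3/2013","4/3/2013","4/3/2013",
           "5/3/2013","5/3/2013"),
  stringsAsFactors = FALSE
)
sample$date <- as.Date(sample$date, format = "%d/%m/%Y")

# Sort by id, then date, so each id's records are contiguous and in time order
sample <- sample[order(sample$id, sample$date), ]

# idu = days elapsed since the earliest date of the same id
d <- as.numeric(sample$date)
sample$idu <- d - ave(d, sample$id, FUN = min)
sample
```

Because the result depends on dates rather than row positions, C on 5/3 gets 4 even though 4/3 is missing for C.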
Thanks in advance.
Answer 0 (score: 2)
sample <- data.frame(
  id   = c("A","B","C","D","A","C","D","A","C","D","A","D","A","C"),
  date = c("1/3/2013","1/3/2013","1/3/2013","1/3/2013","2/3/2013","2/3/2013",
           "2/3/2013","3/3/2013","3/3/2013","3/3/2013","4/3/2013","4/3/2013",
           "5/3/2013","5/3/2013"),
  stringsAsFactors = FALSE  # keep id as character so rle() works on it
)
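library(lubridate)
sample$date <- dmy(sample$date)                      # parse day/month/year strings into Dates
sample1 <- sample[order(sample$id, sample$date), ]   # sort by id, then date
# 0-based position of each row within its id (requires the sort above);
# sequence() avoids sapply() simplifying to a matrix when all ids have equal counts
sample1$idu <- sequence(rle(sample1$id)$lengths) - 1
sample1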
id date idu
1 A 2013-03-01 0
5 A 2013-03-02 1
8 A 2013-03-03 2
11 A 2013-03-04 3
13 A 2013-03-05 4
2 B 2013-03-01 0
3 C 2013-03-01 0
6 C 2013-03-02 1
9 C 2013-03-03 2
14 C 2013-03-05 3
4 D 2013-03-01 0
7 D 2013-03-02 1
10 D 2013-03-03 2
12 D 2013-03-04 3
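Note that this idu is a within-ID row counter built from rle() run lengths, not a day count: C on 2013-03-05 gets 3, whereas the question's logic expects 4 because 2013-03-04 is missing for C. A small illustration, with hypothetical ids, of how run lengths become 0-based counters (base R's sequence() builds them in one call) and of why the data must be sorted by id first:

```r
# Hypothetical ids, already sorted so equal ids are adjacent
ids <- c("A", "A", "A", "B", "C", "C")

lens <- rle(ids)$lengths        # size of each block of equal ids: 3 1 2
counter <- sequence(lens) - 1   # 0-based position within each block
print(counter)

# Without sorting, a repeated id is split into separate runs and the counter restarts
unsorted_lens <- rle(c("A", "B", "A"))$lengths
print(unsorted_lens)
```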
There are several ways to add the time-difference column. I simply did:
sample1$diff <- c(0, int_diff(sample1$date)/days(1))
# Remainder cannot be expressed as fraction of a period.
# Performing %/%.
> sample1
id date idu diff
1 A 2013-03-01 0 0
5 A 2013-03-02 1 1
8 A 2013-03-03 2 1
11 A 2013-03-04 3 1
13 A 2013-03-05 4 1
2 B 2013-03-01 0 -4
3 C 2013-03-01 0 0
6 C 2013-03-02 1 1
9 C 2013-03-03 2 1
14 C 2013-03-05 3 2
4 D 2013-03-01 0 -4
7 D 2013-03-02 1 1
10 D 2013-03-03 2 1
12 D 2013-03-04 3 1
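Adjust further as needed. The first row of each id is differenced against the last date of the previous id, which is where the -4 for B and D comes from; replace all negative values with 0.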
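A base-R sketch that covers both parts of the question, under stated assumptions: computing the gap with ave() inside each id gives 0 on every id's first row instead of a cross-id value (a boundary value could also be positive if the next id started later, so replacing only negatives may not be enough), and the same gaps mark where a run of consecutive days breaks. Reading "fewer than 5 consecutive days" per run is an assumption; the column names diff, run_len and short_run are hypothetical.

```r
sample <- data.frame(
  id   = c("A","B","C","D","A","C","D","A","C","D","A","D","A","C"),
  date = c("1/3/2013","1/3/2013","1/3/2013","1/3/2013","2/3/2013","2/3/2013",
           "2/3/2013","3/3/2013","3/3/2013","3/3/2013","4/3/2013","4/3/2013",
           "5/3/2013","5/3/2013"),
  stringsAsFactors = FALSE
)
sample$date <- as.Date(sample$date, format = "%d/%m/%Y")
sample <- sample[order(sample$id, sample$date), ]
d <- as.numeric(sample$date)

# Day difference to the previous record of the same id (0 for each id's first record),
# so no cross-id values appear at id boundaries
sample$diff <- ave(d, sample$id, FUN = function(x) c(0, diff(x)))

# A new run starts at each id's first record or after any gap other than 1 day
new_run <- ave(d, sample$id, FUN = function(x) c(1, diff(x) != 1))
run <- cumsum(new_run)

# Length of the consecutive-day run each record belongs to
sample$run_len <- ave(d, run, FUN = length)

# Flag records whose run covers fewer than 5 consecutive days
sample$short_run <- sample$run_len < 5
sample
```

For a per-id answer instead of a per-row flag, tapply(sample$run_len, sample$id, max) < 5 marks the ids whose longest run is shorter than 5 days.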