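Calculating date differences in R

Time: 2014-05-12 08:20:57

Tags: r date optimization logic

How can I determine whether an ID appears on fewer than 5 consecutive days? I also need the day difference between records of the same ID. I can't work out the logic for this and don't know where to start.

(The data below is only a sample; my actual data set is very large, so the solution needs to be efficient.)

Sample data: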

sample <- data.frame(
  id = c("A","B","C","D","A","C","D","A","C","D","A","D","A","C"),
  date = c("1/3/2013","1/3/2013","1/3/2013","1/3/2013","2/3/2013","2/3/2013",
           "2/3/2013","3/3/2013","3/3/2013",
           "3/3/2013",
           "4/3/2013",
           "4/3/2013",
           "5/3/2013",
           "5/3/2013")
)

Expected output:

output<- data.frame(
id=c("A","A","A","A","A","B","C","C","C","C","D","D","D","D","D","D","D"),
date=c("1/3/2013",
     "2/3/2013",
     "3/3/2013",
     "4/3/2013",
     "5/3/2013",
     "1/3/2013",
     "1/3/2013",
     "2/3/2013",
     "3/3/2013",
     "5/3/2013",
     "1/3/2013",
     "2/3/2013",
     "3/3/2013",
     "4/3/2013",
     "5/3/2013",
     "6/3/2013",
     "7/3/2013" ),
 num=c(0,1,2,3,4,0,0,1,2,4,0,1,2,3,4,5,6)
)

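Calculation logic:

Compute the date difference between consecutive records of the same ID and accumulate it in idu. For example, 1/3 to 2/3 is a 1-day difference, so idu is 1 in the 2/3 row. 2/3 to 3/3 is another 1-day difference, so 1 is added and idu is 2 in the 3/3 row. 3/3 to 5/3 is a 2-day difference, so 2 is added and idu is 4 in the 5/3 row.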

Date | idu 
1/3  |  0
2/3  |  1
3/3  |  2
5/3  |  4

Thanks in advance.

1 answer:

Answer 0: (score: 2)

sample <- data.frame(
  id = c("A","B","C","D","A","C","D","A","C","D","A","D","A","C"),
  date = c("1/3/2013","1/3/2013","1/3/2013","1/3/2013","2/3/2013","2/3/2013",
           "2/3/2013","3/3/2013","3/3/2013",
           "3/3/2013",
           "4/3/2013",
           "4/3/2013",
           "5/3/2013",
           "5/3/2013"), stringsAsFactors = FALSE)

library(lubridate)
# Parse the day/month/year strings into dates
sample$date <- dmy(sample$date)
# Sort by id, then by date within each id
sample1 <- sample[order(sample$id, sample$date), ]
# idu: 0-based row counter within each id (counts rows, not elapsed days)
sample1$idu <- unlist(sapply(rle(sample1$id)$lengths, seq_len)) - 1

   id       date idu
1   A 2013-03-01   0
5   A 2013-03-02   1
8   A 2013-03-03   2
11  A 2013-03-04   3
13  A 2013-03-05   4
2   B 2013-03-01   0
3   C 2013-03-01   0
6   C 2013-03-02   1
9   C 2013-03-03   2
14  C 2013-03-05   3
4   D 2013-03-01   0
7   D 2013-03-02   1
10  D 2013-03-03   2
12  D 2013-03-04   3
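
Note that the idu column above numbers the rows within each id, so the 2013-03-05 row of C gets 3, whereas the logic described in the question expects 4. The following is a minimal sketch of one way to get the elapsed-day version, assuming the intent is "days since the first date of the same id". It uses base R only (as.Date instead of lubridate), and the column name idu_days is hypothetical, not part of the original answer.

```r
# Re-create the sample data and parse the dates with base R.
sample <- data.frame(
  id = c("A","B","C","D","A","C","D","A","C","D","A","D","A","C"),
  date = c("1/3/2013","1/3/2013","1/3/2013","1/3/2013","2/3/2013","2/3/2013",
           "2/3/2013","3/3/2013","3/3/2013","3/3/2013","4/3/2013","4/3/2013",
           "5/3/2013","5/3/2013"),
  stringsAsFactors = FALSE
)
sample$date <- as.Date(sample$date, format = "%d/%m/%Y")
sample1 <- sample[order(sample$id, sample$date), ]

# Days elapsed since the first (earliest) date of each id.
# ave() applies the function within each id group and keeps row order.
day_num <- as.numeric(sample1$date)
sample1$idu_days <- ave(day_num, sample1$id, FUN = function(x) x - x[1])
```

With the sample data, the rows of C get 0, 1, 2, 4, which matches the small table in the question.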

There are several ways to add a column with the time difference between rows. I simply do:

sample1$diff <- c(0, int_diff(sample1$date)/days(1))
# Remainder cannot be expressed as fraction of a period.
#   Performing %/%.

> sample1
   id       date idu diff
1   A 2013-03-01   0    0
5   A 2013-03-02   1    1
8   A 2013-03-03   2    1
11  A 2013-03-04   3    1
13  A 2013-03-05   4    1
2   B 2013-03-01   0   -4
3   C 2013-03-01   0    0
6   C 2013-03-02   1    1
9   C 2013-03-03   2    1
14  C 2013-03-05   3    2
4   D 2013-03-01   0   -4
7   D 2013-03-02   1    1
10  D 2013-03-03   2    1
12  D 2013-03-04   3    1

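Make further changes as needed. The negative values appear on the first row of an id, where the difference is taken against the last date of the previous id; replace all negative values with 0.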
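
As an alternative to patching the negative values afterwards, the difference can be computed within each id. The sketch below does that with base R, and also flags ids whose run of consecutive days is shorter than 5. It rests on assumptions: "consecutive" is taken to mean a gap of exactly 1 day between successive rows of the same id, and the column names run, run_len and short_run are hypothetical.

```r
# Re-create the sample data and parse the dates with base R.
sample <- data.frame(
  id = c("A","B","C","D","A","C","D","A","C","D","A","D","A","C"),
  date = c("1/3/2013","1/3/2013","1/3/2013","1/3/2013","2/3/2013","2/3/2013",
           "2/3/2013","3/3/2013","3/3/2013","3/3/2013","4/3/2013","4/3/2013",
           "5/3/2013","5/3/2013"),
  stringsAsFactors = FALSE
)
sample$date <- as.Date(sample$date, format = "%d/%m/%Y")
sample1 <- sample[order(sample$id, sample$date), ]

# Day difference to the previous row of the same id.
# The first row of each id gets 0, so no negative values arise.
sample1$diff <- ave(as.numeric(sample1$date), sample1$id,
                    FUN = function(x) c(0, diff(x)))

# Number the runs of consecutive days within each id:
# a new run starts whenever the gap to the previous row is not exactly 1.
sample1$run <- ave(sample1$diff, sample1$id,
                   FUN = function(d) cumsum(d != 1))

# Length of the run each row belongs to, and a flag for runs under 5 days.
run_key <- paste(sample1$id, sample1$run)
sample1$run_len <- ave(sample1$diff, run_key, FUN = length)
sample1$short_run <- sample1$run_len < 5
```

With the sample data, A forms a single run of 5 days and is not flagged, while B, C and D are. For very large data the same grouped logic could be ported to a grouped-operations package, but that is beyond this sketch.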