Merging date ranges in R

Time: 2017-03-06 03:41:27

Tags: r

I have this data frame (called signal):

         Date Sig
1  2012-03-25  Go
2  2012-04-15 Stop
3  2012-04-22 Stop
4  2012-05-13 Stop
5  2012-05-20 Stop
6  2012-06-24  Go
7  2012-09-23  Go
8  2012-09-30  Go
9  2012-10-14 Stop
10 2012-12-02  Go
11 2012-12-16 Stop

I am trying to merge/join the date ranges so as to create something like this:

        Start        Stop  Sig
1  2012-03-25  2012-04-15   Go
2  2012-04-15  2012-06-24 Stop
3  2012-06-24  2012-10-14   Go
4  2012-10-14  2012-12-02 Stop
5  2012-12-02  2012-12-16   Go

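Any ideas, please?

2 answers:

Answer 0: (score: 1)

So far, this old question has not received a proper answer. Here is a concise solution using the rleid() function from data.table: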

library(data.table)
setDT(signal)[order(Date), .(Start = first(Date)), by = .(rleid(Sig), Sig)][
  , Stop := shift(Start, type = "lead")][
    -.N, !"rleid"]
    Sig      Start       Stop
1:   Go 2012-03-25 2012-04-15
2: Stop 2012-04-15 2012-06-24
3:   Go 2012-06-24 2012-10-14
4: Stop 2012-10-14 2012-12-02
5:   Go 2012-12-02 2012-12-16

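Explanation

setDT() coerces signal to class data.table. Then signal is ordered by Date and aggregated by consecutive streaks of Sig, which rleid() identifies. The first row of each group is picked. To determine the stop date, the new Start column is shifted forward (a lead), so each run ends where the next one starts. Finally, the last row and the rleid grouping variable are removed.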
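The same steps can be sketched in base R with rle(), which may help show what a "consecutive streak" is. This is only an illustration, not the answer's code: the signal data frame is rebuilt here from the table in the question, and the helper names runs, first_row, starts and result are made up for the example.

```r
# Data rebuilt from the table in the question (an assumption about its types)
signal <- data.frame(
  Date = as.Date(c("2012-03-25", "2012-04-15", "2012-04-22", "2012-05-13",
                   "2012-05-20", "2012-06-24", "2012-09-23", "2012-09-30",
                   "2012-10-14", "2012-12-02", "2012-12-16")),
  Sig = c("Go", "Stop", "Stop", "Stop", "Stop", "Go", "Go", "Go",
          "Stop", "Go", "Stop"),
  stringsAsFactors = FALSE
)

# Order by date, then find runs of identical consecutive Sig values
signal <- signal[order(signal$Date), ]
runs <- rle(signal$Sig)

# Row index of the first observation in each run
first_row <- cumsum(c(1, head(runs$lengths, -1)))
starts <- signal$Date[first_row]

# Each run stops where the next one starts; the last run has no successor,
# so it is dropped
result <- data.frame(
  Start = head(starts, -1),
  Stop  = starts[-1],
  Sig   = head(runs$values, -1)
)
print(result)
```

Here rle() plays the role of rleid(), and starts[-1] plays the role of shift(Start, type = "lead").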

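Data

OP's data:

# rebuilt from the table shown in the question
signal <- data.frame(
  Date = as.Date(c("2012-03-25", "2012-04-15", "2012-04-22", "2012-05-13",
                   "2012-05-20", "2012-06-24", "2012-09-23", "2012-09-30",
                   "2012-10-14", "2012-12-02", "2012-12-16")),
  Sig = c("Go", "Stop", "Stop", "Stop", "Stop", "Go", "Go", "Go",
          "Stop", "Go", "Stop"),
  stringsAsFactors = FALSE
)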

Answer 1: (score: 0)

The approach I would take is to sort the segments and then collapse those that have the same value and are back to back.

library(data.table)

## generating a (similar ?) data set: 20 distinct random dates, each with a random signal
df <- data.frame(dates = as.Date('01-01-2010', '%m-%d-%Y') + sample(1:100, 20),
                 sig = sample(c('stop', 'go'), 20, replace = TRUE),
                 stringsAsFactors = FALSE)

df <- df[order(df$dates), ]

### creating the lag variable for date
### head(x, -1) drops the last element; as.Date(NA) keeps the Date class
df$dates2 <- c(as.Date(NA), head(df$dates, -1))

### creating the lag variable for sig
df$sig2 <- c(NA, head(df$sig, -1))

## creating a variable that triggers a new segment 
df$grp <- as.numeric(df$sig != df$sig2)
df$grp[1] <- 0

### the cumsum of the trigger is actually the grouping variable 

df$grp2 <- cumsum(df$grp)


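## using data.table: one row per run, so group by the cumulative id grp2
dt <- data.table(df)

dt2 <- dt[, .(start = min(dates), end = max(dates), sig = sig[1]),
          by = grp2]

Result

The output posted with the answer, reproduced below, was generated with by = grp and sig = sig, so each input row is repeated with the date range of its whole grp group. Grouping by grp2 and taking sig[1], as above, returns one row per run. Because the data are random, the values change from run to run.

    grp      start        end  sig
 1:   0 2010-01-05 2010-04-11   go
 2:   0 2010-01-05 2010-04-11   go
 3:   0 2010-01-05 2010-04-11   go
 4:   0 2010-01-05 2010-04-11 stop
 5:   0 2010-01-05 2010-04-11 stop
 6:   0 2010-01-05 2010-04-11   go
 7:   0 2010-01-05 2010-04-11 stop
 8:   0 2010-01-05 2010-04-11   go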
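
To see what the change-flag and cumsum() idea gives on the data from the question rather than on random data, here is a minimal base R sketch. The signal data frame is rebuilt from the question's table, and the names changed, is_first, is_last, segments and merged are hypothetical helpers, not part of either answer.

```r
# Data rebuilt from the table in the question (an assumption about its types)
signal <- data.frame(
  Date = as.Date(c("2012-03-25", "2012-04-15", "2012-04-22", "2012-05-13",
                   "2012-05-20", "2012-06-24", "2012-09-23", "2012-09-30",
                   "2012-10-14", "2012-12-02", "2012-12-16")),
  Sig = c("Go", "Stop", "Stop", "Stop", "Stop", "Go", "Go", "Go",
          "Stop", "Go", "Stop"),
  stringsAsFactors = FALSE
)
signal <- signal[order(signal$Date), ]
n <- nrow(signal)

# TRUE wherever Sig differs from the previous row; the first row never triggers
changed <- c(FALSE, signal$Sig[-1] != signal$Sig[-n])

# The running count of triggers labels each run of identical values
signal$grp <- cumsum(changed)

# Rows are sorted by date, so the first and last row of a run hold its
# minimum and maximum date
is_first <- !duplicated(signal$grp)
is_last  <- !duplicated(signal$grp, fromLast = TRUE)
segments <- data.frame(
  start = signal$Date[is_first],
  end   = signal$Date[is_last],
  sig   = signal$Sig[is_first]
)

# The question wants each run to stop where the next one starts, which is a
# different end point from the last date inside the run
merged <- data.frame(
  Start = head(segments$start, -1),
  Stop  = segments$start[-1],
  Sig   = head(segments$sig, -1)
)
print(segments)
print(merged)
```

Note that segments has six rows, one per run, with end set to the last date seen inside the run, while merged has the five rows asked for in the question.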