Summing values over date/time segments

Posted: 2016-12-02 15:06:16

Tags: r date sum

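I want to sum the values that belong to the same time segment. Any value that occurs more than 6 hours after the previous value should start a new segment. I would also like to calculate the number of hours in each segment, as well as the maximum and the average (median) value of each segment.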

Here is some sample data:

Date <- c("1954-10-07", "1954-10-07", "1954-10-07", "1954-10-07", "1954-10-07", "1954-10-07", "1954-10-11", "1954-10-11", "1954-10-11", "1954-10-12", "1954-10-13")
Time <- c("0:00", "1:00", "4:00", "13:00", "14:00", "15:00", "9:00","10:00", "11:00", "23:00", "0:00")
DateTime <- paste(Date, Time)
Value <- c(0.1, 0.2, 0.1, 0.02, 0.2, 1.1, 0.2, 0.3, 0.4, 0.1, 0.05)
df <- data.frame(Date, Time, DateTime, Value)

df
      Date  Time         DateTime Value
1954-10-07  0:00  1954-10-07 0:00  0.10
1954-10-07  1:00  1954-10-07 1:00  0.20
1954-10-07  4:00  1954-10-07 4:00  0.10
1954-10-07 13:00 1954-10-07 13:00  0.02
1954-10-07 14:00 1954-10-07 14:00  0.20
1954-10-07 15:00 1954-10-07 15:00  1.10
1954-10-11  9:00  1954-10-11 9:00  0.20
1954-10-11 10:00 1954-10-11 10:00  0.30
1954-10-11 11:00 1954-10-11 11:00  0.40
1954-10-12 23:00 1954-10-12 23:00  0.10
1954-10-13  0:00  1954-10-13 0:00  0.05

Desired output:

IntervalStart     IntervalEnd       ValueSum  ValueMax  ValueMedian  HoursInSegment
1954-10-07 0:00   1954-10-07 4:00   0.40      0.20      0.100        4
1954-10-07 13:00  1954-10-07 15:00  1.32      1.10      0.200        2
1954-10-11 9:00   1954-10-11 11:00  0.90      0.40      0.300        2
1954-10-12 23:00  1954-10-13 0:00   0.15      0.10      0.075        1

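I think the tricky part for me is the timestamps: some values fall on the next day but are still within 6 hours of the previous value. Thanks for your help!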
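A minimal sketch of why the day boundary stops being a problem once the date and the time are parsed together into one timestamp. It assumes the combined "date time" strings from the sample data and parses them in UTC (an assumption, chosen only to sidestep daylight-saving jumps); the names `stamps`, `gap_h` and `segment` are illustrative.

```r
# Combined "date time" strings from the sample data
stamps <- c("1954-10-07 0:00", "1954-10-07 1:00", "1954-10-07 4:00",
            "1954-10-07 13:00", "1954-10-07 14:00", "1954-10-07 15:00",
            "1954-10-11 9:00", "1954-10-11 10:00", "1954-10-11 11:00",
            "1954-10-12 23:00", "1954-10-13 0:00")

# Parse date and time together; tz = "UTC" is an assumption to avoid DST shifts
dt <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Gap to the previous reading, in hours; the first reading has no predecessor
gap_h <- c(0, as.numeric(diff(dt), units = "hours"))

# Every gap larger than 6 hours increments the running segment counter
segment <- cumsum(gap_h > 6)
segment
```

The 23:00 to 0:00 step comes out as a 1-hour gap, so those two readings share the last segment even though they fall on different calendar days.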

1 answer:

Answer 0 (score: 2)

I think this does what you need:

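library(data.table)
# Convert df to a data.table in place and parse DateTime ("1954-10-07 0:00" -> "1954-10-07 0:00:00")
setDT(df)[, DateTime := as.POSIXct(sprintf("%s:00", DateTime))]

# Hours since the previous reading; a gap of more than 6 hours starts a new group
df[, Grp := cumsum(c(0, difftime(DateTime[-1], head(DateTime, -1), units = "hours")) > 6)]

# One summary row per group (units must be named: difftime's third positional argument is tz)
df[, .(Start  = min(DateTime),
       End    = max(DateTime),
       Min    = min(Value),
       Max    = max(Value),
       Median = median(Value),
       Span   = difftime(max(DateTime), min(DateTime), units = "hours")),
   by = "Grp"]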
#    Grp               Start                 End  Min Max Median    Span
# 1:   0 1954-10-07 00:00:00 1954-10-07 04:00:00 0.10 0.2  0.100 4 hours
# 2:   1 1954-10-07 13:00:00 1954-10-07 15:00:00 0.02 1.1  0.200 2 hours
# 3:   2 1954-10-11 09:00:00 1954-10-11 11:00:00 0.20 0.4  0.300 2 hours
# 4:   3 1954-10-12 23:00:00 1954-10-13 00:00:00 0.05 0.1  0.075 1 hours 
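  • setDT(df)[, DateTime := as.POSIXct(... converts df to a data.table and converts the DateTime column to POSIXct
  • df[, Grp := cumsum(c(0, difftime(... creates a group ID based on your condition above, i.e. a new group starts whenever DateTime[i] - DateTime[i - 1] is greater than 6 hours
  • df[, .(Start = min(DateTime), ... computes the summaries for each Grp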
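The summary above reports Min but not the ValueSum the question asks for. A base-R sketch without data.table, assuming the same "more than 6 hours starts a new segment" rule and UTC parsing, that returns the sum, max, median and span per segment might look like this. The column names follow the question's desired output; `N` (readings per segment) is a hypothetical extra column.

```r
# Rebuild the sample data with a single parsed timestamp column
Date  <- c("1954-10-07", "1954-10-07", "1954-10-07", "1954-10-07", "1954-10-07",
           "1954-10-07", "1954-10-11", "1954-10-11", "1954-10-11", "1954-10-12", "1954-10-13")
Time  <- c("0:00", "1:00", "4:00", "13:00", "14:00", "15:00",
           "9:00", "10:00", "11:00", "23:00", "0:00")
Value <- c(0.1, 0.2, 0.1, 0.02, 0.2, 1.1, 0.2, 0.3, 0.4, 0.1, 0.05)
df <- data.frame(DateTime = as.POSIXct(paste(Date, Time), format = "%Y-%m-%d %H:%M", tz = "UTC"),
                 Value = Value)

# Sort by time, then start a new segment after every gap of more than 6 hours
df <- df[order(df$DateTime), ]
gap_h <- c(0, as.numeric(diff(df$DateTime), units = "hours"))
df$Grp <- cumsum(gap_h > 6)

# One summary row per segment
summ <- do.call(rbind, lapply(split(df, df$Grp), function(s) {
  data.frame(
    IntervalStart  = min(s$DateTime),
    IntervalEnd    = max(s$DateTime),
    ValueSum       = sum(s$Value),
    ValueMax       = max(s$Value),
    ValueMedian    = median(s$Value),
    HoursInSegment = as.numeric(difftime(max(s$DateTime), min(s$DateTime), units = "hours")),
    N              = nrow(s)
  )
}))
rownames(summ) <- NULL
summ
```

HoursInSegment here is the span from the first to the last reading, as in the answer's Span column; if you instead want the number of readings, use N.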