Question

I have this dataset.

df <- data.frame(c("Attribute1", "Attribute1", "Attribute1", "Attribute2", "Attribute2"),
                 c("2018-11-01 00:00:19", "2018-11-01 00:00:54", "2018-11-01 00:01:17",
                   "2018-11-01 00:01:23", "2018-11-01 00:01:25"))
names(df) <- c("Signature", "date")
df$date <- as.POSIXct(df$date)

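I would like to know how to count, in R, how many times the same attribute occurred within the past hour. This is the result I want:

Count_Signature would then hold the number of "Attribute1" rows in the past hour, and so on for the other attributes.

Thanks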

Answer 1

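Here is a solution for you. I use data.table because it has some nice date-time helpers and performs well when computing by group. I build a time index that collapses each timestamp to its year and day of year, then bins it by hour. If you meant "the last hour" relative to the system clock, you will need to modify this. In that case, Sys.time() can be your friend.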
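For the Sys.time() route mentioned above, here is a minimal sketch in base R. It is not part of the original answer: the helper name `count_last_hour` and its `now` and `window_secs` arguments are hypothetical, and the time zone is fixed to UTC only to keep the example reproducible.

```r
# Same sample data as in the question (time zone set explicitly).
df <- data.frame(
  Signature = c("Attribute1", "Attribute1", "Attribute1", "Attribute2", "Attribute2"),
  date = as.POSIXct(c("2018-11-01 00:00:19", "2018-11-01 00:00:54", "2018-11-01 00:01:17",
                      "2018-11-01 00:01:23", "2018-11-01 00:01:25"), tz = "UTC")
)

# Hypothetical helper: count rows per Signature whose timestamp falls
# within `window_secs` seconds before `now` (defaults to the system clock).
count_last_hour <- function(df, now = Sys.time(), window_secs = 3600) {
  age <- as.numeric(difftime(now, df$date, units = "secs"))
  recent <- df[age >= 0 & age <= window_secs, , drop = FALSE]
  table(recent$Signature)
}

# A fixed reference time is used here because the sample data is from 2018;
# with real, current data you would simply call count_last_hour(df).
res <- count_last_hour(df, now = as.POSIXct("2018-11-01 00:30:00", tz = "UTC"))
res
```

With the fixed reference time above, all five rows fall inside the window, so the table reports 3 for Attribute1 and 2 for Attribute2.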

Anyway, here is the solution:

df <- data.frame(c("Attribute1", "Attribute1", "Attribute1", "Attribute2", "Attribute2"),
                 c("2018-11-01 00:00:19", "2018-11-01 00:00:54", "2018-11-01 00:01:17",
                   "2018-11-01 00:01:23", "2018-11-01 00:01:25"))
names(df) <- c("Signature", "date")
df$date <- as.POSIXct(df$date)

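library(data.table)
dt <- setDT(df)  # convert df to a data.table by reference
# Bin each timestamp by calendar hour: year, day of year, hour of day
dt[, time_idx := paste0(year(date), "-", yday(date), "-", hour(date))]
# 0-based row index within each Signature and hour bin
dt[, Count_Signature := (1L:.N) - 1L, keyby = .(Signature, time_idx)]
dt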
#>     Signature                date   time_idx Count_Signature
#> 1: Attribute1 2018-11-01 00:00:19 2018-305-0               0
#> 2: Attribute1 2018-11-01 00:00:54 2018-305-0               1
#> 3: Attribute1 2018-11-01 00:01:17 2018-305-0               2
#> 4: Attribute2 2018-11-01 00:01:23 2018-305-0               0
#> 5: Attribute2 2018-11-01 00:01:25 2018-305-0               1
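Note that the time index bins by calendar hour, so a row at 01:00:05 would not count a matching row at 00:59:50. If a rolling 60-minute window is what you need, the following base R sketch may help. It is an alternative to the approach above, not the same method; the helper name `count_prev_hour` and the `window_secs` argument are hypothetical, and the time zone is fixed to UTC for reproducibility.

```r
# Same sample data as in the question (time zone set explicitly).
df <- data.frame(
  Signature = c("Attribute1", "Attribute1", "Attribute1", "Attribute2", "Attribute2"),
  date = as.POSIXct(c("2018-11-01 00:00:19", "2018-11-01 00:00:54", "2018-11-01 00:01:17",
                      "2018-11-01 00:01:23", "2018-11-01 00:01:25"), tz = "UTC")
)

# Hypothetical helper: for each row, count strictly earlier rows with the
# same signature whose timestamp lies within the preceding `window_secs`.
count_prev_hour <- function(signature, date, window_secs = 3600) {
  vapply(seq_along(date), function(i) {
    elapsed <- as.numeric(difftime(date[i], date, units = "secs"))
    sum(signature == signature[i] & elapsed > 0 & elapsed <= window_secs)
  }, integer(1))
}

df$Count_Signature <- count_prev_hour(df$Signature, df$date)
df$Count_Signature
# 0 1 2 0 1
```

This compares every row against every other row, so it is quadratic in the number of rows; that is fine for small data, but for large tables a non-equi join or a sorted sliding window would likely be a better fit. Rows with identical timestamps do not count each other.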

Created on 2019-01-03 by the reprex package (v0.2.1)

How do I count occurrences of the same category within the past hour in R?