Group by multiple variables and summarize with dplyr

Time: 2019-02-17 16:54:45

Tags: r dplyr

I am trying to average CO2 concentration data for each sensor over 30-second intervals:

    head(df)
    # A tibble: 6 x 7
    # Groups: BinnedTime [1]
      Sensor Date       Time   calCO2 DeviceTime          cuts   BinnedTime         
      <fctr> <date>     <time>  <dbl> <dttm>              <fctr> <chr>              
    1 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
    2 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
    3 N1     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:03
    4 N2     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
    5 N3     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
    6 N3     2019-02-12 13:24     400 2019-02-12 13:24:05 (0,10] 2019-02-12 13:24:04

I am using:

    df %>%
      group_by(Sensor) %>%
      group_by(BinnedTime = cut(DeviceTime, breaks = "30 sec")) %>%
      summarize(Concentration = mean(calCO2))

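But it does not group by Sensor first. It ignores the sensors and computes the mean for each BinnedTime only. Any ideas would be welcome.

I have read about .dots=c("Sensor","BinnedTime"), but that does not work.

Note that I have not created dummy data, so that you can see exactly what I am working with, since there seems to be something subtle about the times and dates that I do not fully understand.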

1 answer:

Answer 0: (score: 1)

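To summarize the comments by @kath, with some improvements to address your follow-on question: a second group_by() call replaces the existing grouping rather than adding to it, so put both variables in a single call: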

    df %>%
      group_by(Sensor, BinnedTime = cut(DeviceTime, breaks = "30 sec")) %>%
      mutate(Concentration = mean(calCO2)) %>%
      ungroup()

The above keeps every column, but repeats the group mean in Concentration on each row of df. An alternative that lets you both roll up the data and retain more columns of interest is to add them to the summarize operation, as illustrated below.

    df %>%
      group_by(Sensor, BinnedTime = cut(DeviceTime, breaks = "30 sec")) %>%
      summarize(Concentration = mean(calCO2),
                Date = min(Date),
                Time = min(Time),
                StartDeviceTime = min(DeviceTime),
                EndDeviceTime = max(DeviceTime))
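
Since the question has no reproducible data, here is a minimal sketch of the behaviour on made-up data. The `toy` tibble and its values are hypothetical, not taken from the original df. It shows that a chained second group_by() drops Sensor from the grouping, that a single call keeps both variables, and how mutate and summarize differ in the number of rows they return.

```r
library(dplyr)

# Hypothetical toy data: two sensors, one reading every 10 seconds for a minute.
start <- as.POSIXct("2019-02-12 13:24:00", tz = "UTC")
toy <- tibble(
  Sensor     = rep(c("N1", "N2"), each = 6),
  DeviceTime = rep(start + seq(0, 50, by = 10), times = 2),
  calCO2     = c(400, 410, 420, 430, 440, 450,
                 500, 510, 520, 530, 540, 550)
)

# Chained calls: the second group_by() replaces the first,
# so Sensor is no longer a grouping variable.
chained <- toy %>%
  group_by(Sensor) %>%
  group_by(BinnedTime = cut(DeviceTime, breaks = "30 sec"))
group_vars(chained)

# Single call: both Sensor and BinnedTime are grouping variables.
combined <- toy %>%
  group_by(Sensor, BinnedTime = cut(DeviceTime, breaks = "30 sec"))
group_vars(combined)

# mutate() keeps all 12 rows and repeats the group mean on each of them.
with_mean <- combined %>%
  mutate(Concentration = mean(calCO2)) %>%
  ungroup()

# summarize() returns one row per Sensor / BinnedTime combination.
per_sensor <- combined %>%
  summarize(Concentration = mean(calCO2)) %>%
  ungroup()
per_sensor
```

If you prefer to keep the two calls separate, group_by() can also add to an existing grouping: the argument is `.add = TRUE` in dplyr 1.0.0 and later, and was `add = TRUE` in earlier versions.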