Summarise by groups of two rows

Time: 2019-03-10 13:12:45

Tags: r dplyr tidyverse

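I have a data frame that I want to group by two variables and then summarise with totals and an average.

I tried the following on my data, and it works correctly.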

df %>%
  group_by(date, group) %>%
  summarise(
    weight = sum(ind_weigh) ,
    total_usage = sum(total_usage_min) ,
    Avg_usage = total_usage / weight) %>% 
  ungroup()

It returns this data frame:

df <- tibble::tribble(
     ~date, ~group,   ~weight, ~total_usage, ~Avg_usage,
  20190201,      0,  450762,     67184943,        149,
  20190201,      1, 2788303,    385115718,        138,
  20190202,      0,  483959,     60677765,        125,
  20190202,      1, 2413699,    311226351,        129,
  20190203,      0,  471189,     59921762,        127,
  20190203,      1, 2143811,    277425186,        129,
  20190204,      0,  531020,     83695977,        158,
  20190204,      1, 2640087,    403200829,        153
  )

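I would like to know how to add another variable to the script to obtain Avg_usage_total, computed across group 0 and group 1 together.

Expected result:

For example, first row -> 67184943 / (450762 + 2788303) = 20.7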

date      group  rech     total_usage  Avg_usage  Avg_usage_total
20190201  0      450762   67184943     149        20.7
20190201  1      2788303  385115718    138        118.9

1 answer:

Answer 0 (score: 3)

You can do this with mutate, adding group_by if necessary.

library(tidyverse)

# generate dataset
(df <- tibble(
  date = c(rep(Sys.Date(), 10), rep(Sys.Date() - 1, 10)),
  group = rbinom(20, 1, 0.5),
  rech = runif(20),
  weight = runif(20),
  total_usage = runif(20)
))
# A tibble: 20 x 5
   date       group   rech weight total_usage
   <date>     <int>  <dbl>  <dbl>       <dbl>
 1 2019-03-10     0 0.985  0.831      0.963  
 2 2019-03-10     1 0.178  0.990      0.676  
 3 2019-03-10     1 0.505  0.697      0.152  
 4 2019-03-10     1 0.416  0.165      0.824  
 5 2019-03-10     0 0.554  0.790      0.974  

# step 1 of analysis
(df <- df %>%
  group_by(date, group) %>%
  summarise(rech = sum(rech),
            weight = sum(weight),
            total_usage = sum(total_usage)) %>%
  mutate(Avg_usage = total_usage / weight))
# A tibble: 4 x 6
# Groups:   date [2]
  date       group  rech weight total_usage Avg_usage
  <date>     <int> <dbl>  <dbl>       <dbl>     <dbl>
1 2019-03-09     0  3.29   4.82        3.03     0.628
2 2019-03-09     1  1.45   1.22        1.16     0.954
3 2019-03-10     0  1.54   1.62        1.94     1.20 
4 2019-03-10     1  3.15   4.55        4.63     1.02 

# step 2 of analysis
df %>%
  group_by(date) %>% # only needed if Avg_usage_total should be computed per date
  mutate(Avg_usage_total = total_usage / sum(rech)) %>% # total_usage is taken row by row; sum(rech) spans the current date group (the whole column if ungrouped)
  ungroup()
# A tibble: 4 x 7
  date       group  rech weight total_usage Avg_usage Avg_usage_total
  <date>     <int> <dbl>  <dbl>       <dbl>     <dbl>           <dbl>
1 2019-03-09     0  3.29   4.82        3.03     0.628           0.639
2 2019-03-09     1  1.45   1.22        1.16     0.954           0.246
3 2019-03-10     0  1.54   1.62        1.94     1.20            0.413
4 2019-03-10     1  3.15   4.55        4.63     1.02            0.986
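
To connect this back to the numbers in the question, here is a minimal sketch that applies the same grouped mutate to the first two dates of the asker's summary table. It assumes the denominator is the per-date sum of `weight` (the column shown as `rech` in the expected result); the object names `usage` and `result` are only illustrative.

```r
library(dplyr)

# Per-date, per-group totals copied from the question (first two dates only).
usage <- tibble::tribble(
     ~date, ~group, ~weight, ~total_usage,
  20190201,      0,  450762,     67184943,
  20190201,      1, 2788303,    385115718,
  20190202,      0,  483959,     60677765,
  20190202,      1, 2413699,    311226351
)

result <- usage %>%
  group_by(date) %>%                       # sum(weight) is now evaluated per date
  mutate(
    Avg_usage       = total_usage / weight,      # row-wise ratio within each group
    Avg_usage_total = total_usage / sum(weight)  # row total over the date's combined weight
  ) %>%
  ungroup()

result
```

For 20190201 this should give roughly 20.7 for group 0 and 118.9 for group 1, which matches the expected result stated in the question.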