dplyr: calculate by group using a run ID

Time: 2018-01-04 02:34:28

Tags: r dplyr aggregate

My data:

is.over.sma  ID.No  total.shares.months  total.cost.months  total.shares.years  total.cost.years  RunID
1            79     0                    0                  0                   0                 7
1            80     0.79906924           833.3333333        0                   0                 7
1            81     0                    0                  0                   0                 7
1            82     0                    0                  0                   0                 7
1            83     0                    0                  0                   0                 7
1            84     0                    0                  0                   0                 7
1            85     0                    0                  0                   0                 7
1            86     0                    0                  0                   0                 7
1            87     0                    0                  0                   0                 7
1            88     0.56                 700                0                   0                 7
1            89     0                    0                  0                   0                 7
1            90     0                    0                  0                   0                 7
1            91     0                    0                  0                   0                 7
1            92     0                    0                  0                   0                 7
1            93     0                    0                  0                   0                 7
1            94     0.78                 900                0                   0                 8
1            95     0                    0                  0                   0                 8
1            96     0                    0                  0                   0                 8
1            97     0                    0                  0                   0                 8
1            98     0                    0                  0                   0                 8
1            99     0                    0                  0                   0                 8
1            100    0.751522595          833.3333333        0                   0                 8
1            101    0                    0                  0                   0                 8
1            102    0                    0                  0                   0                 8

The goal is to group by RunID and then, within each group, sum the total.shares.months and total.cost.months columns.

This is what I tried:

# Use dplyr to label runs of is.over.sma and sum within each run
library(dplyr)

output.sma <- df %>%
  dplyr::mutate(RunID = ifelse(is.over.sma == 1, data.table::rleid(is.over.sma), 0)) %>%
  group_by(RunID) %>%
  mutate(ID.No = ifelse(is.over.sma == 1, row_number(), 0)) %>%
  dplyr::mutate(sum.shares.over.sma = ifelse(is.over.sma == 1, sum(total.shares.months), 0)) %>%  # total shares bought within the run
  dplyr::mutate(sum.cost.over.sma = ifelse(is.over.sma == 1, sum(total.cost.months), 0)) %>%  # total cost within the run
  ungroup() %>%
  select(-RunID)

The desired sums for RunID 7 are:

sum.shares.over.sma = 1.359
sum.cost.over.sma = 1533.33

And for RunID 8:

sum.shares.over.sma = 1.531
sum.cost.over.sma = 1733.33

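2 answers:

Answer 0 (score: 3)

Judging by your expected result, I think you can do the following. I called your data set mydf. You group the data by RunID. Then you want to apply sum() to total.shares.months and total.cost.months, which you can do inside summarise_at().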

group_by(mydf, RunID) %>%
summarise_at(vars(total.shares.months:total.cost.months),
             funs(sum(., na.rm = TRUE))
            )

  RunID total.shares.months total.cost.months
  <int>               <dbl>             <dbl>
1     7                1.36              1533
2     8                1.53              1733
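
For readers on dplyr 1.0.0 or later, where across() is offered as the newer alternative to summarise_at() and funs(), the same grouped sum might be sketched as below. This is an illustration, not part of the original answer: the data frame is rebuilt here from the question's sample rows (only the columns needed), and the helper names purchases, idx and result are made up for the example.

```r
library(dplyr)

# Rebuild the question's sample data (assumption: only the columns used here)
mydf <- data.frame(
  ID.No = 79:102,
  total.shares.months = 0,
  total.cost.months = 0,
  RunID = rep(c(7L, 8L), times = c(15, 9))
)

# The four non-zero rows from the question (hypothetical helper table)
purchases <- data.frame(
  ID.No  = c(80, 88, 94, 100),
  shares = c(0.79906924, 0.56, 0.78, 0.751522595),
  cost   = c(833.3333333, 700, 900, 833.3333333)
)
idx <- match(purchases$ID.No, mydf$ID.No)
mydf$total.shares.months[idx] <- purchases$shares
mydf$total.cost.months[idx]   <- purchases$cost

# One row per RunID, summing both columns and ignoring any NA values
result <- mydf %>%
  group_by(RunID) %>%
  summarise(across(c(total.shares.months, total.cost.months),
                   ~ sum(.x, na.rm = TRUE)))
result
```

Swapping summarise() for mutate() in this sketch would keep every row and overwrite the two columns with their group totals, which is closer to what the attempt in the question was doing.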

Answer 1 (score: 2)

An answer has already been accepted, and it is a perfect solution; this is just to show the aggregate version:

x <- merge( aggregate(total.shares.months + total.shares.years ~ RunID, data = mydf, sum ), 
            aggregate(total.cost.months + total.cost.years ~ RunID, data = mydf, sum ))
colnames( x )[2:3] <- c( "sum.shares.over.sma", "sum.cost.over.sma" )
x
  RunID sum.shares.over.sma sum.cost.over.sma
1     7            1.359069          1533.333
2     8            1.531523          1733.333
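
If only the two monthly columns should be summed, as the question asks, a single aggregate() call with cbind() on the left-hand side of the formula may be enough, with no merge() needed. This is a sketch rather than the answerer's code: the data frame is rebuilt from the question's sample rows, and the names purchases, idx and sums are invented for the example.

```r
# Rebuild the question's sample data (assumption: only the columns used here)
mydf <- data.frame(
  ID.No = 79:102,
  total.shares.months = 0,
  total.cost.months = 0,
  RunID = rep(c(7L, 8L), times = c(15, 9))
)

# The four non-zero rows from the question (hypothetical helper table)
purchases <- data.frame(
  ID.No  = c(80, 88, 94, 100),
  shares = c(0.79906924, 0.56, 0.78, 0.751522595),
  cost   = c(833.3333333, 700, 900, 833.3333333)
)
idx <- match(purchases$ID.No, mydf$ID.No)
mydf$total.shares.months[idx] <- purchases$shares
mydf$total.cost.months[idx]   <- purchases$cost

# cbind() lets one aggregate() call sum several columns per RunID
sums <- aggregate(cbind(total.shares.months, total.cost.months) ~ RunID,
                  data = mydf, FUN = sum)
colnames(sums)[2:3] <- c("sum.shares.over.sma", "sum.cost.over.sma")
sums
```

Unlike the version above, this does not add the yearly columns to the totals; on the sample data those columns are all zero, so the sums should agree.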