Group by sector, then aggregate by fiscal year

Asked: 2019-05-21 21:07:10

Tags: r

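I have a dataset whose fields include isic (International Standard Industrial Classification), a date, and cash. I want to group it by sector first and then get the sum for each fiscal year.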

# Here's a look at the data (cpt1). All dates follow the format "%Y-%m-01".
     Cash         Date           isic                           
1   373165     2014-06-01         K 
2   373165     2014-12-01         K 
3   373165     2017-09-01         K 
4   NA             <NA>           K 
5   4789       2015-05-01         K 
6   982121     2013-07-01         K 
                 .
                 .
                 .
# I was able to group them by sector and sum them
cpt_by_sector <- cpt1 %>%
  mutate(sector = recode_factor(isic,
    'A' = 'Agriculture', 'B' = 'Industry', 'C' = 'Industry', 'D' = 'Industry',
    'E' = 'Industry', 'F' = 'Industry', .default = 'Services',
    .missing = 'Services')) %>%
  group_by(sector) %>%
  summarise_if(is.numeric, sum, na.rm = TRUE)
# Here's the result
   sector         `Cash`
    <fct>         <dbl>             
1 Agriculture   2094393819.
2 Industry     53699068183.
3 Services     223995196357.

# Below is what I would like to get. I would like to take the fiscal year into account, i.e. from July to June.
  Sector      `2009/10` `2010/11` `2011/12` `2012/13` `2013/14` `2014/15` `2015/16` `2016/17`
  <chr>           <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
1 Agriculture      2.02      3.62      3.65      6.26      7.04      8.36      11.7      11.6
2 Industry         87.8      117.      170.      163.      185.      211.      240.      252.
3 Services         271.      343.      479.      495.      584.      664.      738.      821.
4 Total            361.      464.      653.      664.      776.      883.      990.     1085.

PS: I converted the date column to Date format.

1 answer:

Answer 0 (score: 1)

library(dplyr)
library(tidyr)
library(lubridate)

# df is the data frame from the question (called cpt1 there).
df %>%
  # FY is the year of the date, plus 1 if the month is July or later.
  # FY_label makes the requested format by combining the prior year,
  #   a slash, and digits 3 and 4 of the FY.
  mutate(FY = year(Date) + if_else(month(Date) >= 7, 1, 0),
         FY_label = paste0(FY - 1, "/", substr(FY, 3, 4))) %>%
  mutate(sector = recode_factor(isic,
                        'A'='Agriculture','B'='Industry','C'='Industry','D'='Industry',
                        'E'='Industry','F'='Industry', 'K'='Mystery Sector')) %>%
  filter(!is.na(FY)) %>%  # Exclude rows with a missing date (and thus a missing FY)
  group_by(FY_label, sector) %>%
  summarise(Cash = sum(Cash)) %>%
  spread(FY_label, Cash)


# A tibble: 1 x 4
  sector         `2013/14` `2014/15` `2017/18` 
  <fct>              <int>     <int>     <int> 
1 Mystery Sector   1355286    377954    373165
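
The desired output in the question also has a "Total" row and uses the question's own sector mapping (everything outside A to F counts as Services). A minimal sketch of one way to get both, assuming dplyr >= 1.0 and tidyr >= 1.1; the sample data below is hypothetical and only mirrors the six rows shown in the question, and the names `by_fy` and `with_total` are illustrative. It does not rescale the values, so the numbers stay in the same units as `Cash`.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical sample that mirrors the rows printed in the question
cpt1 <- tibble(
  Cash = c(373165, 373165, 373165, NA, 4789, 982121),
  Date = as.Date(c("2014-06-01", "2014-12-01", "2017-09-01", NA,
                   "2015-05-01", "2013-07-01")),
  isic = "K"
)

by_fy <- cpt1 %>%
  filter(!is.na(Date)) %>%  # rows without a date cannot be assigned a fiscal year
  mutate(
    # July to June fiscal year: July onward belongs to the next FY
    FY = year(Date) + if_else(month(Date) >= 7, 1, 0),
    FY_label = paste0(FY - 1, "/", substr(FY, 3, 4)),
    # Same mapping as in the question: anything outside A-F is Services
    sector = recode_factor(isic,
      "A" = "Agriculture", "B" = "Industry", "C" = "Industry",
      "D" = "Industry", "E" = "Industry", "F" = "Industry",
      .default = "Services", .missing = "Services")
  ) %>%
  group_by(sector, FY_label) %>%
  summarise(Cash = sum(Cash, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = FY_label, values_from = Cash, values_fill = 0)

# Append a "Total" row holding the column sums of every fiscal-year column
with_total <- by_fy %>%
  mutate(sector = as.character(sector)) %>%
  bind_rows(
    summarise(by_fy, across(where(is.numeric), sum)) %>%
      mutate(sector = "Total")
  )

print(with_total)
```

With this sample there is only one sector, so the Total row simply repeats the Services row; on the full data it would sum Agriculture, Industry and Services per fiscal year. `pivot_wider()` is used here in place of `spread()`, which still works but is superseded in current tidyr.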