Keep messing up percentage change based on multiple columns

Time: 2018-02-09 17:40:51

Tags: r dplyr grouping percentage

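Once again I turn to you, great Stack Overflow community, for help. I have posted this question more than once, but somehow I cannot solve this seemingly simple problem. Honestly, I am getting a bit frustrated, so I would be very grateful to anyone who can help.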

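Several answers to similar questions on Stack Overflow work on my small reproducible data frame, but when I apply the same strategy to my original data frame, it does not work. So first, a short translation of the variables (they are in Dutch):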

  • Gemeente == municipality
  • jaar == year
  • Beleidscode ~ crime category
  • aantal_misdrijven == number of crimes
  • Kennisnamedatum == date and weekdag == weekday are not needed for this question.

My question:

I want to calculate the percentage change from 2015 to 2017 for each Beleidscode, grouped by Gemeente.

library(tidyverse)

# This will download my original data frame with ease:
df <- read_csv("https://github.com/thomasdebeus/colourful-facts/raw/master/projects/crime_dataset.csv")

# The following tries to first add a column with totals per 
# year, municipality and crime category. Then calculate percentage change.

df %>%
  group_by(Gemeente, jaar, Beleidscode) %>%
  arrange(Gemeente, jaar, Beleidscode) %>%
  summarise(per_jaar_Gem_misdrijf = sum(aantal_misdrijven)) %>%
  mutate(perct_change = (per_jaar_Gem_misdrijf - lag(per_jaar_Gem_misdrijf, order_by = jaar)) / lag(per_jaar_Gem_misdrijf, order_by = jaar)) %>%
  ungroup()

So yes, as you may have guessed, this is not producing the right numbers... I hope someone can help.

1 Answer:

Answer 0: (score: 0)

It looks like you are trying to calculate the percentage change from the previous year. One way you could do that is:

library(tidyverse)

# This will download the original data frame:
df <- read_csv("https://github.com/thomasdebeus/colourful-facts/raw/master/projects/crime_dataset.csv")

# Create a data frame with the summary count by year
dfSumByYear <-
  df %>%
  group_by(Gemeente, jaar, Beleidscode) %>%
  summarise(per_jaar_Gem_misdrijf = sum(aantal_misdrijven)) %>% 
  ungroup()

# Add the Previous Year counts as an additional column
dfSumByYearWithPrev <-
  dfSumByYear %>% 
  left_join(dfSumByYear %>% 
              mutate(JaarJoin = jaar+1) %>% 
              rename(per_PrevJaar_Gem_misdrijf = per_jaar_Gem_misdrijf) %>% 
              select(-jaar), by = c("Gemeente", c("jaar"="JaarJoin"), "Beleidscode")) %>% 
  # Calculate the Percentage Change
  mutate(perct_change = (coalesce(per_jaar_Gem_misdrijf,0L) - coalesce(per_PrevJaar_Gem_misdrijf,0L)) / coalesce(per_PrevJaar_Gem_misdrijf,0L))
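A note on why the attempt in the question can give unexpected numbers: `summarise()` drops only the last grouping variable, so after `group_by(Gemeente, jaar, Beleidscode)` the result is still grouped by `Gemeente` and `jaar`, and `lag()` then steps across crime categories within a single year rather than across years. As a minimal sketch of a `lag()`-based alternative to the self-join, the example below regroups by everything except the year. The toy values and the category names "Diefstal" and "Fraude" are made up for illustration and are not taken from the real dataset.

```r
library(dplyr)

# Hypothetical toy data, already summed per municipality, year and category
toy <- tibble::tribble(
  ~Gemeente, ~jaar, ~Beleidscode, ~per_jaar_Gem_misdrijf,
  "A",       2015,  "Diefstal",   100,
  "A",       2016,  "Diefstal",   110,
  "A",       2017,  "Diefstal",    99,
  "A",       2015,  "Fraude",      20,
  "A",       2016,  "Fraude",      30,
  "A",       2017,  "Fraude",      15
)

yoy <- toy %>%
  # Group by everything except the year, so lag() steps back in time
  group_by(Gemeente, Beleidscode) %>%
  # Sort by year within each group before taking the previous row
  arrange(jaar, .by_group = TRUE) %>%
  mutate(
    prev = lag(per_jaar_Gem_misdrijf),
    perct_change = (per_jaar_Gem_misdrijf - prev) / prev
  ) %>%
  ungroup()

print(yoy)
```

This sketch assumes every municipality and category has a row for each consecutive year. If a year is missing, `lag()` silently compares against the last available year, whereas the join on `jaar + 1` above matches only the year immediately before.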

If you specifically want to calculate the change from 2015 to 2017, one approach you can take is:

library(tidyverse)

# This will download the original data frame:
df <- read_csv("https://github.com/thomasdebeus/colourful-facts/raw/master/projects/crime_dataset.csv")

# Calculate the percentage change from 2015 to 2017
df %>%
  group_by(Gemeente, jaar, Beleidscode) %>%
  summarise(per_jaar_Gem_misdrijf = sum(aantal_misdrijven)) %>% 
  ungroup() %>% 
  spread(jaar, per_jaar_Gem_misdrijf, fill = 0L) %>% 
  mutate(perct_change = (`2017` - `2015`) / `2015`)
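In newer versions of tidyr, `spread()` is superseded by `pivot_wider()`. The following is a rough sketch of the same 2015-to-2017 comparison on a small made-up data frame (the values and the "Diefstal" category are hypothetical). It also returns `NA` instead of `Inf` when a category has no crimes in 2015, which is one possible way to handle a zero denominator, not necessarily the one you want.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same shape as the raw file (one row per record)
toy <- tibble::tribble(
  ~Gemeente, ~jaar, ~Beleidscode, ~aantal_misdrijven,
  "A",       2015,  "Diefstal",   60,
  "A",       2015,  "Diefstal",   40,
  "A",       2017,  "Diefstal",   80,
  "B",       2017,  "Diefstal",    5
)

change_15_17 <- toy %>%
  # Keep only the two years being compared
  filter(jaar %in% c(2015, 2017)) %>%
  group_by(Gemeente, jaar, Beleidscode) %>%
  summarise(n = sum(aantal_misdrijven), .groups = "drop") %>%
  # One column per year; combinations with no rows are filled with 0
  pivot_wider(names_from = jaar, values_from = n, values_fill = list(n = 0)) %>%
  # Avoid dividing by zero when there were no crimes in 2015
  mutate(perct_change = if_else(`2015` == 0, NA_real_, (`2017` - `2015`) / `2015`))

print(change_15_17)
```

The `.groups = "drop"` argument requires dplyr 1.0.0 or later and replaces the separate `ungroup()` call used above.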