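R: Adding variables based on prior occurrences

Time: 2017-05-13 19:58:30

Tags: r for-loop dataframe

I have a data frame of restaurant inspections sorted by date. For each observation, I want to add two extra variables that record how many inspections that restaurant has had and how many of them it failed. I would like to avoid a for loop, but I am not sure how to do that. Essentially, I currently have a data frame made up of the first three columns of the data frame below, and I want to add the last two columns.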

Initial data frame

    Restaurant_ID    Date         Result
    1                01/02/2011   Pass 
    2                02/05/2011   Pass
    3                04/07/2011   Fail
    1                09/05/2011   Fail
    2                03/13/2012   Pass
    1                08/25/2012   Fail
    2                09/25/2012   Pass
    3                01/05/2013   Pass

Desired output 1

    Restaurant_ID    Date         Result   total_inspect  failed_inspect
    1                01/02/2011   Pass     1              0
    2                02/05/2011   Pass     1              0
    3                04/07/2011   Fail     1              1
    1                09/05/2011   Fail     2              1
    2                03/13/2012   Pass     2              0
    1                08/25/2012   Fail     3              2
    2                09/25/2012   Pass     3              0
    3                01/05/2013   Pass     2              1

Edit: I realized that I actually want the last two columns to reflect the total number of inspections and failures before the current observation. So what I really want is

Desired output 2

    Restaurant_ID    Date         Result   past_inspect  past_failed_inspect
    1                01/02/2011   Pass     0              0
    2                02/05/2011   Pass     0              0
    3                04/07/2011   Fail     0              0
    1                09/05/2011   Fail     1              0
    2                03/13/2012   Pass     1              0
    1                08/25/2012   Fail     2              1
    2                09/25/2012   Pass     2              0
    3                01/05/2013   Pass     1              1

1 answer:

Answer 0 (score: 3)

This solution uses functions from the tidyverse and lubridate packages.

    # Create the example data frame
    dt1 <- read.csv(text = "Restaurant_ID,Date,Result
    1,01/02/2011,Pass
    2,02/05/2011,Pass
    3,04/07/2011,Fail
    1,09/05/2011,Fail
    2,03/13/2012,Pass
    1,08/25/2012,Fail
    2,09/25/2012,Pass
    3,01/05/2013,Pass",
                    stringsAsFactors = FALSE)

    # Load packages
    library(tidyverse)
    library(lubridate)

    dt2 <- dt1 %>%
      # Convert the Date column (month/day/year strings) to Date class
      mutate(Date = mdy(Date)) %>%
      # Sort the data frame by Restaurant_ID and Date
      arrange(Restaurant_ID, Date) %>%
      # Group the data by restaurant ID
      group_by(Restaurant_ID) %>%
      # Running count of inspections within each restaurant
      mutate(total_inspect = row_number()) %>%
      # Running count of failed inspections within each restaurant
      mutate(failed_inspect = cumsum(Result == "Fail")) %>%
      # Sort the data frame by Date to restore the original order
      arrange(Date)
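
For comparison, the same running counts can be sketched in base R with `ave()`, which applies a function within each group and returns the results in the original row order. This is an alternative to the pipeline above, not part of the original answer. The names `inspections` and `is_fail` are made up for illustration, and the sketch assumes that sorting by date first is acceptable.

```r
# Rebuild the example data so the sketch is self-contained
inspections <- read.csv(text = "Restaurant_ID,Date,Result
1,01/02/2011,Pass
2,02/05/2011,Pass
3,04/07/2011,Fail
1,09/05/2011,Fail
2,03/13/2012,Pass
1,08/25/2012,Fail
2,09/25/2012,Pass
3,01/05/2013,Pass",
                        stringsAsFactors = FALSE)

# Parse the month/day/year strings and sort chronologically
inspections$Date <- as.Date(inspections$Date, format = "%m/%d/%Y")
inspections <- inspections[order(inspections$Date), ]

# Running inspection count per restaurant: seq_along() numbers the rows
# within each group
inspections$total_inspect <- ave(seq_along(inspections$Restaurant_ID),
                                 inspections$Restaurant_ID,
                                 FUN = seq_along)

# Running failure count per restaurant: cumulative sum of a 0/1 indicator
is_fail <- as.integer(inspections$Result == "Fail")
inspections$failed_inspect <- ave(is_fail,
                                  inspections$Restaurant_ID,
                                  FUN = cumsum)

print(inspections)
```

Because `ave()` keeps the rows where they are, no second sort by date is needed after the counts are computed.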

Edit: calculating the counts of past inspections and past failures

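    dt3 <- dt2 %>%
      # Inspections before the current one: the running count minus the current row
      mutate(past_inspect = total_inspect - 1) %>%
      # Failures before the current one: subtract the current row from the
      # running count when the current inspection is itself a failure
      mutate(past_failed_inspect = failed_inspect - (Result == "Fail")) %>%
      # Remove the counts that include the current inspection
      select(-total_inspect, -failed_inspect)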
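
If only the "past" columns are needed, they can also be computed in a single grouped pass without building the inclusive counts first. The following is a minimal sketch of that idea, assuming dplyr is installed. The names `inspections` and `past_counts` are made up for illustration, and base `as.Date()` is used in place of `mdy()` only to keep the example small.

```r
library(dplyr)

# Rebuild the example data so the sketch is self-contained
inspections <- read.csv(text = "Restaurant_ID,Date,Result
1,01/02/2011,Pass
2,02/05/2011,Pass
3,04/07/2011,Fail
1,09/05/2011,Fail
2,03/13/2012,Pass
1,08/25/2012,Fail
2,09/25/2012,Pass
3,01/05/2013,Pass",
                        stringsAsFactors = FALSE)

past_counts <- inspections %>%
  # Parse the month/day/year strings and sort chronologically
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  arrange(Date) %>%
  group_by(Restaurant_ID) %>%
  mutate(
    # Rows that precede the current one within the restaurant
    past_inspect = row_number() - 1L,
    # Running failure count, excluding the current row
    past_failed_inspect = cumsum(Result == "Fail") - (Result == "Fail")
  ) %>%
  ungroup()

print(past_counts)
```

Subtracting the current row's own indicator from the cumulative sum is what turns an inclusive running count into a count of strictly earlier events.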