R - Calculating a running total across different time intervals

Time: 2018-08-14 19:33:20

Tags: r

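I have a data frame that tracks the balances of some loans. Each time a payment is made toward a balance ("Amount"), the new balance of that property's loan appears in the "Balance" column.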

df = data.frame(Date = c("2015-03-01", "2015-05-01", "2016-07-02", "2017-11-24", "2017-12-15"),
            Property = c("1 Main St", "1 Main St", "1 Main St", "5 Main St", "1 Main St"),
            Amount = c(50000, -10000, -5000, 75000, -4000),
            Balance = c(50000, 40000, 35000, 75000, 31000)
            )

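As you can see, the dates are quite scattered, and most months have no transactions at all. I would like to build a data frame that has the balance of each property at the start of every month, regardless of whether there was a transaction that month. Something like this: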

Month = c("March 2015", "April 2015", "May 2015", "June 2015"),
Property = c("1 Main St", "1 Main St", "1 Main St", "1 Main St"),
Balance = c(50000, 50000, 40000, 40000)

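It also needs to pick up the last transaction of the month when a property has more than one transaction in a given month. Any ideas on how to approach this?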

1 answer:

Answer 0: (score: 0)

First, make sure your Date field is of class Date. This is the call I used to build the data:

df = data.frame(Date = as.Date(c("2015-03-01", "2015-05-01", "2016-07-02", "2017-11-24", "2017-12-15"), "%Y-%m-%d"),
            Property = c("1 Main St", "1 Main St", "1 Main St", "5 Main St", "1 Main St"),
            Amount = c(50000, -10000, -5000, 75000, -4000),
            Balance = c(50000, 40000, 35000, 75000, 31000),
            stringsAsFactors = FALSE
            )

Note that I also added the stringsAsFactors = FALSE argument to the data.frame call.
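If the data frame already exists with Date stored as text, the column can be converted in place instead of rebuilding everything. A minimal sketch, assuming the dates are in "%Y-%m-%d" form as in the question (the two-row data frame here is only an illustration):

```r
# Illustrative data: Date starts out as a character column
df <- data.frame(Date = c("2015-03-01", "2015-05-01"),
                 Balance = c(50000, 40000),
                 stringsAsFactors = FALSE)

# Convert the character column to class Date
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")

class(df$Date)  # "Date"
```

From R 4.0.0 onward, stringsAsFactors defaults to FALSE, so the explicit argument mainly matters on older versions of R.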

Then I used the following code to maybe(?) answer your question:

library(tidyr)
library(dplyr)
library(lubridate)

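# Sort by date; the result must be assigned, otherwise df is left unchanged
df <- arrange(df, Date)

# First and last transaction dates
from <- first(df$Date)
to <- last(df$Date)

new_df <- df %>%
        # add a row for every calendar day between from and to
        complete(Date = seq.Date(from, to, "day")) %>%
        # carry the last known values forward into the new rows
        fill(Property:Balance) %>%
        group_by(year = year(Date), month = month(Date, label = TRUE), Property) %>%
        # keep the last balance seen in each month
        summarise(Balance = last(Balance))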
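
One caveat with the pipeline above: complete() and fill() run over the whole data frame, so a day without a transaction inherits the values of whichever property transacted most recently. In general, that leaves a property without rows for the stretch that follows another property's transaction. Below is a minimal sketch of a per-property variant, assuming the goal is the last known balance for every month from each property's first transaction through the last month in the data. The names monthly and last_month are my own and are not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- data.frame(
  Date = as.Date(c("2015-03-01", "2015-05-01", "2016-07-02",
                   "2017-11-24", "2017-12-15")),
  Property = c("1 Main St", "1 Main St", "1 Main St", "5 Main St", "1 Main St"),
  Amount = c(50000, -10000, -5000, 75000, -4000),
  Balance = c(50000, 40000, 35000, 75000, 31000),
  stringsAsFactors = FALSE
)

# Assumption: every property is reported through the last month in the data
last_month <- floor_date(max(df$Date), "month")

monthly <- df %>%
  arrange(Property, Date) %>%
  mutate(Month = floor_date(Date, "month")) %>%
  # collapse to one row per property and month: the last transaction wins
  group_by(Property, Month) %>%
  summarise(Balance = last(Balance), .groups = "drop") %>%
  # within each property, add the months that had no transactions
  group_by(Property) %>%
  complete(Month = seq(min(Month), last_month, by = "month")) %>%
  # carry the previous balance forward into the added months
  fill(Balance) %>%
  ungroup()
```

Month is kept as the first day of each month (class Date); format(monthly$Month, "%B %Y") turns it into labels such as "March 2015" if that is the layout you want.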