Question

I have a data frame (customer) that looks like this:

    email    order_no  date       
a@stack.com  0012      2014-02-13  
a@stack.com  0013      2014-03-13  
a@stack.com  0014      2014-06-13  
b@stack.com  0015      2014-05-13   
b@stack.com  0016      2014-05-20  
b@stack.com  0017      2014-07-20

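I want to create a new field that holds the interval between each customer's consecutive orders. The first step is to sort by date in ascending order: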

# sort by customer first, then by date, so each customer's orders are adjacent
customer <- arrange(customer, email, date)

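The next step is to iterate over each customer and calculate the interval between orders, so that the result set looks like this: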

    email    order_no  date         days_interval
a@stack.com  0012      2014-02-13    0
a@stack.com  0013      2014-03-13    30
a@stack.com  0014      2014-06-13    90
b@stack.com  0015      2014-05-13    0 
b@stack.com  0016      2014-05-20    7
b@stack.com  0017      2014-07-20    60

Can this be done without a for loop? What is the most efficient way?

With a for loop, this is what you would do:

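# Assumes customer is sorted by email, then by date
customer$days_interval <- 0L  # the first order of each customer stays 0
for (i in 2:nrow(customer)){
  if(customer$email[i]==customer$email[i-1]){
    customer$days_interval[i] <- as.integer(difftime(customer$date[i],customer$date[i-1],units="days"))
  }
}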

Is this possible without a for loop?

Answer 1

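diff should work for you. It takes a vector of length n and returns a vector of length n-1 containing the differences between consecutive elements. Here is an example.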

> library(lubridate)  # provides today()
> data <- data.frame(name=c("jeff","steve","jim"),date=today()+1:3)  # the next three days
> data
   name       date
1  jeff 2015-04-28
2 steve 2015-04-29
3   jim 2015-04-30
> diff(data$date)
Time differences in days
[1] 1 1

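You just need to combine this with what you already have, for example (note that this takes differences down the whole column, regardless of customer):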

customer$days_interval <- c(0, diff(customer$date))
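To restart the interval at each customer, one option is to apply the same `c(0, diff(...))` idea within each `email` group. This is a minimal base R sketch, not part of the original answer. It assumes `date` is a `Date` column and rebuilds the sample data from the question. `ave` applies the function separately to each group.

```r
# Sample data from the question (order_no kept as character to preserve leading zeros)
customer <- data.frame(
  email    = c("a@stack.com", "a@stack.com", "a@stack.com",
               "b@stack.com", "b@stack.com", "b@stack.com"),
  order_no = c("0012", "0013", "0014", "0015", "0016", "0017"),
  date     = as.Date(c("2014-02-13", "2014-03-13", "2014-06-13",
                       "2014-05-13", "2014-05-20", "2014-07-20")),
  stringsAsFactors = FALSE
)

# Sort by customer, then by date, so consecutive rows are consecutive orders
customer <- customer[order(customer$email, customer$date), ]

# Within each email group: 0 for the first order, then day differences
customer$days_interval <- ave(
  as.numeric(customer$date),   # dates as days since 1970-01-01
  customer$email,
  FUN = function(d) c(0, diff(d))
)

print(customer)
```

The first order of each customer gets 0 and every later order gets the number of days since that customer's previous order.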

Answer 2

Here is what I did using dplyr and lubridate:

library(dplyr)
library(lubridate)

df %>%
  group_by(email) %>%
  mutate(date = ymd(date)) %>%
  arrange(date, .by_group = TRUE) %>%   # sort by date within each customer
  mutate(days_interval = difftime(date, lag(date), units = "days"))

This is what I get:

        email order_no       date days_interval
1 a@stack.com       12 2014-02-13       NA days
2 a@stack.com       13 2014-03-13       28 days
3 a@stack.com       14 2014-06-13       92 days
4 b@stack.com       15 2014-05-13       NA days
5 b@stack.com       16 2014-05-20        7 days
6 b@stack.com       17 2014-07-20       61 days
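The question asks for 0 rather than NA on each customer's first order, and for plain numbers rather than difftime values. A hedged variation on the pipeline above, assuming the same sample data with `date` already stored as a `Date` (the name `result` is only illustrative), could look like this:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  email    = c("a@stack.com", "a@stack.com", "a@stack.com",
               "b@stack.com", "b@stack.com", "b@stack.com"),
  order_no = c("0012", "0013", "0014", "0015", "0016", "0017"),
  date     = as.Date(c("2014-02-13", "2014-03-13", "2014-06-13",
                       "2014-05-13", "2014-05-20", "2014-07-20")),
  stringsAsFactors = FALSE
)

result <- df %>%
  arrange(email, date) %>%          # each customer's orders in date order
  group_by(email) %>%
  mutate(
    # days since the previous order of the same customer (NA for the first)
    days_interval = as.integer(difftime(date, lag(date), units = "days")),
    # replace the leading NA of each group with 0
    days_interval = coalesce(days_interval, 0L)
  ) %>%
  ungroup()

print(result)
```

Here `coalesce` swaps the NA that `lag` produces on the first row of each group for 0, and `as.integer` drops the difftime class so the column is an ordinary integer vector.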
