Date difference for each customer and product in R

Time: 2015-02-20 14:33:42

Tags: r aggregate

custid <- c(1,2,2,2) 

prod <- c("books", "highlighters", "books", "pens" )

qdate <- c(20130401,  20130403, 20130403, 20130404) 

tdate <- c(20130405,  20130804, 20130405, 20130405)

data <- data.frame(custid, prod, qdate, tdate)

  data$qdate <- as.Date(as.character(data$qdate), "%Y%m%d") 
  data$tdate <- as.Date(as.character(data$tdate), "%Y%m%d") 

(data2 <- difftime(data$tdate, data$qdate, data$custid, units="days")) #works

data2 <- aggregate(cbind(data$tdate=format(date, '%Y-%m-%d'))~cbind(data$qdate=format(date, '%Y-%m-%d'))  + data$prod + data$custid, data, difftime(data$tdate, data$qdate, data$custid, units="days"))

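With the R code above, I am trying to use the aggregate function to produce the output shown below. difftime gives the day differences correctly, but the aggregate call does not work and throws an error. Does anyone know how to fix this? Thanks.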

custid  prod            qdate       tdate       days_difference
1       books           20130401    20130405    4
2       highlighters    20130403    20130804    123
2       books           20130403    20130405    2
2       pens            20130404    20130405    1

2 answers:

Answer 0 (score: 2)

You can make this simpler by using lubridate:

library(lubridate)
custid <- c(1,2,2,2) 

prod <- c("books", "highlighters", "books", "pens" )

# ymd = year, month, day
qdate <- ymd(c(20130401,  20130403, 20130403, 20130404))

tdate <- ymd(c(20130405,  20130804, 20130405, 20130405))

data <- data.frame(custid, prod, qdate, tdate)
data$days_difference <- with(data, difftime(tdate, qdate, units="days"))
data
  custid         prod      qdate      tdate days_difference
1      1        books 2013-04-01 2013-04-05          4 days
2      2 highlighters 2013-04-03 2013-08-04        123 days
3      2        books 2013-04-03 2013-04-05          2 days
4      2         pens 2013-04-04 2013-04-05          1 days

Edit

If you don't want 'days' in the column, use as.numeric:

data$days_difference <- as.numeric(with(data, difftime(tdate, qdate, units="days")))
  custid         prod      qdate      tdate days_difference
1      1        books 2013-04-01 2013-04-05               4
2      2 highlighters 2013-04-03 2013-08-04             123
3      2        books 2013-04-03 2013-04-05               2
4      2         pens 2013-04-04 2013-04-05               1
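
For readers who would rather not load lubridate, the same column can be computed in base R. This is a minimal sketch, assuming the dates arrive as the numeric yyyymmdd values used in the question; it is one way to do it, not the answerer's own code.

```r
# Rebuild the question's data; dates start as numeric yyyymmdd values
data <- data.frame(
  custid = c(1, 2, 2, 2),
  prod   = c("books", "highlighters", "books", "pens"),
  qdate  = c(20130401, 20130403, 20130403, 20130404),
  tdate  = c(20130405, 20130804, 20130405, 20130405)
)

# Convert via character so that "%Y%m%d" is parsed as year, month, day
data$qdate <- as.Date(as.character(data$qdate), format = "%Y%m%d")
data$tdate <- as.Date(as.character(data$tdate), format = "%Y%m%d")

# Row-wise difference in days, stored as a plain number
data$days_difference <- as.numeric(difftime(data$tdate, data$qdate, units = "days"))
data
```

Note that difftime() takes a time zone as its third argument, so only the two date vectors and units= are passed here.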

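Answer 1 (score: 2)

You don't need aggregate() for a row-wise calculation. You can use the - operator directly on "Date" class objects. Wrapping the result in c() removed the "difftime" class in the R version current when this was written; newer R versions may keep the class, in which case as.numeric() gives a plain number.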

within(data, day_diff <- c(tdate - qdate))
#   custid         prod      qdate      tdate day_diff
# 1      1        books 2013-04-01 2013-04-05        4
# 2      2 highlighters 2013-04-03 2013-08-04      123
# 3      2        books 2013-04-03 2013-04-05        2
# 4      2         pens 2013-04-04 2013-04-05        1
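
aggregate() becomes useful once the goal is one summary value per group instead of one value per row. The sketch below assumes a hypothetical follow-up that the question does not ask for: the mean gap in days per customer. It uses as.numeric() with units= so that the column is numeric regardless of how c() handles "difftime" in the installed R version.

```r
# Same data as in the question, with the dates already parsed
data <- data.frame(
  custid = c(1, 2, 2, 2),
  prod   = c("books", "highlighters", "books", "pens"),
  qdate  = as.Date(c("2013-04-01", "2013-04-03", "2013-04-03", "2013-04-04")),
  tdate  = as.Date(c("2013-04-05", "2013-08-04", "2013-04-05", "2013-04-05"))
)

# Row-wise gap in days as a plain numeric column
data$day_diff <- as.numeric(data$tdate - data$qdate, units = "days")

# Grouped summary: mean gap per customer (the statistic is an illustrative choice)
per_customer <- aggregate(day_diff ~ custid, data = data, FUN = mean)
per_customer
```

The formula interface works here because day_diff and custid are ordinary columns of data; the original attempt failed in part because it put assignments and data$ references inside the formula.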