Replace NA in a date column with another date

Asked: 2015-07-18 11:25:27

Tags: r date lubridate

Data:

DB1 <- data.frame(orderItemID  = 1:10,     
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03", "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Expected result:

DB1 <- data.frame(orderItemID  = 1:10,
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
deliveryDate = c("2013-01-23", "2013-03-01", "2013-04-14", "2013-06-04", "2014-01-03", "2014-02-21", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

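My question is similar to another one I posted, so please don't confuse the two. As you can see, some delivery dates are missing, and I want to replace them with another date. That date should be the order date of the item plus the average delivery time in (whole) days, which is 2 days. The average delivery time is computed from all rows that have no missing value: (2 + 1 + 3 + 2 + 1 + 2 + 1 + 2 days) / 8 = 1.75 days.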
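The arithmetic above can be checked with a few lines of base R. This is only a sketch: `delivery_days` is an illustrative name for the eight per-order delivery times listed in the paragraph, not a column of `DB1`.

```r
# Delivery times (in days) of the eight orders that have a delivery date,
# as listed in the question. `delivery_days` is an illustrative name.
delivery_days <- c(2, 1, 3, 2, 1, 2, 1, 2)

avg_days   <- mean(delivery_days)  # 14 / 8 = 1.75
whole_days <- round(avg_days)      # rounded to whole days: 2
```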

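So I want to replace each NA delivery date with the order date + 2 days. Where there is no NA, the date should stay unchanged.

I have already tried this (using lubridate), but it does not work properly: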

DB1$deliveryDate[is.na(DB1$deliveryDate) ] <- DB1$orderDate + days(2)

Can anyone help me?

3 Answers:

Answer 0 (score: 4)

First, convert the columns to Date objects:

DB1[,2:3]<-lapply(DB1[,2:3],as.Date)

Then replace the NA elements:

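# Caution: difftime(a, b) is a - b, i.e. orderDate minus deliveryDate here.
# The mean is +2.125 days on the posted data only because row 2 is delivered
# 30 days before its order; with clean data, swap the two arguments.
DB1$deliveryDate[is.na(DB1$deliveryDate)] <- 
       DB1$orderDate[is.na(DB1$deliveryDate)] +
       mean(difftime(DB1$orderDate,DB1$deliveryDate,units="days"),na.rm=TRUE)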
#   orderItemID  orderDate deliveryDate
#1            1 2013-01-21   2013-01-23
#2            2 2013-03-31   2013-03-01
#3            3 2013-04-12   2013-04-14
#4            4 2013-06-01   2013-06-04
#5            5 2014-01-01   2014-01-03
#6            6 2014-02-19   2014-02-21
#7            7 2014-02-27   2014-02-28
#8            8 2014-10-02   2014-10-04
#9            9 2014-10-31   2014-11-01
#10          10 2014-11-21   2014-11-23 
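For data in which every delivery follows its order, the offset has to be taken as delivery minus order, and rounding it keeps the filled-in dates on whole days. The following is a minimal base-R sketch of that variant, not the code from this answer. It assumes row 2's delivery date is 2013-04-01, which is what the question's arithmetic implies, and `orders`, `missing` and `avg_days` are illustrative names.

```r
# Illustrative copy of the data. Assumption: row 2's delivery date is
# 2013-04-01 (the posted 2013-03-01 lies before the order date).
orders <- data.frame(
  orderItemID  = 1:10,
  orderDate    = as.Date(c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01",
                           "2014-01-01", "2014-02-19", "2014-02-27", "2014-10-02",
                           "2014-10-31", "2014-11-21")),
  deliveryDate = as.Date(c("2013-01-23", "2013-04-01", NA, "2013-06-04",
                           "2014-01-03", NA, "2014-02-28", "2014-10-04",
                           "2014-11-01", "2014-11-23"))
)

# Rows whose delivery date is missing
missing <- is.na(orders$deliveryDate)

# Mean delivery time (delivery minus order) in days, rounded to whole days
avg_days <- round(mean(as.numeric(orders$deliveryDate - orders$orderDate),
                       na.rm = TRUE))

# Fill only the missing rows with order date + average delivery time
orders$deliveryDate[missing] <- orders$orderDate[missing] + avg_days
```

Under that assumption, rows 3 and 6 become 2013-04-14 and 2014-02-21, matching the expected result in the question.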

Answer 1 (score: 3)

You can do it like this:

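# Convert both date columns to Date so they can be subtracted
DB1 = cbind(DB1$orderItemID, as.data.frame(lapply(DB1[-1], as.Date)))

# Average delivery time, rounded to whole days
days = round(mean(DB1$deliveryDate - DB1$orderDate, na.rm = TRUE))
mask = is.na(DB1$deliveryDate)

# Fill only the missing delivery dates
DB1$deliveryDate[mask] = DB1$orderDate[mask] + days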

#   DB1$orderItemID  orderDate deliveryDate
#1                1 2013-01-21   2013-01-23
#2                2 2013-03-31   2013-04-01
#3                3 2013-04-12   2013-04-14
#4                4 2013-06-01   2013-06-04
#5                5 2014-01-01   2014-01-03
#6                6 2014-02-19   2014-02-21
#7                7 2014-02-27   2014-02-28
#8                8 2014-10-02   2014-10-04
#9                9 2014-10-31   2014-11-01
#10              10 2014-11-21   2014-11-23

I reworked your data, because they were not clean:

DB1 <- data.frame(orderItemID  = 1:10,     
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-04-01", NA, "2013-06-04", "2014-01-03", NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))
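One visible difference in the reworked data is that the missing values are written as bare NA rather than as the quoted string "NA". A minimal sketch of why that matters, using two made-up vectors (`quoted` and `unquoted` are illustrative names only):

```r
quoted   <- c("2013-01-23", "NA")  # "NA" is an ordinary two-character string
unquoted <- c("2013-01-23", NA)    # NA is a genuine missing value

is.na(quoted)    # FALSE FALSE
is.na(unquoted)  # FALSE  TRUE

# as.Date() cannot parse the string "NA", so after conversion
# both versions contain a missing Date in the second position.
is.na(as.Date(quoted))    # FALSE  TRUE
is.na(as.Date(unquoted))  # FALSE  TRUE
```

So `is.na()` only finds the gaps in the quoted version once the column has been converted with `as.Date()`.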

Answer 2 (score: 1)

Assuming you have entered the data like this (note that the NAs are not quoted, so they are read as NA rather than "NA")...

DB1 <- data.frame(orderItemID  = 1:10,     
  orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
  deliveryDate = c("2013-01-23", "2013-03-01", NA, "2013-06-04", "2014-01-03", NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"),
  stringsAsFactors = FALSE)

...and, following Nicola's answer, have done this to get the format right...

DB1[,2:3]<-lapply(DB1[,2:3],as.Date)

...then this works too:

library(lubridate)
DB1$deliveryDate <- with(DB1, as.Date(ifelse(is.na(deliveryDate), orderDate + days(2), deliveryDate), origin = "1970-01-01"))
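The outer `as.Date(..., origin = "1970-01-01")` is there because `ifelse()` drops the Date class and returns plain numbers (days since 1970-01-01). A small base-R sketch of that behaviour, with the illustrative names `d`, `filled` and `restored`:

```r
# Two illustrative dates, the second one missing
d <- as.Date(c("2013-04-12", NA))

# ifelse() strips attributes, so the result is numeric, not Date
filled <- ifelse(is.na(d), as.Date("2013-04-14"), d)
class(filled)  # "numeric"

# Converting back with an explicit origin restores the dates
restored <- as.Date(filled, origin = "1970-01-01")
```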

Or you can use dplyr and pipe it:

library(lubridate)
library(dplyr)
DB2 <- DB1 %>%
  # ifelse() returns plain numbers, so the dates are restored in a second step
  mutate(deliveryDate = ifelse(is.na(deliveryDate), orderDate + days(2), deliveryDate)) %>%
  mutate(deliveryDate = as.Date(deliveryDate, origin = "1970-01-01"))