Matching irregular datetimes (with missing datetimes) to a regular datetime table

Time: 2017-03-18 22:50:39

Tags: r algorithm datetime match

Suppose datetimes1 holds irregular datetimes and datetimes2 holds regular datetimes. datetimes1 has some missing datetimes, e.g. 5:10, as shown in the first table.

What I want is to match datetimes1 to datetimes2 so that each datetimes1 value is close to its datetimes2 value and every datetimes1 value appears in the correct row.

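First, I tried rounding datetimes1 to the nearest 5 minutes and matching the rounded values to datetimes2, but some datetimes are off by 3 minutes, so they get rounded to the wrong value.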
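To make the rounding failure concrete, here is a minimal sketch on four of the times from the tables. Rounding the epoch seconds to the nearest multiple of 300 is an assumption made for illustration; the rounding may have been done differently originally.

```r
# Four of the irregular times from the question
irregulars <- as.POSIXct(c("2014-07-24 05:23", "2014-07-24 05:27",
                           "2014-07-24 05:33", "2014-07-24 05:37"), tz = "GMT")

# Round to the nearest 5 minutes (300 seconds) on the epoch-seconds scale
rounded <- as.POSIXct(round(as.numeric(irregulars) / 300) * 300,
                      origin = "1970-01-01", tz = "GMT")

# 05:23 and 05:27 both land on 05:25; 05:33 and 05:37 both land on 05:35
print(format(rounded, "%H:%M"))
```

Two observations collapse onto the same regular slot, while the 05:20 and 05:30 slots stay empty, which is not the wanted result.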

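What I tried next was to first match datetimes1 to datetimes2 exactly, i.e. with tolerance set to 0, and then, in each loop iteration, increase tolerance by one and match the not-yet-matched datetimes1 values to the datetimes2 values that are within tolerance minutes of them.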

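The problem with this approach is that 5:33 and 5:37 are both 2 minutes away from 5:35, so 5:33 is matched to 5:35 first and 5:37 then never makes it into the table. See the second table for the result I get with the given code.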
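The tie can be checked directly. This is only a small illustration; the names target, obs and gaps are made up here and do not appear in the code further down.

```r
# The regular slot both observations compete for
target <- as.POSIXct("2014-07-24 05:35", tz = "GMT")
obs <- as.POSIXct(c("2014-07-24 05:33", "2014-07-24 05:37"), tz = "GMT")

# Absolute distance of each observation from the slot, in minutes
gaps <- abs(as.numeric(difftime(obs, target, units = "mins")))
print(gaps)
```

Both distances are equal, so a rule based on distance alone cannot tell which observation belongs to 5:35; the order of the observations has to be taken into account as well.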

Do you have any ideas on how to solve this problem?

What I would like to see:

           datetimes1          datetimes2
1 2014-07-24 05:05:00 2014-07-24 05:05:00
2                <NA> 2014-07-24 05:10:00
3 2014-07-24 05:15:00 2014-07-24 05:15:00
4 2014-07-24 05:23:00 2014-07-24 05:20:00
5 2014-07-24 05:27:00 2014-07-24 05:25:00
6 2014-07-24 05:33:00 2014-07-24 05:30:00
7 2014-07-24 05:37:00 2014-07-24 05:35:00
8 2014-07-24 05:41:00 2014-07-24 05:40:00
9 2014-07-24 05:45:00 2014-07-24 05:45:00

But I get this:

           datetimes1          datetimes2
1 2014-07-24 05:05:00 2014-07-24 05:05:00
2                <NA> 2014-07-24 05:10:00
3 2014-07-24 05:15:00 2014-07-24 05:15:00
4                <NA> 2014-07-24 05:20:00
5 2014-07-24 05:23:00 2014-07-24 05:25:00
6 2014-07-24 05:27:00 2014-07-24 05:30:00
7 2014-07-24 05:33:00 2014-07-24 05:35:00
8 2014-07-24 05:41:00 2014-07-24 05:40:00
9 2014-07-24 05:45:00 2014-07-24 05:45:00

Here is my code:

irregulars <- c("2014-07-24 05:05",
                "2014-07-24 05:15",
                "2014-07-24 05:23",
                "2014-07-24 05:27",
                "2014-07-24 05:33",
                "2014-07-24 05:37",
                "2014-07-24 05:41",
                "2014-07-24 05:45")

df1 <- data.frame(datetimes = as.POSIXct(irregulars, "GMT"))

regulars <- c("2014-07-24 05:05",
              "2014-07-24 05:10",
              "2014-07-24 05:15", 
              "2014-07-24 05:20",
              "2014-07-24 05:25",
              "2014-07-24 05:30",
              "2014-07-24 05:35",
              "2014-07-24 05:40",
              "2014-07-24 05:45")

df2 <- setNames(data.frame(matrix(NA,length(regulars),2)),c("datetimes1","datetimes2"))
df2$datetimes2 <- as.POSIXct(regulars, "GMT")

# Match irregulars to regulars
for(tolerance in c(0:3)) {
  for(idx in which(!df1$datetimes %in% df2$datetimes1)) {
    dt <- abs(difftime(df2$datetimes2, df1$datetimes[idx], "GMT", "mins"))
    dt.min <- min(dt[is.na(df2$datetimes1)])
    if (dt.min > tolerance) next
    idx2 <- which(dt == dt.min)
    df2$datetimes1[idx2] <- df1$datetimes[idx]
  }
}

df2$datetimes1 <- as.POSIXct(df2$datetimes1, "GMT", origin = "1970-01-01 00:00:00")

2 Answers:

Answer 0 (score: 1)

Here is one way. Suppose our matching algorithm is to find, from a set of candidates, the time that has the smallest absolute difference from the target x, with the condition that the difference must be below a tolerance level (say 5 minutes, or 300 seconds):

closest <- function(x, candidates, tol = 300) {
  # Absolute gap, in seconds, between the target and every candidate
  timediff <- abs(difftime(x, candidates, units = "secs"))
  # No candidate within the tolerance: report no match
  if (all(timediff >= tol)) return(NA)
  candidates[which.min(timediff)]
}

The candidates in our case are the "irregulars" and our targets are the "regulars". The main idea here is to iterate over the "regulars", and whenever we find a match in the set of candidates, we remove it from the candidates:

candidates <- irregulars
out <- sapply(regulars, function(x) {
    matched <- closest(x, candidates, tol = 300)
    # Remove the matched time so it cannot be reused for a later target
    candidates <<- setdiff(candidates, matched)
    matched
})

Here is the full MWE. First set up the time vectors:

irregulars <- c("2014-07-24 05:05",
                "2014-07-24 05:15",
                "2014-07-24 05:23",
                "2014-07-24 05:27",
                "2014-07-24 05:33",
                "2014-07-24 05:37",
                "2014-07-24 05:41",
                "2014-07-24 05:45")

regulars <- c("2014-07-24 05:05",
              "2014-07-24 05:10",
              "2014-07-24 05:15",
              "2014-07-24 05:20",
              "2014-07-24 05:25",
              "2014-07-24 05:30",
              "2014-07-24 05:35",
              "2014-07-24 05:40",
              "2014-07-24 05:45")

Define the closest function and iterate:

closest <- function(x, candidates, tol = 300) {
  timediff <- abs(difftime(x, candidates, units = "secs"))
  if (all(timediff >= tol)) return(NA)
  candidates[which.min(timediff)]
}

candidates <- irregulars
out <- sapply(regulars, function(x) {
    matched <- closest(x, candidates, tol = 300)
    candidates <<- setdiff(candidates, matched)
    matched
})

The vector out then holds, for each regular time, the matched irregular time, or NA where no remaining candidate lies within the tolerance.
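The same greedy idea can also be sketched without the global assignment, returning the two-column table the question asks for. This is only a sketch under assumptions: the function name match_greedy, its argument tol_secs and the available flag vector are hypothetical names introduced here, and the inputs are assumed to be POSIXct vectors sorted in increasing order.

```r
# Greedy nearest match: each regular slot takes the closest unused irregular
# time, provided the gap is strictly below tol_secs.
match_greedy <- function(regular, irregular, tol_secs = 300) {
  matched <- as.POSIXct(rep(NA_real_, length(regular)),
                        origin = "1970-01-01", tz = "GMT")
  available <- rep(TRUE, length(irregular))
  for (i in seq_along(regular)) {
    gaps <- abs(as.numeric(difftime(irregular, regular[i], units = "secs")))
    gaps[!available] <- Inf            # already used candidates cannot win
    j <- which.min(gaps)
    if (length(j) == 1 && gaps[j] < tol_secs) {
      matched[i] <- irregular[j]
      available[j] <- FALSE
    }
  }
  data.frame(datetimes1 = matched, datetimes2 = regular)
}

regular <- seq(as.POSIXct("2014-07-24 05:05", tz = "GMT"),
               by = "5 min", length.out = 9)
irregular <- as.POSIXct(c("2014-07-24 05:05", "2014-07-24 05:15",
                          "2014-07-24 05:23", "2014-07-24 05:27",
                          "2014-07-24 05:33", "2014-07-24 05:37",
                          "2014-07-24 05:41", "2014-07-24 05:45"), tz = "GMT")

res <- match_greedy(regular, irregular)
print(res)  # only the 05:10 row has NA in datetimes1
```

Because the slots are visited in time order and a used candidate is never offered again, 05:33 is taken by the 05:30 slot before 05:35 is considered, which leaves 05:37 free for 05:35.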

Answer 1 (score: 0)

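Assuming the irregular times are always within 5 minutes of their regular times, you can iterate over both vectors together and accept a pair only when the difference is less than 5 minutes: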

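i <- 1  # index into the regular times (rows of df2)
j <- 1  # index into the irregular times (rows of df1)
while (i <= nrow(df2) && j <= nrow(df1)) {
  # Signed gap in minutes: positive when the regular time is the later one
  d <- as.numeric(difftime(df2$datetimes2[i], df1$datetimes[j], tz = "GMT", units = "mins"))

  if (abs(d) < 5) {
    # Close enough: record the match and advance both indices
    df2$datetimes1[i] <- df1$datetimes[j]
    i <- i + 1
    j <- j + 1
  } else if (d > 0) {
    # The irregular time is too early for this slot: skip it
    j <- j + 1
  } else {
    # No irregular time for this slot: leave it as NA
    i <- i + 1
  }
}
df2$datetimes1 <- as.POSIXct(df2$datetimes1, "GMT", origin = "1970-01-01 00:00:00")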


> df2
           datetimes1          datetimes2
1 2014-07-24 05:05:00 2014-07-24 05:05:00
2                <NA> 2014-07-24 05:10:00
3 2014-07-24 05:15:00 2014-07-24 05:15:00
4 2014-07-24 05:23:00 2014-07-24 05:20:00
5 2014-07-24 05:27:00 2014-07-24 05:25:00
6 2014-07-24 05:33:00 2014-07-24 05:30:00
7 2014-07-24 05:37:00 2014-07-24 05:35:00
8 2014-07-24 05:41:00 2014-07-24 05:40:00
9 2014-07-24 05:45:00 2014-07-24 05:45:00