How can I replace missing values in time-series data with a loop?

Time: 2018-11-10 03:24:19

Tags: r loops time-series missing-data

I am trying to create a loop that replaces missing time-series data with value == 0.

Here is my data:

df
Times                   value
05-03-2018 09:00:00      1
05-03-2018 09:01:26      2
05-03-2018 09:04:28      1
05-03-2018 09:07:05      2
05-03-2018 09:09:05      1

In the desired output, the minutes that are missing from the data should be created and assigned a value of 0.

How should I do this? Should I create a new dummy table with the missing minutes, or loop over a sequence?

3 Answers:

Answer 0: (score: 2)

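You can do this with the dplyr and padr packages. padr is very handy for expanding a datetime series to fill the gaps between dates or to add missing values.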
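A minimal sketch of how this could look with padr, not necessarily the answerer's exact code. It assumes the sample data from the question sits in a data frame with Times stored as POSIXct; the names df1, padded and the helper column minute are illustrative.

```r
library(dplyr)
library(padr)

# Sample data from the question (df1 is an illustrative name)
df1 <- data.frame(
  Times = as.POSIXct(
    c("05-03-2018 09:00:00", "05-03-2018 09:01:26", "05-03-2018 09:04:28",
      "05-03-2018 09:07:05", "05-03-2018 09:09:05"),
    format = "%d-%m-%Y %H:%M:%S", tz = "UTC"
  ),
  value = c(1, 2, 1, 2, 1)
)

padded <- df1 %>%
  # add a helper column with each timestamp rounded down to the minute
  thicken(interval = "min", colname = "minute") %>%
  # insert one row for every minute that has no observation
  pad(interval = "min", by = "minute") %>%
  # inserted rows have NA in Times and value, so fill them in
  mutate(Times = coalesce(Times, minute),
         value = coalesce(value, 0)) %>%
  # the helper column is no longer needed
  select(-minute)

print(padded)
```

thicken() and pad() do the minute bookkeeping here, so no explicit loop or hand-built lookup table is required.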


Answer 1: (score: 2)

You can create a second, "complete" data frame and merge the two.

# time span in seconds between the first and last observation
dif <- diff(as.numeric(range(df1$Times)))
# merge with every whole minute in that span (assumes Times is POSIXct and the first timestamp is on a whole minute)
df1 <- merge(df1, data.frame(Times=as.POSIXct(0:(dif/60)*60, origin=df1[1, 1], tz="UTC")), all=TRUE)

Then replace the resulting NA values with 0.

df1[is.na(df1$value), 2] <- 0

Finally, remove the duplicates.

# "+ 1" shifts each negative index so that the generated row just before a duplicated minute is dropped, not the observation
df1 <- df1[-which(duplicated(strftime(df1$Times, format="%M"))) + 1, ]

Yields:

> df1
                 Times value
1  2018-03-05 09:00:00     1
3  2018-03-05 09:01:26     2
4  2018-03-05 09:02:00     0
5  2018-03-05 09:03:00     0
7  2018-03-05 09:04:28     1
8  2018-03-05 09:05:00     0
9  2018-03-05 09:06:00     0
11 2018-03-05 09:07:05     2
12 2018-03-05 09:08:00     0
14 2018-03-05 09:09:05     1
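
On the loop-versus-table part of the question: an explicit loop is not needed. As an alternative sketch in base R (a different route from the merge shown in this answer), one can list every minute in the observed range, keep the minutes that have no observation, and append them with a value of 0. The names df1, obs_min, all_min, missing_min and filled are illustrative, and Times is assumed to be POSIXct.

```r
# Sample data from the question, with Times parsed as POSIXct
df1 <- data.frame(
  Times = as.POSIXct(
    c("2018-03-05 09:00:00", "2018-03-05 09:01:26", "2018-03-05 09:04:28",
      "2018-03-05 09:07:05", "2018-03-05 09:09:05"),
    tz = "UTC"
  ),
  value = c(1, 2, 1, 2, 1)
)

# Minute that each observation falls in
obs_min <- as.POSIXct(trunc(df1$Times, units = "mins"))

# Every minute between the first and last observed minute
all_min <- seq(min(obs_min), max(obs_min), by = "1 min")

# Minutes with no observation at all
missing_min <- all_min[!(all_min %in% obs_min)]

# Append the missing minutes with value 0 and restore time order
filled <- rbind(
  df1,
  data.frame(Times = missing_min, value = rep(0, length(missing_min)))
)
filled <- filled[order(filled$Times), ]
rownames(filled) <- NULL

print(filled)
```

Because the observations are truncated to the minute before comparing, this variant does not depend on the first timestamp falling exactly on a whole minute.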

Answer 2: (score: 1)

library(tidyverse)
library(lubridate)
library(magrittr)

Recreate the data

df <- tibble(
  Times = c("05-03-2018 09:00:00", "05-03-2018 09:01:26",
            "05-03-2018 09:04:28", "05-03-2018 09:07:05",
            "05-03-2018 09:09:05"),
  value = c(1, 2, 1, 2, 1)
)

Code

Parse your Times variable as a datetime

df$Times %<>% parse_datetime("%d-%m-%Y %H:%M:%S")

Create a new variable, join, that is truncated to the minute

df %<>% mutate(join = floor_date(Times, unit = "minute"))

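Create a new data frame with a variable that is also called join and that contains every minute in your range

# assumes the first timestamp falls on a whole minute; otherwise start the sequence from first(df$join)
all <- tibble(
  join = seq(as_datetime(first(df$Times)), as_datetime(last(df$Times)), by = 60)
)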

Join the two data frames

result <- left_join(all, df, by = "join")

Add the "missing minutes" to your Times variable

result$Times[is.na(result$Times)] <- result$join[is.na(result$Times)]

Replace NA with 0

result$value[is.na(result$value)] <- 0

Drop the join variable

result %>%
  select(-join)

Result

# A tibble: 10 x 2
   Times               value
   <dttm>              <dbl>
 1 2018-03-05 09:00:00     1
 2 2018-03-05 09:01:26     2
 3 2018-03-05 09:02:00     0
 4 2018-03-05 09:03:00     0
 5 2018-03-05 09:04:28     1
 6 2018-03-05 09:05:00     0
 7 2018-03-05 09:06:00     0
 8 2018-03-05 09:07:05     2
 9 2018-03-05 09:08:00     0
10 2018-03-05 09:09:05     1
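
The steps in this answer can also be collapsed into a single function. This is only a sketch: fill_missing_minutes is a hypothetical helper name, and it assumes a data frame with a POSIXct column Times and a numeric column value, as in the example.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: adds a zero-valued row for every minute without an observation
fill_missing_minutes <- function(df) {
  # minute that each observation falls in
  df <- mutate(df, join = floor_date(Times, unit = "minute"))
  # every minute between the first and last observed minute
  all_minutes <- tibble(join = seq(min(df$join), max(df$join), by = "1 min"))
  all_minutes %>%
    left_join(df, by = "join") %>%
    # rows without an observation get the minute itself and a value of 0
    mutate(Times = coalesce(Times, join),
           value = coalesce(value, 0)) %>%
    select(Times, value)
}

df <- tibble(
  Times = as.POSIXct(
    c("2018-03-05 09:00:00", "2018-03-05 09:01:26", "2018-03-05 09:04:28",
      "2018-03-05 09:07:05", "2018-03-05 09:09:05"),
    tz = "UTC"
  ),
  value = c(1, 2, 1, 2, 1)
)

result <- fill_missing_minutes(df)
print(result)
```

Building the minute sequence from the truncated join column rather than from Times keeps the join working even when the first observation is not on a whole minute.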