R - 对齐不同频率的时间序列

时间:2014-10-03 14:29:38

标签: r csv time-series

我有两个.csv文件,其中包含底部给出的两个单独的时间序列。我可以将这些作为数据帧导入R:

data1 <- read.csv(data1.csv)
data2 <- read.csv(data2.csv)

我在每个数据框中都有datetimeprice个信息。 我想将data1data2的价格与单个表格中的常见频率10秒对齐。

我有两个时间序列的开始和结束日期和时间,但是频率(因此每个节目的观察次数,比如一天)是不同的,每天的开始和结束时间也不同

我厌倦了使用ts(),但我不认为此功能可以同时使用日期和时间。

将这些时间序列与共同频率对齐的最有效方法是什么?

data1.csv:

date,time,price
01/06/2014,05:59:42,1954.75
01/06/2014,06:00:05,1954.875
01/06/2014,06:00:06,1954.75
01/06/2014,06:00:08,1954.875
01/06/2014,06:02:05,1954.625
01/06/2014,06:02:22,1954.875
01/06/2014,06:03:12,1954.75
01/06/2014,06:03:14,1954.625
01/06/2014,06:03:20,1954.75
01/06/2014,06:03:22,1954.875
01/06/2014,06:03:23,1954.75
01/06/2014,06:03:26,1954.875
01/06/2014,06:07:07,1955.125
01/06/2014,06:07:21,1954.875
01/06/2014,06:08:54,1954.625
01/06/2014,06:16:55,1954.375
01/06/2014,06:17:00,1954.625
01/06/2014,06:21:46,1954.875
01/06/2014,06:28:11,1955.125
01/06/2014,06:30:23,1955.375
01/06/2014,06:30:49,1955.125
01/06/2014,06:33:33,1955.375
01/06/2014,06:34:30,1955.125
01/06/2014,06:37:39,1955.375
01/06/2014,06:37:43,1955.125
01/06/2014,06:47:42,1954.875
01/06/2014,06:50:23,1955.125
01/06/2014,06:57:10,1954.875
01/06/2014,06:57:12,1955.125
01/06/2014,07:00:08,1954.875
01/06/2014,07:00:21,1955.125
01/06/2014,07:00:55,1955.375
01/06/2014,07:01:19,1955.125
01/06/2014,07:01:51,1955.375
02/06/2014,05:59:50,1966.625
02/06/2014,06:00:00,1966.375
02/06/2014,06:00:07,1966.5
02/06/2014,06:00:08,1966.625
02/06/2014,06:00:10,1966.375
02/06/2014,06:00:33,1966.125
02/06/2014,06:00:34,1966.375
02/06/2014,06:00:41,1966.125
02/06/2014,06:00:48,1966.375
02/06/2014,06:02:48,1966.625
02/06/2014,06:03:24,1966.875
02/06/2014,06:04:23,1967.125
02/06/2014,06:04:39,1966.875
02/06/2014,06:05:28,1966.625
02/06/2014,06:06:25,1966.375
02/06/2014,06:07:44,1966.625

data2.csv:

date,time,price
01/06/2014,02:05:25,0
01/06/2014,06:00:07,3231.5
01/06/2014,06:00:17,3232.5
01/06/2014,06:00:19,3231.5
01/06/2014,06:00:33,3232.5
01/06/2014,06:00:40,3231.5
01/06/2014,06:00:41,3232.5
01/06/2014,06:00:42,3231.5
01/06/2014,06:00:44,3232.5
01/06/2014,06:04:06,3233.5
01/06/2014,06:04:22,3232.5
01/06/2014,06:04:42,3233.5
01/06/2014,06:08:48,3232.5
01/06/2014,06:10:12,3231.5
01/06/2014,06:10:35,3232.5
01/06/2014,06:21:45,3233.5
01/06/2014,06:21:55,3234.5
01/06/2014,06:29:00,3235.5
01/06/2014,06:33:34,3236.5
01/06/2014,06:34:30,3235.5
01/06/2014,06:41:33,3234.5
01/06/2014,06:47:42,3233.5
01/06/2014,06:48:33,3234.5
01/06/2014,06:50:23,3235.5
01/06/2014,06:52:04,3236.5
01/06/2014,06:57:11,3235.5
01/06/2014,07:00:00,3236.5
01/06/2014,07:00:06,3235.5
01/06/2014,07:00:08,3233.5
01/06/2014,07:00:09,3234.5
01/06/2014,07:00:10,3233.5
01/06/2014,07:00:11,3234.5
01/06/2014,07:00:21,3235.5
02/06/2014,06:00:10,3252.5
02/06/2014,06:00:20,3252
02/06/2014,06:00:21,3251.5
02/06/2014,06:00:33,3250.5
02/06/2014,06:00:34,3251
02/06/2014,06:00:35,3250.5
02/06/2014,06:00:41,3249.5
02/06/2014,06:01:31,3250.5
02/06/2014,06:01:32,3249.5
02/06/2014,06:01:38,3250.5
02/06/2014,06:02:47,3251.5
02/06/2014,06:05:32,3250.5
02/06/2014,06:06:25,3249.5
02/06/2014,06:07:44,3250.5
02/06/2014,06:08:11,3249.5
02/06/2014,06:12:32,3250.5
02/06/2014,06:16:56,3251.5
02/06/2014,06:17:08,3250.5
02/06/2014,06:18:32,3251.5
02/06/2014,06:31:59,3250.5
02/06/2014,06:32:11,3251.5
02/06/2014,06:44:47,3250.5
02/06/2014,06:45:09,3251.5
02/06/2014,06:52:33,3252.5
02/06/2014,06:52:36,3253.5
02/06/2014,06:55:30,3254.5
02/06/2014,06:55:39,3253.5
02/06/2014,06:57:27,3254.5
02/06/2014,07:00:01,3253.5
02/06/2014,07:00:02,3254.5
02/06/2014,07:00:17,3253.5
02/06/2014,07:00:23,3252.5

这是数据帧&#39; data1&#39;看起来像:

    date        time                Price
1   2014-06-01  06:03:59.614000     62.1250
2   2014-06-01  06:03:59.692000     62.2500
3   2014-06-01  06:15:42.004000     62.2375
4   2014-06-01  06:15:42.083000     61.9250
5   2014-06-01  06:17:01.654000     61.9125
6   2014-06-01  06:17:01.732000     61.9000
7   2014-06-01  06:23:41.908000     61.8200
8   2014-06-01  06:23:41.986000     61.8570
9   2014-06-01  06:23:55.211000     61.9065
10  2014-06-01  06:23:55.291000     61.8725
11  2014-06-01  06:24:11.679000     61.8715

2 个答案:

答案 0 :(得分:3)

示例数据集

date_time <- seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), as.POSIXlt("2014-01-07 07:00:00"), by = "1 secs")
date_time_1 <- sample(date_time, 100)
date_time_2 <- sample(date_time, 100)

data1 <- data.frame(date=as.Date(date_time_2),
           time = format(date_time_1, "%H:%M:%S"),
           price = rnorm(100)
)
# format the date and time
data1$datetime <- strptime(paste(data1$date, data1$time), "%Y-%m-%d %H:%M:%S")

data2 <- data.frame(date=as.Date(date_time_2),
                    time = format(date_time_1, "%H:%M:%S"),
                    price = rnorm(100)
)
# format the date and time
data2$datetime <- strptime(paste(data2$date, data2$time), "%Y-%m-%d %H:%M:%S")

下一节回答您的问题

## Round off the times to 10 second increments
data1$datetime <- data1$datetime - as.numeric(format(data1$datetime, "%S"))%%10
data2$datetime <- data2$datetime - as.numeric(format(data2$datetime, "%S"))%%10

## Aggregate the data in case there are multiple observations in one 10 second block
data1_freq <- aggregate(data1$price, list(date=as.POSIXct(data1$datetime)), mean)
data2_freq <- aggregate(data2$price, list(date=as.POSIXct(data2$datetime)), mean)

### Now merge the two data sets - not dropping any observations
data <- merge(data2_freq, data1_freq, by="date", all = TRUE)

您可以选择将其合并为完整的时间序列

## create a continuous date based on the desired freq (here 10 seconds)
cont_date_time <- data.frame(date = 
                               seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), 
                                          as.POSIXlt("2014-01-07 07:00:00"), 
                                          by = "10 secs")
)

# And merge the data with the complete time series
data_cont <- merge(data, cont_date_time, by = "date", all=TRUE)

将连续日期序列限制为工作日和工作时间

## create a continuous date based on the desired freq (here 10 seconds)
cont_date_time <- data.frame(date = 
                               seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), 
                                          as.POSIXlt("2014-01-07 07:00:00"), 
                                          by = "10 secs")
)
# Use the lubridate package to subset the date sequence

library(lubridate)
## Use the wday function to see what day of the week it is (i.e. Monday - Friday)
cont_date_time <- cont_date_time[with(cont_date_time, wday(date)>=2&wday(date)<=6) ,]
## Use the hour function to see if it is within working hours
cont_date_time <- cont_date_time[with(cont_date_time, hour(date)>=9&hour(date)<=4) ,]

# And merge the data with the complete time series
data_cont <- merge(data, cont_date_time, by = "date", all=TRUE)

答案 1 :(得分:2)

如果您使用时间序列表示,这是最简单的。在这里,我们将数据读入动物园对象。 index = 1:2告诉它前两列包含索引,FUN=f指定一个转换函数,它将数据转换为"chron"类并截断为10分钟,agg=mean指定哪个函数用于聚合数据。然后我们可以合并动物园对象:

library(zoo)
library(chron)

f <- function(d, t) trunc(chron(as.character(d), as.character(t)), "00:10:00")

z1 <- read.zoo("data.csv", header=TRUE, sep=",", index=1:2, FUN=f, agg=mean)
z2 <- read.zoo("data2.csv", header=TRUE, sep=",", index=1:2, FUN=f, agg=mean)

merge(z1, z2)

给出:

                          z1       z2
(01/06/14 02:00:00)       NA    0.000
(01/06/14 05:50:00) 1954.750       NA
(01/06/14 06:00:00) 1954.804 3232.333
(01/06/14 06:10:00) 1954.500 3232.000
(01/06/14 06:20:00) 1955.000 3234.500
(01/06/14 06:30:00) 1955.250 3236.000
(01/06/14 06:40:00) 1954.875 3234.167
(01/06/14 06:50:00) 1955.042 3235.833
(01/06/14 07:00:00) 1955.175 3234.786
(02/06/14 05:50:00) 1966.625       NA
(02/06/14 06:00:00) 1966.533 3250.633
(02/06/14 06:10:00)       NA 3251.000
(02/06/14 06:30:00)       NA 3251.000
(02/06/14 06:40:00)       NA 3251.000
(02/06/14 06:50:00)       NA 3253.700
(02/06/14 07:00:00)       NA 3253.500