Question

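I have a dataset of depths (2 months). The minimum time interval between depth readings is less than 1 minute and the maximum is a few days. In R, I would like to compute a moving average of depth based on a 6-hour (or 12-hour) time window around each observation (not a window based on a number of lagging/leading observations).

I have tried the zoo package, but I can't seem to get rollmean to work for me.

A small portion of my data is: https://www.dropbox.com/s/lhhrdgt2mxasc9v/fid57.depth.test1.csv

In R it looks like: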

> str(my.data)
'data.frame':   51 obs. of  2 variables:
 $ DateTime: POSIXct, format: "2013-08-07 06:49:46" "2013-08-07 06:55:17" "2013-08-07 07:06:52" "2013-08-07 07:23:43" ...
 $ Depth   : num  28.6 31.7 29 35.2 33 ...

> head(my.data)
DateTime "Depth"
2013-08-07 06:49:46 28.58
2013-08-07 06:55:17 31.7
2013-08-07 07:06:52 29.02
2013-08-07 07:23:43 35.18
2013-08-07 07:27:14 32.98
2013-08-07 08:20:21 55.84

> dput(head(my.data))
structure(list(DateTime = structure(c(1375883386, 1375883717, 
1375884412, 1375885423, 1375885634, 1375888821), class = c("POSIXct", 
"POSIXt"), tzone = ""), Depth = c(28.58, 31.7, 29.02, 35.18, 
32.98, 55.84)), .Names = c("DateTime", "Depth"), row.names = c(8481L, 
8483L, 8484L, 8485L, 8487L, 8495L), class = "data.frame")

Any suggestions would be much appreciated.
Thanks in advance!

Answer 1

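This isn't what you asked for, but in case it is enough to simply cut the series into non-overlapping 6-hour intervals and average within each interval, here is some code: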

library(zoo)

z <- read.zoo("fid57.depth.test1.csv", header = TRUE, 
              index = 1:2, format = "%Y-%m-%d %H:%M:%S", tz = "")
z6 <- aggregate(z, as.POSIXct(cut(time(z), "6 hours")), mean)

giving this:

> z6
2013-08-07 06:00:00 2013-08-07 12:00:00 2013-08-07 18:00:00 2013-08-08 00:00:00 
           43.40810            39.13500            22.31250            17.38333 
2013-08-08 06:00:00 2013-08-08 12:00:00 2013-08-08 18:00:00 2013-08-09 00:00:00 
                 NA            15.53333                  NA                  NA 
2013-08-09 06:00:00 2013-08-09 12:00:00 
                 NA            23.30455

If the NA entries are not wanted, use na.omit(z6).

Also note that the input file has a .csv extension but is not a csv file.

The data used in the example above is:

"DateTime ""Depth"""
2013-08-07 06:49:46 28.58
2013-08-07 06:55:17 31.7
2013-08-07 07:06:52 29.02
2013-08-07 07:23:43 35.18
2013-08-07 07:27:14 32.98
2013-08-07 08:20:21 55.84
2013-08-07 09:05:35 47.05
2013-08-07 09:10:28 65.96
2013-08-07 09:37:21 40.01
2013-08-07 09:44:59 47.05
2013-08-07 09:58:30 43.53
2013-08-07 10:02:45 47.49
2013-08-07 10:07:23 47.93
2013-08-07 10:11:31 56.28
2013-08-07 10:15:38 61.12
2013-08-07 10:19:39 53.2
2013-08-07 10:27:28 43.53
2013-08-07 10:31:44 40.89
2013-08-07 10:45:19 31.2
2013-08-07 10:47:29 31.7
2013-08-07 10:49:44 41.33
2013-08-07 12:01:00 33.86
2013-08-07 12:05:06 35.62
2013-08-07 17:25:35 43.53
2013-08-07 17:40:25 43.53
2013-08-07 18:15:03 42.65
2013-08-07 21:29:33 16.3
2013-08-07 22:05:15 14.9
2013-08-07 22:07:44 15.4
2013-08-08 02:18:36 16.3
2013-08-08 02:23:26 16.3
2013-08-08 03:34:21 16.3
2013-08-08 03:55:46 16.7
2013-08-08 05:05:53 17.6
2013-08-08 05:10:27 21.1
2013-08-08 15:36:02 16.7
2013-08-08 16:12:20 12.8
2013-08-08 16:16:55 17.1
2013-08-09 13:17:04 22.4
2013-08-09 13:22:32 21.1
2013-08-09 13:25:58 24.2
2013-08-09 13:37:01 15.4
2013-08-09 13:40:16 14.1
2013-08-09 13:46:46 14.1
2013-08-09 13:54:31 27.26
2013-08-09 14:18:53 40.89
2013-08-09 14:22:34 21.5
2013-08-09 14:26:52 28.14
2013-08-09 14:36:35 27.26
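
If a true moving window around each observation is needed after all, one possible approach is a direct base-R computation. The following is only a minimal sketch: the helper name centered_time_mean is hypothetical (it does not come from zoo or any other package), the data are the six rows from the question's dput() output, and the time zone is set to UTC purely for reproducibility. It compares every timestamp with every other one, so it scales quadratically with the number of observations.

```r
# First six rows from the question's dput() output
# (assumption: tz = "UTC" is used here only to make the example reproducible)
my.data <- data.frame(
  DateTime = as.POSIXct(c(1375883386, 1375883717, 1375884412,
                          1375885423, 1375885634, 1375888821),
                        origin = "1970-01-01", tz = "UTC"),
  Depth = c(28.58, 31.7, 29.02, 35.18, 32.98, 55.84)
)

# Hypothetical helper: for each timestamp, average all values whose
# timestamps lie within +/- half_width_secs of it (window centred on the point)
centered_time_mean <- function(times, values, half_width_secs) {
  secs <- as.numeric(times)  # POSIXct -> seconds since the epoch
  vapply(secs, function(t0) {
    in_window <- abs(secs - t0) <= half_width_secs
    mean(values[in_window], na.rm = TRUE)
  }, numeric(1))
}

# A 6-hour window means 3 hours on each side of every observation
my.data$Depth6h <- centered_time_mean(my.data$DateTime, my.data$Depth, 3 * 3600)

# A narrower 20-minute window (10 minutes on each side), for comparison
depth20min <- centered_time_mean(my.data$DateTime, my.data$Depth, 10 * 60)
```

Because these six rows span only about an hour and a half, every 6-hour window contains all of them and Depth6h is the same overall mean in each row; the narrower window shows the per-observation behaviour more clearly. For a 12-hour window, pass 6 * 3600 instead.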

Answer 2

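I suggest using the runner package and its built-in function mean_run. Your problem is described in the section of the vignette on date-dependent windows. Below is an example of a 7-day average.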

# random value and irregular data generation
x <- runif(15)
date <- as.Date(cumsum(rpois(n = 15, lambda = 2)), origin = Sys.Date())

library(runner)
mean_run(x, k = 7, idx = date)

Running average of incomplete time series data
