Question

I am trying to do something that I am sure is easy, but I don't even know how to search for it properly.

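I have a long time series of data. One column indicates a state, with a value of 0, 1, 2, or -1. I want to extract the data where the state is 2 and process it. Extracting all of that data is no problem; what I want to do is process each period of state = 2 separately. That is, I want to step through the data and, wherever state = 2, run some analysis (for example, a linear fit of other variables against time for that period), then move on to the next instance of state = 2 and repeat the analysis. Alternatively, I could extract each period's data into its own data frame and analyze it there (but that would produce a clutter of small data frames).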

I don't even know how to identify the starting points (i.e., where the state is 2 at row i and 1 at row i-1).

Any pointers to commands or packages I should look at would be much appreciated.

Answer 1

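You can use the apply function with MARGIN = 1 (row-wise). Each result can then be accessed by index, because the function passed to apply returns a list for every row, so apply itself returns a list. See below: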

# Simulation of the data ----
library(lubridate)

set.seed(123)
n <- 20
df <- data.frame(
  id = factor(sample(-1:2, n, replace = TRUE)),
  t_stamp = sort(sample(seq(dmy_hm("01-01-2018 00:00"), dmy_hm("01-01-2019 00:00"), by = 10), n)),
  matrix(rnorm(100 * n), nrow = n, dimnames = list(NULL, c(paste0("X", 1:50), paste0("Y", 1:50)))))

# Calculation & output -----
# keep only the rows where id == 2
df_m <- df[df$id == 2, ]

# fit a linear regression for each remaining row ----
df_res <- apply(df_m, 1, function(xs) {
  rw <- as.numeric(xs[-(1:2)])
  x <- rw[1:50]
  y <- rw[51:100]
  smry <- summary(lm(y ~ x))
  list(xs[2], smry)

})

# access the result for the second row with id == 2
df_res[[2]]
# [[1]]
# t_stamp 
# "2018-03-26 13:01:40" 
# 
# [[2]]
# 
# Call:
#  lm(formula = y ~ x)
#
# Residuals:
#   Min       1Q   Median       3Q      Max 
# -2.44448 -0.79877  0.09427  0.88524  3.11190 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)
# (Intercept)   0.1731     0.1592   1.088    0.282
# x             0.1595     0.1544   1.033    0.307
# 
# Residual standard error: 1.125 on 48 degrees of freedom
# Multiple R-squared:  0.02173, Adjusted R-squared:  0.001349 
# F-statistic: 1.066 on 1 and 48 DF,  p-value: 0.307

# check the corresponding row of the filtered data
df_m[2, ]
# id             t_stamp         X1       X2        X3        X4   ...        X50        Y1         Y2       Y3        Y4
# 4  2 2018-03-26 13:01:40 -0.7288912 2.168956 -1.018575 0.6443765 ... -0.1321751 0.8983962 -0.2608322 1.036548 -1.691862
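
The code above treats every row with id == 2 on its own. If the goal is instead to handle each consecutive run of state = 2 as one unit, as the question describes, a minimal sketch using base R's rle() could look like the following. The data frame `dat` and its columns `t`, `state`, and `y` are hypothetical names made up for this illustration, not part of the original data.

```r
# Hypothetical example data: `t`, `state`, and `y` are assumed column names.
set.seed(1)
dat <- data.frame(
  t     = 1:12,
  state = c(0, 1, 2, 2, 2, 0, 1, 2, 2, 2, 2, -1),
  y     = rnorm(12)
)

# rle() encodes consecutive runs of equal values in `state`.
runs <- rle(dat$state)

# Label every row with the index of the run it belongs to.
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)

# Keep only rows where state == 2, then split them by run:
# one data frame per period, all held in a single list.
is_two  <- dat$state == 2
periods <- split(dat[is_two, ], run_id[is_two])

# Row index at which each state == 2 period starts.
starts <- (cumsum(runs$lengths) - runs$lengths + 1)[runs$values == 2]

# Fit y ~ t separately within each period; the result is a list of lm objects.
fits <- lapply(periods, function(p) lm(y ~ t, data = p))
```

Keeping the periods in one list avoids cluttering the workspace with many small data frames; for example, `coef(fits[[1]])` gives the intercept and slope for the first period.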
