Labeling runs of row elements in a data frame (R)

Date: 2021-01-25 15:04:21

Tags: r data.table

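I'm working with eye-tracking data and am trying to create a new column, "SaccadePerTrial", that counts (and labels) each occurrence of a saccade (S) within each unique trial, while ignoring fixations (F).

Here is what my data frame currently looks like: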

Trial | FixationSaccade
1     | F
1     | F
1     | S
1     | S
1     | F
1     | F
1     | S
1     | S

2     | F
2     | F
2     | S
2     | S
2     | F
2     | F
2     | S
2     | S
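
For anyone who wants to reproduce the table above, a minimal sketch of the sample data as a data.table follows. It assumes the object is called dat (the name used in the answers) and that FixationSaccade is a character column; neither detail is stated explicitly in the question.

```r
library(data.table)

# Two trials, each following the F, F, S, S, F, F, S, S pattern shown above
dat <- data.table(
  Trial = rep(1:2, each = 8),
  FixationSaccade = rep(c("F", "F", "S", "S"), times = 4)
)
```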

The "SaccadePerTrial" column should look like this:

Trial | FixationSaccade | SaccadePerTrial
1     | F               | NA
1     | F               | NA
1     | S               | 1
1     | S               | 1
1     | F               | NA
1     | F               | NA
1     | S               | 2
1     | S               | 2

2     | F               | NA
2     | F               | NA
2     | S               | 1
2     | S               | 1
2     | F               | NA
2     | F               | NA
2     | S               | 2
2     | S               | 2

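This is similar to what rleid() does, but I want the function to ignore values that are not saccades (S). Another option (though less preferred) would be to apply rleid() separately to each value in the 'FixationSaccade' column, so that the Fs and the Ss each start from 1.

Does anyone know how I can achieve this? Thanks!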
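
For reference, the less preferred fallback (numbering F runs and S runs separately, each starting from 1 within a trial) could be sketched roughly as follows. The column names run and RunPerValue are hypothetical, and the sample data is rebuilt here so the snippet runs on its own.

```r
library(data.table)

dat <- data.table(
  Trial = rep(1:2, each = 8),
  FixationSaccade = rep(c("F", "F", "S", "S"), times = 4)
)

# run: id of each consecutive run within a trial (hypothetical helper column)
dat[, run := rleid(FixationSaccade), by = .(Trial)]
# RunPerValue: renumber those runs separately for F and for S within each trial
dat[, RunPerValue := rleid(run), by = .(Trial, FixationSaccade)]
# Drop the helper column
dat[, run := NULL]
```

With this layout, the first F run and the first S run of a trial both get 1, the second of each gets 2, and so on.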

2 answers:

Answer 0 (score: 0)

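# Number the runs of S / non-S within each trial, blank out the fixations,
# then map the remaining run ids (1, 3, 5, ... or 2, 4, 6, ...) to 1, 2, 3, ...
# Note: min(S) in the last step is taken over the whole table, not per trial.
dat[, S := rleid(FixationSaccade == "S"), by=.(Trial) ][
  FixationSaccade == "F", S := NA ][
  , S := (S + (min(S, na.rm = TRUE) == 1L)) / 2L ]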
dat
#     Trial FixationSaccade SaccadePerTrial     S
#     <num>          <char>          <char> <num>
#  1:     1               F            <NA>    NA
#  2:     1               F            <NA>    NA
#  3:     1               S               1     1
#  4:     1               S               1     1
#  5:     1               F            <NA>    NA
#  6:     1               F            <NA>    NA
#  7:     1               S               2     2
#  8:     1               S               2     2
#  9:     2               F            <NA>    NA
# 10:     2               F            <NA>    NA
# 11:     2               S               1     1
# 12:     2               S               1     1
# 13:     2               F            <NA>    NA
# 14:     2               F            <NA>    NA
# 15:     2               S               2     2
# 16:     2               S               2     2
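
One caveat: the last step takes min(S) over the whole table. That is fine for the sample data, where every trial starts with a fixation. If some trials start with S and others with F, though, the shared offset would give fractional ids for the trials that start with F. A per-trial variant might look like the sketch below. The data dat2 and the column name SaccadeId are hypothetical, chosen so that the two trials start differently.

```r
library(data.table)

# Hypothetical data: trial 1 starts with a fixation, trial 2 with a saccade
dat2 <- data.table(
  Trial = rep(1:2, each = 4),
  FixationSaccade = c("F", "S", "F", "S",  "S", "F", "S", "S")
)

dat2[, SaccadeId := rleid(FixationSaccade == "S"), by = .(Trial)]
dat2[FixationSaccade == "F", SaccadeId := NA]
# S runs are numbered 2, 4, 6, ... when the trial starts with F and
# 1, 3, 5, ... when it starts with S, so the offset is decided per trial
dat2[FixationSaccade == "S",
     SaccadeId := as.integer((SaccadeId + (min(SaccadeId) == 1L)) / 2L),
     by = .(Trial)]
```

Restricting the last step to the S rows also means min() never sees an all-NA group, so a trial without any saccades does not trigger a warning.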

I find magrittr::%>% more readable, so purely as a matter of style, here is the same thing written with pipes:

library(magrittr)
dat[, S := rleid(FixationSaccade == "S"), by=.(Trial)] %>%
  .[FixationSaccade == "F", S := NA ] %>%
  .[, S := (S + (min(S, na.rm = TRUE) == 1L)) / 2L ]

Answer 1 (score: 0)

I would do it like this:

dat[, newCol := rleid(FixationSaccade), by = .(Trial)]
dat[FixationSaccade == 'F', newCol := NA]
dat[FixationSaccade == 'S', newCol := rleid(newCol), by = .(Trial)]
# > dat
#     Trial FixationSaccade newCol
#  1:     1               F     NA
#  2:     1               F     NA
#  3:     1               S      1
#  4:     1               S      1
#  5:     1               F     NA
#  6:     1               F     NA
#  7:     1               S      2
#  8:     1               S      2
#  9:     2               F     NA
# 10:     2               F     NA
# 11:     2               S      1
# 12:     2               S      1
# 13:     2               F     NA
# 14:     2               F     NA
# 15:     2               S      2
# 16:     2               S      2

Or use a custom version of rleid:

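rleid2 <- function(x){
    # Run-length encode x: one entry per consecutive run
    r <- rle(x)
    # Count the S runs seen so far; F runs get NA instead of a count
    y <- cumsum(r$values == 'S')
    y[r$values == 'F'] <- NA
    # Swap the run values for the counts and expand back to full length
    r$values <- y
    inverse.rle(r)
}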
dat[, newCol2 := rleid2(FixationSaccade), by = .(Trial)]

#     Trial FixationSaccade newCol newCol2
#  1:     1               F     NA      NA
#  2:     1               F     NA      NA
#  3:     1               S      1       1
#  4:     1               S      1       1
#  5:     1               F     NA      NA
#  6:     1               F     NA      NA
#  7:     1               S      2       2
#  8:     1               S      2       2
#  9:     2               F     NA      NA
# 10:     2               F     NA      NA
# 11:     2               S      1       1
# 12:     2               S      1       1
# 13:     2               F     NA      NA
# 14:     2               F     NA      NA
# 15:     2               S      2       2
# 16:     2               S      2       2
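
To see what rleid2 does on its own, here is a small sketch using base R only. The input vector x is hypothetical, and the function is repeated so that the snippet runs by itself.

```r
# Same logic as rleid2 above, repeated so the snippet is self-contained
rleid2 <- function(x) {
  r <- rle(x)
  y <- cumsum(r$values == "S")
  y[r$values == "F"] <- NA
  r$values <- y
  inverse.rle(r)
}

x <- c("F", "F", "S", "S", "F", "S", "S", "S", "F")  # hypothetical input
rle(x)$values  # run values: "F" "S" "F" "S" "F"
rleid2(x)      # NA NA 1 1 NA 2 2 2 NA
```

Note that rle() expects a plain atomic vector, so if FixationSaccade were stored as a factor it would probably need as.character() first.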