Question

example.df <- data.frame(GY = sample(300:600, 200, replace = TRUE),
                         sacc = rep("f", each = 100),
                         trial.number = rep(1:2, each = 100),
                         stringsAsFactors = FALSE)
example.df$sacc[50:70] <- "s"
example.df$sacc[164:170] <- "s"

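My data looks similar to this. Within each trial.number, I want the mean of GY over all the remaining rows after the last occurrence of "s" in sacc (that is, the "f" rows that follow it). In this example I could of course just average indices 71:100, but in the real data the last "s" is not at a fixed row.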

What I tried after Ronak's comment (thanks!):

library(dplyr)
example.df %>%
   group_by(trial.number) %>%
   summarise(mean_tr = mean(GY[(max(which(sacc == "s")) + 1) : n()])) %>%
   data.frame()

I can't get this to work. Can anyone help? My original data frame has 70k rows and many variables; its class is data.frame.

Answer 1

Update

Since this has to be done per group, we can split the data frame on trial.number and apply the same operation to each group.

# Split by trial, then average GY over the rows after the last "s" in each piece
sapply(split(example.df, example.df$trial.number), function(x)
         mean(x$GY[(max(which(x$sacc == "s")) + 1) : nrow(x)]))

#   1        2 
#446.2333 471.7000
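A caveat with max(which(...)): it assumes every trial contains at least one "s" and does not end on one. With no "s", max() of an empty vector returns -Inf (with a warning) and the : step then fails; if the last row is "s", (n + 1):n counts downward, so the slice picks up that "s" row plus an NA. Below is a minimal sketch of a guarded helper. The name mean_after_last and the behaviour chosen for those two cases (all rows when there is no "s", NA when "s" is the final row) are assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper: mean of gy over the rows after the last `marker` in sacc.
# Assumed behaviour (not from the original answer):
#   - no marker in the group      -> mean of all rows
#   - marker in the group's last row -> NA
mean_after_last <- function(gy, sacc, marker = "s") {
  hits <- which(sacc == marker)
  if (length(hits) == 0) {
    return(mean(gy))            # no marker at all: average every row
  }
  last <- max(hits)
  if (last == length(gy)) {
    return(NA_real_)            # marker in final row: nothing comes after it
  }
  mean(gy[(last + 1):length(gy)])
}

# Small made-up data frame with answers that can be checked by hand.
toy <- data.frame(
  GY = c(10, 20, 30, 40, 5, 15, 25, 1, 2, 3),
  sacc = c("f", "s", "f", "f", "f", "f", "f", "f", "f", "s"),
  trial.number = rep(1:3, times = c(4, 3, 3)),
  stringsAsFactors = FALSE
)

# vapply() guarantees one numeric value per trial.
res <- vapply(split(toy, toy$trial.number),
              function(x) mean_after_last(x$GY, x$sacc),
              numeric(1))
res
```

Here trial 1 gives 35, trial 2 gives 15 (the mean of all three rows, since it has no "s") and trial 3 gives NA. On the question's example.df, where both trials contain "s" and neither ends on one, the helper returns the same values as the sapply() call above.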

The same can be done with dplyr:

library(dplyr)
example.df %>%
   group_by(trial.number) %>%
   summarise(mean_tr = mean(GY[(max(which(sacc == "s")) + 1) : n()])) %>%
   data.frame()

# trial.number  mean_tr
#1            1 446.2333
#2            2 471.7000

To double-check,

mean(example.df$GY[71:100])
#[1] 446.2333

mean(example.df$GY[171:200])
#[1] 471.7
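A loop-free alternative, sketched in base R (this reverse cumulative sum trick is a swapped-in technique, not the method used in this answer): within each trial, reverse sacc == "s", take cumsum(), and reverse back. A row whose count is zero has no "s" at or after it, so it lies after the last "s". ave() applies this per trial.number, and tapply() then averages the flagged rows. Unlike max(which(...)), it does not error on a trial without any "s" (that trial keeps all its rows); a trial that ends on "s" simply drops out of the result. The small data frame df is made up for illustration.

```r
# Made-up data: trial 1 has its last "s" in row 3; trial 2 has no "s".
df <- data.frame(
  GY = c(2, 4, 6, 8, 10, 1, 3, 5),
  sacc = c("s", "f", "s", "f", "f", "f", "f", "f"),
  trial.number = rep(1:2, times = c(5, 3)),
  stringsAsFactors = FALSE
)

# TRUE for rows with no "s" at or after them within the same trial.
after_last <- ave(df$sacc == "s", df$trial.number,
                  FUN = function(s) rev(cumsum(rev(s))) == 0)

# Mean of GY over the flagged rows, one value per trial.
means <- tapply(df$GY[after_last], df$trial.number[after_last], mean)
means
```

Trial 1 averages rows 4 and 5 (giving 9), and trial 2, which has no "s", averages all three of its rows (giving 3). Replacing df with example.df in the last two statements applies the same idea to the question's data.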

Original answer (for a single trial, where the last "s" is at row 70)

We can use

mean(example.df$GY[(max(which(example.df$sacc == "s")) + 1) : nrow(example.df)])
#[1] 443.6667

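Here we first get the indices of all rows where sacc is "s" and take their max to find the last one. We then take the mean of GY from the row after that index up to the last row of the data frame, nrow(example.df).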
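The same steps on a short made-up vector (values are for illustration only, not from the question; length() plays the role of nrow() because these are plain vectors rather than a data frame):

```r
sacc <- c("f", "s", "s", "f", "s", "f", "f")
GY   <- c(10, 20, 30, 40, 50, 60, 70)

idx   <- which(sacc == "s")           # positions of every "s": 2 3 5
last  <- max(idx)                     # the last "s" is at position 5
after <- GY[(last + 1):length(GY)]    # positions 6 and 7: 60 70
mean(after)                           # 65
```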

To confirm,

mean(example.df$GY[71:100])
#[1] 443.6667

Calculate the mean after the last occurrence of a specified value in a column
