Question

I want to see how a group's mean stabilizes as more observations are added.

Suppose I have the following data:

             email score
             <chr> <int>
 1 abc@example.com     4
 2 abc@example.com     3
 3 abc@example.com     3
 4 abc@example.com     4
 5 xyz@example.com     1
 6 xyz@example.com     4
 7 xyz@example.com     5
 8 xyz@example.com     5

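For the two groups (abc@example.com and xyz@example.com), I want to compute the mean and sd row by row, adding one more row each time. So for row 2 it should be mean(c(4, 3)) and sd(c(4, 3)); for row 3, mean(c(4, 3, 3)) and sd(c(4, 3, 3)); and so on.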

The desired output for this example would be:

            email score     mean        sd
            <chr> <int>    <dbl>     <dbl>
1 abc@example.com     4 4.000000        NA
2 abc@example.com     3 3.500000 0.7071068
3 abc@example.com     3 3.333333 0.5773503
4 abc@example.com     4 3.500000 0.5773503
5 xyz@example.com     1 1.000000        NA
6 xyz@example.com     4 2.500000 2.1213203
7 xyz@example.com     5 3.333333 2.0816660
8 xyz@example.com     5 3.750000 1.8929694
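
As a sketch of what each row in this table represents, the running statistics can be written out as below. The notation (x_j for the j-th score within a group, i for the row position within that group) is introduced here for illustration only. It relies on the fact that R's sd() uses the sample standard deviation with an n - 1 denominator, which is why the first row of each group is NA: with a single observation the denominator is zero.

```latex
\bar{x}_i = \frac{1}{i}\sum_{j=1}^{i} x_j, \qquad
s_i = \sqrt{\frac{1}{i-1}\sum_{j=1}^{i}\left(x_j - \bar{x}_i\right)^2}
% Row 2 of abc@example.com (scores 4, 3):
% \bar{x}_2 = 3.5, \quad s_2 = \sqrt{(0.5^2 + 0.5^2)/1} = \sqrt{0.5} \approx 0.7071
```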

How can I achieve this in R? Thanks.

Answer 1

This might work for you.

Your data

df <- read.table(text="email score
 1 abc@example.com     4
 2 abc@example.com     3
 3 abc@example.com     3
 4 abc@example.com     4
 5 xyz@example.com     1
 6 xyz@example.com     4
 7 xyz@example.com     5
 8 xyz@example.com     5", header=TRUE)

Solution

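library(tidyverse)
df %>%
  group_by(email) %>%
  # nest the scores of each group into a list-column called `data`
  # (named form for tidyr >= 1.0, where the unnamed nest(score) is deprecated)
  nest(data = score) %>%
  # for each group, compute mean and sd of the first i scores, for i = 1..n
  mutate(data = map(data, ~map_df(seq_len(nrow(.x)), function(i) tibble(mean = mean(.x$score[1:i]), sd = sd(.x$score[1:i]))))) %>%
  unnest(data)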

Output

# A tibble: 8 x 3
            # email     mean        sd
           # <fctr>    <dbl>     <dbl>
# 1 abc@example.com 4.000000        NA
# 2 abc@example.com 3.500000 0.7071068
# 3 abc@example.com 3.333333 0.5773503
# 4 abc@example.com 3.500000 0.5773503
# 5 xyz@example.com 1.000000        NA
# 6 xyz@example.com 2.500000 2.1213203
# 7 xyz@example.com 3.333333 2.0816660
# 8 xyz@example.com 3.750000 1.8929694
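
The output above drops the score column that the question's desired output keeps. A possible variation that keeps it is sketched below, assuming dplyr is installed. `running_sd` is a hypothetical helper written for this example, not a function from any package, and the data frame is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  email = rep(c("abc@example.com", "xyz@example.com"), each = 4),
  score = c(4L, 3L, 3L, 4L, 1L, 4L, 5L, 5L)
)

# Hypothetical helper: sd of the first i elements, for i = 1..length(x).
# sd() of a single value is NA, so the first element is always NA.
running_sd <- function(x) {
  vapply(seq_along(x), function(i) sd(x[seq_len(i)]), numeric(1))
}

result <- df %>%
  group_by(email) %>%
  mutate(
    mean = cumsum(score) / seq_along(score),  # running mean within the group
    sd   = running_sd(score)                  # running sd within the group
  ) %>%
  ungroup()

print(result)
```

Because mutate() adds columns instead of nesting, the original rows and the score column are left in place, and the row order within each group determines which observations count as "earlier".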

Answer 2

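If these are ordered observations, rep() them by the group variable and then aggregate. It would be easier with a proper reproducible example, but I'll try using your example: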

OR

This isn't perfect, nor is it particularly lean, but the logic should help you work through the problem.
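
The code for this answer is not shown here, so the following is not the answerer's method. It is only a minimal base R sketch of one way to get grouped running statistics on ordered observations, using ave() to apply a function within each group. `running` is a hypothetical helper introduced for this example.

```r
# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  email = rep(c("abc@example.com", "xyz@example.com"), each = 4),
  score = c(4L, 3L, 3L, 4L, 1L, 4L, 5L, 5L)
)

# Hypothetical helper: apply f to the first i elements of x, for i = 1..length(x)
running <- function(x, f) {
  vapply(seq_along(x), function(i) f(x[seq_len(i)]), numeric(1))
}

# ave() splits the scores by email, applies FUN to each group,
# and returns the results in the original row order
scores <- as.numeric(df$score)
df$mean <- ave(scores, df$email, FUN = function(x) running(x, mean))
df$sd   <- ave(scores, df$email, FUN = function(x) running(x, sd))

print(df)
```

This recomputes each statistic from scratch for every prefix, which is fine for small groups but grows quadratically with group size.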

R: How to calculate mean/sd within groups, always adding one more row
