Calculating cumulative values by group

Asked: 2021-03-22 13:16:52

Tags: r tidyverse

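I am trying to calculate cumulative acetone and acetaldehyde emissions from different soil incubations across three time points. Emissions of the compounds were measured from six soils (of different soil_types) on three different days. I want to calculate the cumulative emission of each soil at each time point.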

The end goal is to calculate the mean emission across all soils and present a chart similar to this one (except that mine should have error bars):

[image: example chart]

Can anyone spot where I am going wrong?

Here is the code:


library(tidyverse)
library(plotrix)

  df%>%
  group_by(soil, compound, days)%>%
  mutate(cum_emission=cumsum(emission))%>%
  summarise(mean=mean(cum_emission, na.rm = TRUE),
            sd = sd(cum_emission, na.rm = TRUE),
            se = std.error(cum_emission, na.rm = TRUE))

Here is the data:

df <- structure(list(days = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 
4, 4, 4, 4, 4, 4, 4, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 4, 4, 4, 4), soil = c(12, 12, 2, 2, 1, 1, 9, 9, 13, 13, 
3, 3, 12, 12, 2, 2, 1, 1, 9, 9, 12, 12, 2, 2, 1, 1, 9, 9, 13, 
13, 3, 3, 13, 13, 3, 3), soil_type = c("organic", "organic", 
"mineral", "mineral", "mineral", "mineral", "organic", "organic", 
"organic", "organic", "mineral", "mineral", "organic", "organic", 
"mineral", "mineral", "mineral", "mineral", "organic", "organic", 
"organic", "organic", "mineral", "mineral", "mineral", "mineral", 
"organic", "organic", "organic", "organic", "mineral", "mineral", 
"organic", "organic", "mineral", "mineral"), compound = c("Acetone", 
"Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", 
"Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", 
"Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", 
"Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", 
"Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", 
"Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", 
"Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde"
), emission = c(0.01, 0, 0.03, 0.03, 0.07, 0.06, 0.33, 0.1, 0.02, 
0.01, 0.01, 0, 0.02, 0.01, 0.07, 0.08, 0.09, 0.07, 0.32, 0.22, 
0.01, 0, 0.06, 0.06, 0.08, 0.06, 0.23, 0.14, 0.4, 0.04, 0.14, 
0, 0.05, 0.05, 0.14, 0)), row.names = c(NA, -36L), class = c("tbl_df", 
"tbl", "data.frame"))

1 answer:

Answer 0 (score: 2)

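This only addresses setting up the data, not the plotting. (Sorry for the incomplete answer!)

You wrote that you want to group by soil, compound, days. Did you mean soil_type, compound, days? As @maarvd pointed out, with soil every row is unique: each soil/compound/days combination occurs exactly once, so every group contains a single value.

When I changed the code to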

 df %>%
   group_by(soil_type, compound, days)%>%
   mutate(cum_emission=cumsum(emission))%>%
   summarise(mean=mean(cum_emission, na.rm = TRUE),
             sd = sd(cum_emission, na.rm = TRUE),
             se = std.error(cum_emission, na.rm = TRUE))

I was able to produce the following result

# A tibble: 12 x 6
# Groups:   soil_type, compound [4]
   soil_type compound      days   mean     sd     se
   <chr>     <chr>        <dbl>  <dbl>  <dbl>  <dbl>
 1 mineral   Acetaldehyde     0 0.0700 0.0346 0.02  
 2 mineral   Acetaldehyde     4 0.127  0.0404 0.0233
 3 mineral   Acetaldehyde    10 0.10   0.0346 0.02  
 4 mineral   Acetone          0 0.08   0.0436 0.0252
 5 mineral   Acetone          4 0.177  0.116  0.0669
 6 mineral   Acetone         10 0.16   0.111  0.0643
 7 organic   Acetaldehyde     0 0.07   0.0608 0.0351
 8 organic   Acetaldehyde     4 0.173  0.144  0.0829
 9 organic   Acetaldehyde    10 0.107  0.0945 0.0546
10 organic   Acetone          0 0.237  0.197  0.113 
11 organic   Acetone          4 0.25   0.201  0.116 
12 organic   Acetone         10 0.297  0.319  0.184 
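
Note that the grouping above runs cumsum() across the soils within a single day. If the aim is instead the one described in the question (a running total over time for each soil, then averaged per time point), one possible sketch is shown below. It assumes a small subset of the question's data (Acetone, soils 1 and 2), and the names `toy`, `cum_by_soil`, `cum_summary`, `mean_cum`, `sd_cum` and `se_cum` are illustrative only. The standard error is computed by hand as sd / sqrt(n), which assumes no missing values; the error-bar plot at the end is one way to draw it, not the only one.

```r
library(dplyr)
library(ggplot2)

# Illustrative subset of the question's data: Acetone, mineral soils 1 and 2
toy <- tibble(
  days      = c(0, 0, 4, 4, 10, 10),
  soil      = c(2, 1, 2, 1, 2, 1),
  soil_type = "mineral",
  compound  = "Acetone",
  emission  = c(0.03, 0.07, 0.07, 0.09, 0.06, 0.08)
)

# Step 1: running total over time within each soil and compound
cum_by_soil <- toy %>%
  arrange(soil, compound, days) %>%
  group_by(soil, compound) %>%
  mutate(cum_emission = cumsum(emission)) %>%
  ungroup()

# Step 2: summarise the per-soil running totals at each time point
cum_summary <- cum_by_soil %>%
  group_by(soil_type, compound, days) %>%
  summarise(mean_cum = mean(cum_emission),
            sd_cum   = sd(cum_emission),
            se_cum   = sd(cum_emission) / sqrt(n()),  # assumes no NAs
            .groups  = "drop")

# Step 3: mean cumulative emission with +/- 1 SE error bars
p <- ggplot(cum_summary, aes(x = days, y = mean_cum, colour = compound)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean_cum - se_cum, ymax = mean_cum + se_cum),
                width = 0.3)
```

Printing `p` draws the chart. With the full data set, adding `facet_wrap(~ soil_type)` would be one way to separate mineral and organic soils.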

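** Changed in response to @Tiptop's comment

If you are looking for a cumulative moving average, how about this? I am sure some of it was not originally written by me, but wherever it came from, I have repurposed it many times. You do not need plotrix, but you do need the tidyquant library.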

library(tidyverse)
library(tidyquant)

UDF_roll <- function(x, na.rm = TRUE) {
  m  <- mean(x, na.rm = na.rm)  # calculate the average (for the rolling average)
  s  <- sd(x, na.rm = na.rm)    # calculate the sd to find the confidence interval
  hi <- m + 2*s                 # CI HI
  lo <- m - 2*s                 # CI Low
  vals <- c(Mean = m, SD = s, HI.95 = hi, LO.95 = lo) 
  return(vals)
}
# loop for each type of compound (I'm assuming that the data you provided is a sample and you have more.)

trends <- vector("list")  # empty list to store the results
cp = unique(df$compound)   # create a list of unique compound names

for (i in seq_along(cp)) {                   # loop through each compound
  trends[[i]] <- df %>% as.data.frame() %>%  # add results to the list
    filter(compound == cp[i]) %>%            # for one compound
    arrange(days) %>% 

 # the rolling functions requires time series with a date; so random dates added as controller
        mutate(time = seq(as.Date("2010/1/1"),  
                          by = "month", 
                          length.out = nrow(.)),
               cum_emission = cumsum(emission)) %>%
        arrange(compound,-days) %>%          # most recent on top for TS
        tq_mutate(select = cum_emission,     # collect mean, sd, error
                  mutate_fun = rollapply, 
                  width = 2,                 # 2: current & previous reading
                  align = "right", 
                  by.column = FALSE,
                  FUN = UDF_roll,            # calls the function UDF
                  na.rm = TRUE) %>% 
        ggplot(aes(x = seq_along(time))) +   
        geom_point(aes(y = cum_emission), 
                   color = "black", alpha = 0.2) +  # cumulative
        geom_ribbon(aes(ymin = LO.95, ymax = HI.95), 
                    fill = "azure3", alpha = 0.4) + # confidence interval
        geom_jitter(aes(y = Mean, color= Mean), 
                        size = 1, alpha = 0.9) +    # rolling average
        labs(title = paste0(cp[[i]], ": Trends and Volatility\nIncremental Moving Average with 95% CI Bands (+/-2 SD)"),
             x = "", y = "Soil Emissions") +
        scale_color_viridis_c(end = .8) + theme_bw() + 
        theme(legend.position="none")
}

trends[[1]]
trends[[2]]
trends[[1]]$data    # you can NULL the time column if you use the data another way
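
To make the width-2 window concrete, here is a minimal base R sketch of the statistics UDF_roll returns for each window, worked on the first four Acetone emissions from the data. It does not use tidyquant or rollapply; the names `cum_emission` and `roll_stats` are illustrative. Note that the HI/LO band is simply mean ± 2 SD of two readings, which is a rough spread indicator rather than a formal confidence interval.

```r
# First four Acetone emissions at day 0, in the order they appear in the data
emission     <- c(0.01, 0.03, 0.07, 0.33)
cum_emission <- cumsum(emission)          # 0.01 0.04 0.11 0.44

# For each position, take the current and previous reading (width = 2,
# right-aligned); the first position has no previous reading, so it is NA
roll_stats <- t(sapply(seq_along(cum_emission), function(i) {
  if (i < 2) {
    return(c(Mean = NA, SD = NA, HI.95 = NA, LO.95 = NA))
  }
  w <- cum_emission[(i - 1):i]
  m <- mean(w)
  s <- sd(w)
  c(Mean = m, SD = s, HI.95 = m + 2 * s, LO.95 = m - 2 * s)
}))

roll_stats
```

The Mean column of the second row is 0.025 (the average of 0.01 and 0.04), which matches row 2 of the tibble shown for the Acetone plot data.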

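This turns the data into a time series. The plots: [First plot] [Second plot]

The data looks like this. If you want to group it differently, you will have to add the argument .groups = "drop" to the summarise() call, otherwise you will not be able to pass it through tq_mutate.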

# A tibble: 18 x 11
    days  soil soil_type compound emission time       cum_emission   Mean       SD   HI.95   LO.95
   <dbl> <dbl> <chr>     <chr>       <dbl> <date>            <dbl>  <dbl>    <dbl>   <dbl>   <dbl>
 1     0    12 organic   Acetone      0.01 2010-01-01         0.01 NA     NA       NA      NA     
 2     0     2 mineral   Acetone      0.03 2010-02-01         0.04  0.025  0.0212   0.0674 -0.0174
 3     0     1 mineral   Acetone      0.07 2010-03-01         0.11  0.075  0.0495   0.174  -0.0240
 4     0     9 organic   Acetone      0.33 2010-04-01         0.44  0.275  0.233    0.742  -0.192 
 5     0    13 organic   Acetone      0.02 2010-05-01         0.46  0.45   0.0141   0.478   0.422 
 6     0     3 mineral   Acetone      0.01 2010-06-01         0.47  0.465  0.00707  0.479   0.451 
 7     4    12 organic   Acetone      0.02 2010-07-01         0.49  0.48   0.0141   0.508   0.452 
 8     4     2 mineral   Acetone      0.07 2010-08-01         0.56  0.525  0.0495   0.624   0.426 
 9     4     1 mineral   Acetone      0.09 2010-09-01         0.65  0.605  0.0636   0.732   0.478 
10     4     9 organic   Acetone      0.32 2010-10-01         0.97  0.81   0.226    1.26    0.357 
11     4    13 organic   Acetone      0.05 2010-11-01         1.02  0.995  0.0354   1.07    0.924 
12     4     3 mineral   Acetone      0.14 2010-12-01         1.16  1.09   0.0990   1.29    0.892 
13    10    12 organic   Acetone      0.01 2011-01-01         1.17  1.16   0.00707  1.18    1.15  
14    10     2 mineral   Acetone      0.06 2011-02-01         1.23  1.2    0.0424   1.28    1.12  
15    10     1 mineral   Acetone      0.08 2011-03-01         1.31  1.27   0.0566   1.38    1.16  
16    10     9 organic   Acetone      0.23 2011-04-01         1.54  1.42   0.163    1.75    1.10  
17    10    13 organic   Acetone      0.4  2011-05-01         1.94  1.74   0.283    2.31    1.17  
18    10     3 mineral   Acetone      0.14 2011-06-01         2.08  2.01   0.0990   2.21    1.81
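
On the .groups = "drop" remark: by default, summarise() on a data frame grouped by several variables drops only the last grouping level, so the result is still grouped. The sketch below illustrates this with dplyr alone on a hypothetical four-row table (`toy`, `still_grouped` and `ungrouped` are made-up names). Whether tq_mutate actually rejects a grouped input may depend on the tidyquant version and is not checked here.

```r
library(dplyr)

# Hypothetical miniature of the emissions data
toy <- tibble(
  soil_type = c("mineral", "mineral", "organic", "organic"),
  compound  = "Acetone",
  days      = c(0, 4, 0, 4),
  emission  = c(0.03, 0.07, 0.33, 0.32)
)

# Default behaviour: the result remains grouped by soil_type
still_grouped <- toy %>%
  group_by(soil_type, days) %>%
  summarise(total = sum(emission))

# With .groups = "drop" the result is a plain, ungrouped tibble
ungrouped <- toy %>%
  group_by(soil_type, days) %>%
  summarise(total = sum(emission), .groups = "drop")

is_grouped_df(still_grouped)   # still carries a grouping
group_vars(still_grouped)      # the remaining grouping variable
is_grouped_df(ungrouped)       # no grouping left
```

Calling ungroup() after summarise() would be an equivalent way to get the same ungrouped result.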