How to make a plot with standard deviations

Time: 2017-12-13 18:16:47

Tags: r ggplot2

My data looks like this:

df<- structure(list(data1 = c(20171205L, 20171205L, 20171205L, 20171205L, 
20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 
20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 
20171205L, 20171205L, 20171205L, 20171205L), data2 = c(0.00546273, 
0.00552377, 0.00549325, 0.00550851, 0.00556954, 0.00560006, 0.00555428, 
0.00560006, 0.0055848, 0.00561532, 0.00555428, 0.0055848, 0.00552377, 
0.00549325, 0.00550851, 0.00556954, 0.00560006, 0.00555428, 0.00560006, 
0.0055848), data3 = c(0.00546273, 0.00552377, 0.00549325, 0.00550851, 
0.00556954, 0.00560006, 0.00555428, 0.00560006, 0.0055848, 0.00561532, 
0.00555428, 0.0055848, 0.00552377, 0.00549325, 0.00550851, 0.00556954, 
0.00560006, 0.00555428, 0.00560006, 0.0055848), mydf = structure(1:20, .Label = c("B02", 
"B03", "B04", "B05", "B06", "C02", "C03", "C04", "C05", "C06", 
"D02", "D03", "D04", "D05", "D06", "E02", "E03", "E04", "E05", 
"E06"), class = "factor")), .Names = c("data1", "data2", "data3", 
"mydf"), class = "data.frame", row.names = c(NA, -20L))

I am trying to get the mean and standard deviation of certain rows. This is what I have done so far:

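# mean of every numeric column, per group
# (rows whose label ends in 02/03 form group 1, 04/05/06 form group 2)

library(dplyr)
df2 <- df %>%
  group_by(Group = case_when(
    grepl("02$|03$", mydf)       ~ 1L,
    grepl("04$|05$|06$", mydf)   ~ 2L,
    TRUE                         ~ NA_integer_
  )) %>%
  # funs() is deprecated since dplyr 0.8.0; passing the function directly is equivalent
  summarise_at(vars(-mydf), mean)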

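# standard deviation of every numeric column, per group (same grouping as above)
df3 <- df %>%
  group_by(Group = case_when(
    grepl("02$|03$", mydf)       ~ 1L,
    grepl("04$|05$|06$", mydf)   ~ 2L,
    TRUE                         ~ NA_integer_
  )) %>%
  # funs() is deprecated since dplyr 0.8.0; passing the function directly is equivalent
  summarise_at(vars(-mydf), sd)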

Next I want to plot the means together with their SDs, but I don't know how to merge these two data frames and plot them.

The first column is the x-axis (1 and 2 in this case).

So the data for the plot would look like this (for example):

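for Group 1, data1 is 20171205 with an SD of 0

for Group 2, data1 is 20171205 with an SD of 0

for Group 1, data2 is 0.005556190 with an SD of 4.573063e-05

for Group 2, data2 is 0.005553013 with an SD of 4.529097e-05, and so on.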

1 answer:

Answer 0 (score: 1)

It may be cleaner to do the data wrangling in long format rather than wide format:

dff <- df %>%
  # define Group based on mydf, then remove mydf
  mutate(Group = case_when(grepl("02$|03$", mydf) ~ 1L,
                           grepl("04$|05$|06$", mydf) ~ 2L,
                           TRUE ~ NA_integer_)) %>%
  select(-mydf) %>%

  # convert to long format using gather from tidyr package
  tidyr::gather(data, value, -Group) %>%

  # calculate mean & sd within the same summarise() call
  group_by(Group, data) %>%
  summarise(data.mean = mean(value),
            data.sd = sd(value))

> dff
# A tibble: 6 x 4
# Groups: Group [2]
  Group data       data.mean   data.sd
  <int> <chr>          <dbl>     <dbl>
1     1 data1 20171205       0        
2     1 data2        0.00556 0.0000457
3     1 data3        0.00556 0.0000457
4     2 data1 20171205       0        
5     2 data2        0.00555 0.0000453
6     2 data3        0.00555 0.0000453
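
If the means and SDs have already been computed as two separate tables, as with df2 and df3 in the question, they can also be combined by joining on Group. The following is only a minimal base R sketch on a small made-up data frame; the names toy, means, sds and merged are illustrative and do not come from the original post:

```r
# Toy data standing in for the question's df (values are made up for illustration)
toy <- data.frame(
  Group = c(1L, 1L, 2L, 2L),
  data2 = c(1, 3, 10, 14),
  data3 = c(2, 2, 5, 7)
)

# Per-group mean and SD, computed separately (analogous to df2 and df3)
means <- aggregate(cbind(data2, data3) ~ Group, data = toy, FUN = mean)
sds   <- aggregate(cbind(data2, data3) ~ Group, data = toy, FUN = sd)

# Join on Group; the suffixes distinguish mean columns from SD columns
merged <- merge(means, sds, by = "Group", suffixes = c(".mean", ".sd"))
print(merged)
```

The result is still in wide format (one mean column and one SD column per variable), so the long-format approach above is usually the more convenient input for ggplot2.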

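Given the huge difference in magnitude between data1 and data2 / data3, there is really no way to plot everything on the same chart and still see the tiny standard deviations associated with the latter. In principle, though, you can do something like the following (a point for the mean of each variable, and a line range or error bar spanning k standard deviations around each mean; k = 1 in the code below):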

library(ggplot2)

ggplot(dff %>% filter(data != "data1"),
       aes(x = data, y = data.mean, color = data,
           ymin = data.mean - data.sd, 
           ymax = data.mean + data.sd)) +
  geom_point() +
  geom_linerange() +
  facet_grid(~Group)
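
As a variation, the sketch below puts Group on the x-axis, as the question asks, and gives each variable its own panel with an independent y-axis so that data1 can be shown as well. It is only an illustration under assumptions: dff_demo is re-typed by hand from the rounded values printed above, and k = 2 is an arbitrary choice of multiplier.

```r
library(ggplot2)

# Summary table re-typed from the printed dff output (rounded as displayed);
# dff_demo is an illustrative name, not an object from the original post
dff_demo <- data.frame(
  Group     = rep(1:2, each = 3),
  data      = rep(c("data1", "data2", "data3"), times = 2),
  data.mean = c(20171205, 0.00556, 0.00556, 20171205, 0.00555, 0.00555),
  data.sd   = c(0, 0.0000457, 0.0000457, 0, 0.0000453, 0.0000453)
)

k <- 2  # number of SDs drawn on each side of the mean (assumed value)

p <- ggplot(dff_demo,
            aes(x = factor(Group), y = data.mean,
                ymin = data.mean - k * data.sd,
                ymax = data.mean + k * data.sd)) +
  geom_point() +
  geom_errorbar(width = 0.2) +
  # one panel per variable, each with its own y-axis range
  facet_wrap(~ data, scales = "free_y") +
  labs(x = "Group", y = "mean +/- k SD")

print(p)
```

With scales = "free_y" the panels no longer share an axis, so bar heights are comparable only within a panel, not across panels.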
