My data looks like this:
df<- structure(list(data1 = c(20171205L, 20171205L, 20171205L, 20171205L,
20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 20171205L,
20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 20171205L,
20171205L, 20171205L, 20171205L, 20171205L), data2 = c(0.00546273,
0.00552377, 0.00549325, 0.00550851, 0.00556954, 0.00560006, 0.00555428,
0.00560006, 0.0055848, 0.00561532, 0.00555428, 0.0055848, 0.00552377,
0.00549325, 0.00550851, 0.00556954, 0.00560006, 0.00555428, 0.00560006,
0.0055848), data3 = c(0.00546273, 0.00552377, 0.00549325, 0.00550851,
0.00556954, 0.00560006, 0.00555428, 0.00560006, 0.0055848, 0.00561532,
0.00555428, 0.0055848, 0.00552377, 0.00549325, 0.00550851, 0.00556954,
0.00560006, 0.00555428, 0.00560006, 0.0055848), mydf = structure(1:20, .Label = c("B02",
"B03", "B04", "B05", "B06", "C02", "C03", "C04", "C05", "C06",
"D02", "D03", "D04", "D05", "D06", "E02", "E03", "E04", "E05",
"E06"), class = "factor")), .Names = c("data1", "data2", "data3",
"mydf"), class = "data.frame", row.names = c(NA, -20L))
I am trying to get the mean and standard deviation of certain rows. This is what I have done so far:
# here is to get the mean
library(dplyr)
df2 <- df %>%
  group_by(Group = case_when(
    grepl("02$|03$", mydf) ~ 1L,
    grepl("04$|05$|06$", mydf) ~ 2L,
    TRUE ~ NA_integer_
  )) %>%
  summarise_at(vars(-mydf), funs(mean(.)))
# here is to get the standard deviation
df3 <- df %>%
  group_by(Group = case_when(
    grepl("02$|03$", mydf) ~ 1L,
    grepl("04$|05$|06$", mydf) ~ 2L,
    TRUE ~ NA_integer_
  )) %>%
  summarise_at(vars(-mydf), funs(sd(.)))
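Then I want to put the means together with their SDs, but I don't know how to merge these two data frames and plot them.
The first column is the x axis (in this case 1 and 2).
So the data for the plot would look like this (for example):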
variable  Group  mean         SD
data1     1      20171205     0
data1     2      20171205     0
data2     1      0.005556190  4.573063e-05
data2     2      0.005553013  4.529097e-05
and so on.
Answer 0 (score: 1)
It is probably cleaner to do the data manipulation in long format rather than wide format:
dff <- df %>%
  # define Group based on mydf, then remove mydf
  mutate(Group = case_when(grepl("02$|03$", mydf) ~ 1L,
                           grepl("04$|05$|06$", mydf) ~ 2L,
                           TRUE ~ NA_integer_)) %>%
  select(-mydf) %>%
  # convert to long format using gather from tidyr package
  tidyr::gather(data, value, -Group) %>%
  # calculate mean & sd within the same summarise() call
  group_by(Group, data) %>%
  summarise(data.mean = mean(value),
            data.sd = sd(value))
> dff
# A tibble: 6 x 4
# Groups:   Group [2]
  Group data       data.mean   data.sd
  <int> <chr>          <dbl>     <dbl>
1     1 data1 20171205       0
2     1 data2        0.00556 0.0000457
3     1 data3        0.00556 0.0000457
4     2 data1 20171205       0
5     2 data2        0.00555 0.0000453
6     2 data3        0.00555 0.0000453
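If you would rather keep the two wide summaries from the question, they can also be merged directly with a join on Group. The following is only a minimal sketch: the small df is a hypothetical stand-in with made-up values (not the question's data), and the suffixes ".mean" and ".sd" are names chosen here for illustration. It passes the bare functions to summarise_at() because funs() is deprecated in recent dplyr versions.

```r
library(dplyr)

# Hypothetical stand-in for the question's df (made-up values, same layout)
df <- data.frame(
  data1 = rep(20171205L, 8),
  data2 = c(0.0054, 0.0055, 0.0056, 0.0057, 0.0055, 0.0056, 0.0054, 0.0056),
  mydf  = factor(c("B02", "B03", "B04", "B05", "C02", "C03", "C04", "C05"))
)

# Build the grouping once so both summaries use the same Group definition
grouped <- df %>%
  group_by(Group = case_when(
    grepl("02$|03$", mydf) ~ 1L,
    grepl("04$|05$|06$", mydf) ~ 2L,
    TRUE ~ NA_integer_
  ))

df2 <- grouped %>% summarise_at(vars(-mydf), mean)  # per-group means
df3 <- grouped %>% summarise_at(vars(-mydf), sd)    # per-group standard deviations

# One row per Group; the suffixes tell the mean and sd columns apart
combined <- inner_join(df2, df3, by = "Group", suffix = c(".mean", ".sd"))
print(combined)
```

The result stays wide (one mean column and one sd column per variable), so it would still need reshaping before ggplot2 can map the variable name to an aesthetic, which is why the long format is more convenient for plotting.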
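Given the huge difference in magnitude between data1 and data2 / data3, there is really no way to plot everything in the same chart and still see the tiny standard deviations associated with the latter. In principle, though, you can do the following (use a point for the mean of each variable, and a line range or error bar for k standard deviations around each mean; the code below uses k = 1 and leaves data1 out):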
library(ggplot2)
ggplot(dff %>% filter(data != "data1"),
       aes(x = data, y = data.mean, color = data,
           ymin = data.mean - data.sd,
           ymax = data.mean + data.sd)) +
  geom_point() +
  geom_linerange() +
  facet_grid(~Group)
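The call above draws one standard deviation on each side of the mean. As a rough sketch of the "k standard deviations" variant with error bars instead of line ranges, the example below assumes a multiplier k = 2 (picked only for illustration) and rebuilds a small data frame, here called dff_small, from the rounded data2 / data3 values in the printed dff output, so that it runs on its own:

```r
library(ggplot2)

# Rounded values copied from the printed dff output (data1 left out)
dff_small <- data.frame(
  Group     = c(1L, 1L, 2L, 2L),
  data      = c("data2", "data3", "data2", "data3"),
  data.mean = c(0.00556, 0.00556, 0.00555, 0.00555),
  data.sd   = c(0.0000457, 0.0000457, 0.0000453, 0.0000453)
)

k <- 2  # assumed number of standard deviations shown on each side of the mean

p <- ggplot(dff_small,
            aes(x = data, y = data.mean, color = data,
                ymin = data.mean - k * data.sd,
                ymax = data.mean + k * data.sd)) +
  geom_point() +
  geom_errorbar(width = 0.2) +  # error bars with short caps instead of plain lines
  facet_grid(~Group)

print(p)
```

geom_errorbar() uses the same ymin / ymax aesthetics as geom_linerange(), so switching between the two only changes the geom, not the mapping.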