Overlaying per-segment distributions on the overall distribution in one plot

Time: 2017-10-17 18:00:06

Tags: r ggplot2

library(ggplot2)
library(data.table)

age <- sample(1:100, 100, TRUE)
segment <- sample(1:5, 100, TRUE)

data <- data.frame(age, segment)

# Convert to a data.table by reference and bin ages into groups
setDT(data)[age > 0 & age < 20, agegroup := "0-19"]
data[age > 19 & age < 40, agegroup := "20-39"]
data[age > 39 & age < 60, agegroup := "40-59"]
data[age > 59, agegroup := "60+"]

I want to plot the overall age distribution and also show each segment clearly.


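The dashed outline should be the overall distribution, drawn together with each segment, so that I can compare how each segment is distributed relative to the whole. How can I overlay the two plots in a single chart?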

2 Answers:

Answer 0 (score: 4)

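You have to create an additional dataset for the overall distribution, merge it with the per-segment counts, and plot each with its own geom_bar layer.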

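library(ggplot2)
library(data.table)

# Using the OP's data
# Count observations per segment and age group
data <- data[, .N, .(segment, agegroup)]
# Overall count per age group (the sum lands in column V1)
data2 <- data[, sum(N), .(agegroup)]
data3 <- merge(data, data2)

# Add each segment's largest count so its tallest bar can be highlighted
data3 <- merge(data3, data3[, .(MAX = max(N)), segment], "segment")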

ggplot(data3, aes(agegroup)) +
    geom_bar(aes(y = V1),
             stat = "identity", position = "dodge",
             color = "black", fill = "white",
             linetype = 2) + 
    geom_bar(aes(y = N, fill = N == MAX),
             stat = "identity", position = "dodge",
             width = 0.6, color = "black") +
    facet_wrap(~ segment) +
    labs(x = "Age group",
         y = "Number of observations") +
    theme_bw() +
    scale_fill_manual(values = c("grey", "grey5")) +
    theme(legend.position = "none")

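As a possible alternative to merging, here is a minimal sketch that relies on how ggplot2 handles faceting: a layer whose data lacks the faceting column is repeated in every panel. The seed, the names df and overall, and the use of cut() for binning are assumptions made for illustration and are not part of the answer above.

```r
library(ggplot2)

# Hypothetical reproduction of the question's data; the seed is an assumption
set.seed(1)
df <- data.frame(age = sample(1:100, 100, TRUE),
                 segment = sample(1:5, 100, TRUE))
# Same bins as in the question: 1-19, 20-39, 40-59, 60 and above
df$agegroup <- cut(df$age, breaks = c(0, 19, 39, 59, Inf),
                   labels = c("0-19", "20-39", "40-59", "60+"))

# Copy without the faceting column, so its layer repeats in every panel
overall <- df[, c("age", "agegroup")]

p <- ggplot(df, aes(agegroup)) +
    # Overall distribution: dashed, white bars, identical in each panel
    geom_bar(data = overall, color = "black", fill = "white", linetype = 2) +
    # Per-segment distribution: narrower grey bars drawn on top
    geom_bar(width = 0.6, color = "black", fill = "grey") +
    facet_wrap(~ segment) +
    labs(x = "Age group", y = "Number of observations") +
    theme_bw()
print(p)
```

This sketch skips the highlighting of each segment's tallest bar; it only shows the overlay itself.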

Answer 1 (score: 2)

You can do this with ggplot like so:

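windows()  # opens a new graphics device (Windows only)
ggplot() +
    # Overall plot; with stat = "identity" and y = age, bar heights are summed ages
    geom_bar(data = data, aes(x = agegroup, y = age), stat = "identity", fill = "red") +
    # Segment 2 only
    geom_bar(data = data[segment == 2, ], aes(x = agegroup, y = age), stat = "identity", fill = "blue")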
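Note that mapping y = age with stat = "identity" stacks the age values, so the bars show summed ages rather than the number of observations. If counts are what you want, a rough sketch of the same two-layer idea using geom_bar's default count statistic might look like this. The seed and the names df and seg are assumptions for illustration only.

```r
library(ggplot2)

# Hypothetical reproduction of the question's data; the seed is an assumption
set.seed(1)
df <- data.frame(age = sample(1:100, 100, TRUE),
                 segment = sample(1:5, 100, TRUE))
df$agegroup <- cut(df$age, breaks = c(0, 19, 39, 59, Inf),
                   labels = c("0-19", "20-39", "40-59", "60+"))

seg <- 2  # the segment to compare against the overall distribution

p <- ggplot(df, aes(agegroup)) +
    # Overall distribution: geom_bar counts rows per age group by default
    geom_bar(fill = "red") +
    # Selected segment only, drawn narrower so the overall bars stay visible
    geom_bar(data = df[df$segment == seg, ], fill = "blue", width = 0.6)
print(p)
```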