Stacked bar chart in R using ggplot

Time: 2021-04-20 07:46:32

Tags: r ggplot2

I want to compare the Covid situation in 2020 with 2021, and use ggplot to show how much more contagious the virus was in 2021.

df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c(
    "Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21", "Apr-21",
    "Oct-20", "Nov-20", "Dec-20"
  )
)
self_impact impacted_family  month
        Y               4    Jan-21
        Y               0    Jan-21
        Y               5    Feb-21
        N               1    Jan-21
        N               2    Mar-21
        Y               0    Mar-21
        Y               3    Apr-21
        Y               0    Oct-20
        Y               2    Nov-20
        N               2    Dec-20

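In 2020 there are 2 self-impacted cases (self_impact = "Y"), while in 2021 there are 5.

Of the 2 self-impacted cases in 2020, one had family members infected; in 2021, 3 of the 5 self-impacted cases had family members infected.

In addition, the number of infected family members is much higher in 2021 than in 2020.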
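These three figures can be reproduced directly from df. A minimal sketch, assuming dplyr is installed and that every month string has the form Mon-YY with a 20xx year; the summary column names are illustrative:

```r
library(dplyr)

df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c("Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21", "Apr-21",
            "Oct-20", "Nov-20", "Dec-20")
)

counts <- df %>%
  mutate(
    # Assumption: "Mon-YY" with a 20xx year, so "Jan-21" -> 2021
    year = 2000 + as.integer(substr(month, 5, 6)),
    impacted_family = as.integer(impacted_family)
  ) %>%
  group_by(year) %>%
  summarise(
    self_impacted = sum(self_impact == "Y"),                           # 2020: 2, 2021: 5
    self_with_family = sum(self_impact == "Y" & impacted_family > 0),  # 2020: 1, 2021: 3
    family_members = sum(impacted_family),                             # 2020: 4, 2021: 15
    .groups = "drop"
  )

print(counts)
```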

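I want to show these three pieces of information in a stacked bar chart using ggplot, with a different colour for each year.

Any help is appreciated, thanks!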

2 answers:

Answer 0 (score: 0)

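It is best to avoid showing 3 dimensions at once, because people tend to lose track of anything beyond 2. It is better to draw 2 charts, one for each value of self_impact.

Nevertheless, you can summarise your data frame by year + self_impact and then plot it with facet_wrap to show all 3 dimensions, as below.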

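A minimal sketch of that approach, assuming dplyr, tidyr and ggplot2 are installed. It summarises by year and self_impact, reshapes the three metrics into long format, and draws bars that are stacked and coloured by year, with one facet per self_impact value. The metric names (cases, families_infected, family_members) and the colours are illustrative choices:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c("Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21", "Apr-21",
            "Oct-20", "Nov-20", "Dec-20")
)

year_summary <- df %>%
  mutate(
    year = paste0("20", substr(month, 5, 6)),      # "Jan-21" -> "2021" (assumes 20xx)
    impacted_family = as.numeric(impacted_family)
  ) %>%
  group_by(year, self_impact) %>%
  summarise(
    cases = n(),                                   # records in this group
    families_infected = sum(impacted_family > 0),  # records with >= 1 infected relative
    family_members = sum(impacted_family),         # total infected relatives
    .groups = "drop"
  )

# Long format: one row per year x self_impact x metric, so each metric becomes a bar
year_long <- year_summary %>%
  pivot_longer(c(cases, families_infected, family_members),
               names_to = "metric", values_to = "count")

# One panel per self_impact value; within each bar, the 2020 and 2021 segments are stacked
p <- ggplot(year_long, aes(x = metric, y = count, fill = year)) +
  geom_col() +
  facet_wrap(~ self_impact, labeller = label_both) +
  scale_fill_manual(values = c("2020" = "steelblue", "2021" = "firebrick")) +
  labs(x = NULL, y = "Count", fill = "Year")

print(p)
```

Faceting on self_impact follows the answer's advice to keep each panel to two dimensions (metric and year), while the fill still gives each year its own colour.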

Answer 1 (score: 0)

One approach is a line chart with self_impact mapped to colour.

Simply plot the data by month:

library(lubridate)
library(tidyverse)


# Graph by month
monthly_summary_data <- df %>%
  # "Jan-21" -> "01-Jan-21" -> Date (%b matches month names of the current locale)
  mutate(month_formatted = as.Date(paste0("01-", month), format = "%d-%b-%y")) %>%
  group_by(month_formatted, self_impact) %>%
  summarise(
    impacted_family = sum(as.numeric(impacted_family)),
    self_impact2 = n(),  # number of records per group (not used in the plot)
    .groups = "drop"
  )

# With this little data, the month-level plot is mostly noise
ggplot(data = monthly_summary_data) +
  geom_line(aes(x = month_formatted, y = impacted_family,
    group = self_impact, color = self_impact))
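The `format = "%d-%b-%y"` parsing depends on the current locale's month abbreviations, so it can return NA on a non-English system. A minimal locale-independent sketch, using base R's built-in `month.abb` constant (always the English abbreviations); `parse_mon_yy` is a hypothetical helper name, and it assumes well-formed "Mon-YY" input with 20xx years:

```r
# Hypothetical helper: "Jan-21" -> first day of that month, without using the locale
parse_mon_yy <- function(x) {
  month_num <- match(substr(x, 1, 3), month.abb)    # month.abb = "Jan", "Feb", ...
  year_num <- 2000L + as.integer(substr(x, 5, 6))   # assumption: all years are 20xx
  as.Date(sprintf("%04d-%02d-01", year_num, month_num))
}

parse_mon_yy(c("Jan-21", "Oct-20"))
```

In the pipelines of this answer it could replace the `as.Date(...)` call, e.g. `mutate(month_formatted = parse_mon_yy(month))`.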

Or plot the data by year, which reduces the noise but loses detail:

# Graph by year
year_summary_data <- df %>%
  mutate(year =
      factor(year(as.Date(paste0("01-", month), format = "%d-%b-%y")))) %>%
  # one row per year x self_impact
  group_by(year, self_impact) %>%
  summarise(
    impacted_family = sum(as.numeric(impacted_family)),
    self_impact2 = n(),  # number of records per group (not used in the plot)
    .groups = "drop"
  )

# With this small sample, a year-level graph is easier to read
ggplot(data = year_summary_data) +
  geom_line(aes(x = year, y = impacted_family,
    group = self_impact, color = self_impact)) +
  # Set y axis to start from ZERO
  scale_y_continuous(limits = c(0, NA))
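Because the question asks for a stacked bar chart, the same year-level totals can also be drawn with `geom_col()`, which stacks the self_impact segments within each year by default. A minimal sketch, assuming dplyr and ggplot2 are installed; the year is taken from the "YY" suffix (assuming 20xx years) and the colours are illustrative:

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c("Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21", "Apr-21",
            "Oct-20", "Nov-20", "Dec-20")
)

year_totals <- df %>%
  mutate(
    year = paste0("20", substr(month, 5, 6)),      # "Jan-21" -> "2021"
    impacted_family = as.numeric(impacted_family)
  ) %>%
  group_by(year, self_impact) %>%
  summarise(impacted_family = sum(impacted_family), .groups = "drop")

# One bar per year; the N and Y segments are stacked on top of each other
p <- ggplot(year_totals, aes(x = year, y = impacted_family, fill = self_impact)) +
  geom_col() +
  scale_fill_manual(values = c(N = "grey60", Y = "darkorange")) +
  labs(x = "Year", y = "Infected family members", fill = "self_impact")

print(p)
```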

Use cumulative sums (cumsum) to reduce the noise in the monthly chart, and map the year variable to linetype to compare the two years:

year_month_summary <- df %>%
  mutate(date = as.Date(paste0("01-", month), format = "%d-%b-%y"),
    year = factor(year(date)),
    month = month(date)) %>%
  # total infected family members per year, month and self_impact
  group_by(year, month, self_impact) %>%
  summarise(
    impacted_family = sum(as.numeric(impacted_family)),
    .groups = "drop") %>%
  group_by(year, self_impact) %>%
  mutate(cum_impacted_family = cumsum(impacted_family))

# Cumulative sums smooth out the month-to-month noise,
# and mapping linetype to year allows the two years to be compared
ggplot(data = year_month_summary) +
  geom_line(aes(x = month, y = cum_impacted_family,
    group = paste0(year, self_impact), color = self_impact, linetype = year)) +
  # Set y axis to start from ZERO
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_continuous(breaks = seq(1, 12, by = 1), expand = c(0, 0))

Created on 2021-04-20 by the reprex package (v2.0.0)
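`cumsum()` accumulates in row order, so the cumulative chart is only correct if rows are sorted by month within each year and self_impact group. The code above relies on `summarise()` returning rows sorted by its grouping keys; an explicit `arrange()` makes that ordering visible. A minimal sketch, assuming dplyr is installed; `month_num` is an illustrative column name derived with base R's `month.abb`:

```r
library(dplyr)

df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c("Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21", "Apr-21",
            "Oct-20", "Nov-20", "Dec-20")
)

cum_summary <- df %>%
  mutate(
    year = paste0("20", substr(month, 5, 6)),             # assumes 20xx years
    month_num = match(substr(month, 1, 3), month.abb),    # "Jan" -> 1, ...
    impacted_family = as.numeric(impacted_family)
  ) %>%
  group_by(year, month_num, self_impact) %>%
  summarise(impacted_family = sum(impacted_family), .groups = "drop") %>%
  arrange(year, self_impact, month_num) %>%               # make month order explicit
  group_by(year, self_impact) %>%
  mutate(cum_impacted_family = cumsum(impacted_family)) %>%
  ungroup()

print(cum_summary)
```

For 2021 and self_impact = "Y", the monthly totals 4, 5, 0, 3 (January to April) become the cumulative values 4, 9, 9, 12.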
