Question

我试图在ggplot2中的两个单独的堆叠条形图（相同的图形）之间绘制线条，以显示第二个条形图的两个部分是第一个条形图的子集。

我尝试了geom_line和geom_segment。但是，我在同一个图中为每个geom（需要两行）指定一个单独的开始和停止时遇到了同样的问题，这个数据框有五行。

没有线条的情节的示例代码：

library(data.table)
Example <- data.table(X_Axis = c('Count', 'Count', 'Dollars', 'Dollars', 'Dollars'),
                  Stack_Group = c('Purely A', 'A & B', 'Purely A Dollars', 'B Mixed Dollars', 'A Mixed dollars'),
                  Value = c(10,3, 120000, 100000, 50000))
Example[, Percent := Value/sum(Value), by = X_Axis]


ggplot(Example, aes(x = X_Axis, y = Percent, fill = factor(Stack_Group))) +
  geom_bar(stat = 'identity', width = 0.5) + 
  scale_y_continuous(labels = scales::percent)

结束情节的目标：

Answer 1

您可以从绘图对象中获取此数据，而不是对段的开始和结束位置进行硬编码。这里有一个替代方案，您可以在其中提供x类别和条形元素的名称，在这些元素之间应绘制线条。

将绘图分配给变量：

p <- ggplot() +
  geom_bar(data = Example,
           aes(x = X_Axis, y = Percent, fill = Stack_Group), stat = 'identity', width = 0.5)

从绘图对象（ggplot_build）中抓取数据。转换为data.table（setDT）：

d <- ggplot_build(p)$data[[1]]
setDT(d)

在绘图对象的数据中，'x'和'group'变量不是由它们的名称明确给出，而是作为数字给出。由于分类变量是按ggplot的字典顺序排序的，因此我们可以在每个'x'中按rank的名称匹配数字：

d[ , r := rank(group), by = x]

Example[ , x := .GRP, by = X_Axis]
Example[ , r := rank(Stack_Group), by = x]

加入以从原始数据添加'X_Axis'和'Stack_Group'的名称到绘图数据：

d <- d[Example[ , .(X_Axis, Stack_Group, x, r)], on = .(x, r)]

设置应在其中绘制线条的x类别和条形元素的名称：

x_start_nm <- "Count"
x_end_nm <- "Dollars"

e_start <- "A & B"
e_upper <- "A Mixed dollars"
e_lower <- "B Mixed Dollars"

选择绘图对象的相关部分以创建线的开始/结束数据：

d2 <- data.table(x_start = rep(d[X_Axis == x_start_nm & Stack_Group == e_start, xmax], 2),
                 y_start = d[X_Axis == x_start_nm & Stack_Group == e_start, c(ymax, ymin)],
                 x_end = rep(d[X_Axis == x_end_nm & Stack_Group == e_upper, xmin], 2),
                 y_end = c(d[X_Axis == x_end_nm & Stack_Group == e_upper, ymax],
                           d[X_Axis == x_end_nm & Stack_Group == e_lower, ymin]))

将线段添加到原始图：

p + 
  geom_segment(data = d2, aes(x = x_start, xend = x_end, y = y_start, yend = y_end))

Answer 2

这是另一种灵活而直接的方法，有点类似于@Henrik的答案，但仅与用户数据有关。无需从ggplot_build()对象中提取数据。

准备数据

代码：

library(data.table)
library(forcats)

Example <- data.table(
  X_Axis = fct_inorder(c("Count", "Count", "Dollars", "Dollars", "Dollars")),
  Stack_Group = fct_rev(fct_inorder(c("Purely A", "A & B", "Purely A Dollars", 
                                      "B Mixed Dollars", "A Mixed dollars"))),
  Value = c(10, 3, 120000, 100000, 50000),
  Grp2 = fct_inorder(c("Purely", "Mixed", "Purely", "Mixed", "Mixed"))
  )
Example[, Percent := Value/sum(Value), by = X_Axis]
Example[order(Grp2, -Stack_Group), Cumulated := cumsum(Percent), by = X_Axis]

准备好的数据：

Example
#    X_Axis      Stack_Group  Value   Grp2   Percent Cumulated
#1:   Count         Purely A     10 Purely 0.7692308 0.7692308
#2:   Count            A & B      3  Mixed 0.2307692 1.0000000
#3: Dollars Purely A Dollars 120000 Purely 0.4444444 0.4444444
#4: Dollars  B Mixed Dollars 100000  Mixed 0.3703704 0.8148148
#5: Dollars  A Mixed dollars  50000  Mixed 0.1851852 1.0000000

绘图

代码：

library(ggplot2)
w = 0.4   # width of bars
ggplot(Example, aes(x = X_Axis, y = Percent, fill = Stack_Group)) +
  geom_col(width = w) +
  geom_line(aes(x = (1 - w) * as.numeric(X_Axis) + 1.5 * w, y = Top, group = Grp2), 
            data = Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)],
            inherit.aes = FALSE) +
  scale_y_continuous(labels = scales::percent)

图表：

说明

ggplot隐含地将character变量强制转换为factor，这些变量控制着项目的绘制顺序。默认情况下，因子中的级别顺序按字母顺序排列。但在这里我们确实需要明确地控制绘图顺序。因此，我们借助Hadley的方便forcats包创建具有指定级别的因子。
Stack_Group中的级别顺序与ggplot2（版本2.2.0+）的顺序相反，是堆叠值（请参阅?position_stack）。< / p>
数据包括两种类型的组：
- 一个是X_Axis区分"Count"和"Dollars"。
- 另一个隐藏在Stack_Group中，数据项的名称以及OP想要绘制线段的方式。在这里，我们明确定义了一个新变量Grp2，它区分每个栏底部的"Purely"和每个栏顶部的"Mixed"。这样可以避免对线段的起点和终点进行硬编码，从而使该解决方案更加灵活。
计算每个柱的累积百分比。稍后需要这些来绘制线段。
条形图的宽度在变量w中定义，并传递给width的{{1}}参数。
在geom_col()版本2.2.0中引入，ggplot2是geom_col()的快捷方式。
由于只有两个条形，geom_bar(stat = "identity")用于在它们之间绘制线段。
- 在x轴上，线段的范围从 x = 1 + w / 2 到 x = 2 - w / 2 。在这里，我们使用geom_lines()使用因子级别的整数来绘制的事实。因此，在 x = 1 上绘制ggplot，在 x = 2 上绘制"Count"。（这就是明确定义因子水平的原因。）
- 每个条形的y值取自"Dollar"计算的每个Top累积百分比的最大值Grp2。这允许修改每个Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)]中的数据项的名称和顺序。
- 需要使用参数Grp2来阻止inherit.aes = FALSE期待ggplot美学的价值。

增强

如果需要，可以使用不同的线型轻松显示fill：

Grp2

现在，w = 0.2 # width of bars ggplot(Example, aes(x = X_Axis, y = Percent, fill = Stack_Group)) + geom_col(width = w) + geom_line(aes(x = (1 - w) * as.numeric(X_Axis) + 1.5 * w, y = Top, group = Grp2, linetype = fct_rev(Grp2)), data = Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)], inherit.aes = FALSE) + scale_y_continuous(labels = scales::percent) + labs(linetype = "Purely vs Mixed")的因子显示在图例中。使用Grp 2可以方便地重命名图例中的标题。 labs()中的因子顺序已反转为实线为100％并显示图例中的因子，因为它们堆叠在图表中（Grp2位于底部，{{1} } 以上）。

请注意，为了演示目的，还更改了宽度参数"Purely"。

Answer 3

你可以这样做：

library(data.table)
library(ggplot2)
Example <- data.table(X_Axis = c('Count', 'Count', 'Dollars', 'Dollars', 'Dollars'),
                      Stack_Group = c('Purely A', 'A & B', 'Purely A Dollars', 'B Mixed Dollars', 'A Mixed dollars'),
                      Value = c(10,3, 120000, 100000, 50000))
Example[, Percent := Value/sum(Value), by = X_Axis]

ggplot(Example) +
  geom_segment(data=data.frame(x=c("Count","Count"),
                               xend=c("Dollars","Dollars"),
                               y=c(1,0.94),
                               yend=c(1,0.27)),aes(x=x,y=y,xend=xend,yend=yend))+
  geom_bar(aes(x = X_Axis, y = Percent, fill=factor(Stack_Group)),stat='identity', width = .5) + 
  scale_y_continuous(labels = scales::percent)

给出：

NB：因为x轴是分类的，所以我们遇到了从这一点开始而不是从条形本身的边界开始的问题。这就是我绘制geom_segment然后geom_bar以便后者超过第一个的原因这里的值是手动设置的，但是使用三角法和宽度可以计算出具有所需外观所需的偏移值。

在堆积条形图中的不同元素之间绘制线条

3 个答案:

准备数据

绘图

说明

增强