Question

Can anyone help me? I grouped and summarised spending data for several companies, and the output looks like this:

df <- data.frame(
    Column1 = c("Other", "Brand1", "Brand2", "Brand3", "Brand4", "Brand5"),
    Column2 = c(NA, "Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5"),
    Spendings = c(1000, 500, 250, 200, 150, 100)
)

  Column1   Column2 Spendings
1   Other      <NA>      1000
2  Brand1 Subbrand1       500
3  Brand2 Subbrand2       250
4  Brand3 Subbrand3       200
5  Brand4 Subbrand4       150
6  Brand5 Subbrand5       100

The "Other" row is at the top, but because of a visualization later on (like here), I would like that particular row to be at the bottom:

df <- data.frame(
    Column1 = c("Brand1", "Brand2", "Brand3", "Brand4", "Brand5", "Other"),
    Column2 = c("Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5", NA),
    Spendings = c(500, 250, 200, 150, 100, 1000)
)

  Column1   Column2 Spendings
1  Brand1 Subbrand1       500
2  Brand2 Subbrand2       250
3  Brand3 Subbrand3       200
4  Brand4 Subbrand4       150
5  Brand5 Subbrand5       100
6   Other      <NA>      1000

This is the function I used to create the df, together with some example code for what I would like, which obviously does not work :-(.

df <- df%>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings)) %>%
    arrange(desc(Spendings), lastrow = "others")

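Is it possible to get the "Other" row at the bottom within the dplyr workflow? Subsetting and binding is of course possible, but is there a more suitable way?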

Answer 1

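We can use a logical vector in arrange. Logical values sort with FALSE before TRUE, so the rows where the condition is FALSE come first and the "Other" row ends up last.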

df %>% 
   arrange(Column1 == "Other")
#  Column1   Column2 Spendings
#1  Brand1 Subbrand1       500
#2  Brand2 Subbrand2       250
#3  Brand3 Subbrand3       200
#4  Brand4 Subbrand4       150
#5  Brand5 Subbrand5       100
#6   Other      <NA>      1000
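
To illustrate why the logical key works, here is a minimal base-R sketch. The vector `x` is a made-up example, not data from the question. Comparing against "Other" gives a logical vector, and `order()` puts the FALSE entries ahead of the TRUE ones while leaving ties in their original order.

```r
# Hypothetical example vector, used only for illustration
x <- c("Other", "Brand1", "Brand2")

# Logical key: TRUE only for the "Other" entry
is_other <- x == "Other"

# order() ranks FALSE (0) before TRUE (1); ties keep their original order
idx <- order(is_other)
sorted <- x[idx]
print(sorted)
```

The same idea carries over to `arrange(Column1 == "Other")`, where the logical expression acts as the sort key.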

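Another option is to create the column as a factor with the levels specified in that order, so that 'Other' is the last level; if we then arrange, the rows are ordered by the levels. This may be a better option because the order is also maintained when doing the plot.

un1 <- c(setdiff(unique(df$Column1), "Other"), "Other")
df %>%
   mutate(Column1 = factor(Column1, levels = un1)) %>%
   arrange(Column1)

If we use the forcats package, there are some useful functions, such as fct_relevel, for modifying the levels easily.

library(forcats)
df %>%
   mutate(Column1 = fct_relevel(Column1, "Other", after = Inf)) %>%
   arrange(Column1)

According to the examples in the fct_relevel documentation:

Using 'Inf' allows you to relevel to the end when the number of levels is unknown or variable (e.g. vectorised operations)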
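
As a rough check of the factor approach, the following base-R sketch uses only the Column1 values from the question and inspects the resulting level order. The names `column1`, `lv`, `f`, and `sorted` are illustrative. It assumes "Other" occurs in the data; the sketch does not handle the case where it is absent.

```r
# Column1 values from the question's example data
column1 <- c("Other", "Brand1", "Brand2", "Brand3", "Brand4", "Brand5")

# All values except "Other" in order of appearance, then "Other" as the last level
lv <- c(setdiff(unique(column1), "Other"), "Other")
f <- factor(column1, levels = lv)

# Sorting a factor follows the level order, not the alphabet
print(levels(f))
sorted <- as.character(sort(f))
print(sorted)
```

Because the order lives in the factor levels, anything downstream that respects level order (sorting, and typically discrete axes in plots) should keep "Other" last without a further arrange step.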

Answer 2

df <- df%>%
group_by(Column1, Column2) %>%
summarise(Spendings = sum(Spendings)) %>%
arrange(Column1=="Other", desc(Spendings))
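
This answer gives no explanation, so here is a hedged sketch of what the two sort keys do. The first key, `Column1 == "Other"`, pushes the "Other" row to the end, and the second key, `desc(Spendings)`, orders the remaining rows by descending spend. The data is the question's example, deliberately shuffled. `.groups = "drop"` is an addition that assumes dplyr 1.0.0 or later and only removes the grouping from the result.

```r
library(dplyr)

# Example data from the question, shuffled so the sorting is visible
df <- data.frame(
    Column1 = c("Brand3", "Other", "Brand1", "Brand5", "Brand2", "Brand4"),
    Column2 = c("Subbrand3", NA, "Subbrand1", "Subbrand5", "Subbrand2", "Subbrand4"),
    Spendings = c(200, 1000, 500, 100, 250, 150)
)

result <- df %>%
    group_by(Column1, Column2) %>%
    # .groups = "drop" (assumes dplyr >= 1.0.0) returns an ungrouped tibble
    summarise(Spendings = sum(Spendings), .groups = "drop") %>%
    # First key: "Other" last; second key: largest spend first
    arrange(Column1 == "Other", desc(Spendings))

print(result)
```

Although "Other" has the largest spend here, it still lands in the last row because the logical key takes precedence over `desc(Spendings)`.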

Summarise with dplyr - one variable always at the bottom
