R: sort a ddply summary by group total

Date: 2015-04-08 17:52:19

Tags: r pivot-table plyr

I have a data.frame like this:

x <- data.frame(Category = factor(c("One", "One", "Four", "Two", "Two",
                                    "Three", "Two", "Four", "Three")),
                City = factor(c("D", "A", "B", "B", "A", "D", "A", "C", "C")),
                Frequency = c(10, 1, 5, 2, 14, 8, 20, 3, 5))

  Category City Frequency
1      One    D        10
2      One    A         1
3     Four    B         5
4      Two    B         2
5      Two    A        14
6    Three    D         8
7      Two    A        20
8     Four    C         3
9    Three    C         5

I want to create a pivot table of sum(Frequency) using the ddply function, like this:

library(plyr)
ddply(x, .(Category, City), summarize, Total = sum(Frequency))
  Category City Total
1     Four    B     5
2     Four    C     3
3      One    A     1
4      One    D    10
5    Three    C     5
6    Three    D     8
7      Two    A    34
8      Two    B     2
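For comparison, here is a minimal sketch of the same summary step in base R, assuming you would rather not depend on plyr: aggregate() with a formula sums Frequency for each Category/City pair that occurs in the data. The rows come out in a different order than ddply's, which does not matter here because the result gets re-sorted anyway. The name DF and the rename to Total are choices made for this sketch.

```r
x <- data.frame(Category = factor(c("One", "One", "Four", "Two", "Two",
                                    "Three", "Two", "Four", "Three")),
                City = factor(c("D", "A", "B", "B", "A", "D", "A", "C", "C")),
                Frequency = c(10, 1, 5, 2, 14, 8, 20, 3, 5))

# Sum Frequency within each Category/City combination (base R, no plyr needed);
# the formula interface keeps only combinations that actually occur
DF <- aggregate(Frequency ~ Category + City, data = x, FUN = sum)

# Rename the summed column so it matches the ddply output (Total)
names(DF)[names(DF) == "Frequency"] <- "Total"
DF
```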

But I need to sort this result by the overall total of each Category group, like this:

  Category City Total
1      Two    A    34
2      Two    B     2
3    Three    D     8
4    Three    C     5
5      One    D    10
6      One    A     1
7     Four    B     5
8     Four    C     3

I've looked at and tried sort, order, and arrange, but none of them seems to do what I need. How can I do this in R?

2 answers:

Answer 0 (score: 5)

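This is a good question. I can't think of a direct way to do this other than creating a total-size index and then sorting by it. Here is one possible data.table approach, which uses the setorder function to sort by reference: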
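library(data.table)
# Convert x to a data.table by reference and sum Frequency per Category/City
Res <- setDT(x)[, .(Total = sum(Frequency)), by = .(Category, City)]
# Add each Category's overall total as `size`, then sort in place by
# size (descending), then Total (descending), then Category
setorder(Res[, size := sum(Total), by = Category], -size, -Total, Category)[]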
#    Category City Total size
# 1:      Two    A    34   36
# 2:      Two    B     2   36
# 3:    Three    D     8   13
# 4:    Three    C     5   13
# 5:      One    D    10   11
# 6:      One    A     1   11
# 7:     Four    B     5    8
# 8:     Four    C     3    8

Alternatively, if you're deep in the Hadleyverse, you can get a similar result with the newer dplyr package (as suggested by @akrun):

library(dplyr)
x %>% 
  group_by(Category, City) %>% 
  summarise(Total = sum(Frequency)) %>%  # drops the City grouping; still grouped by Category
  mutate(size = sum(Total)) %>%          # overall total of each Category
  ungroup %>%
  arrange(-size, -Total, Category)

Answer 1 (score: 4)

Here is a base R version, where DF is the result of your ddply call:

with(DF, DF[order(-ave(Total, Category, FUN=sum), Category, -Total), ])

which produces:

  Category City Total
7      Two    A    34
8      Two    B     2
6    Three    D     8
5    Three    C     5
4      One    D    10
3      One    A     1
1     Four    B     5
2     Four    C     3

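The logic is essentially the same as David's: compute the sum of Total for each Category and assign that number to every row in the Category (which we do with ave(..., FUN=sum)), then add a couple of tie-breakers to make sure the rows come out in the expected order.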
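To make the intermediate step visible, here is a small sketch that rebuilds DF by hand from the ddply output in the question (typed in only so the example is self-contained) and shows what ave(..., FUN = sum) returns before it is used as the primary sort key. The names group_total and sorted are illustrative.

```r
# DF as produced by the ddply call in the question (rows typed in by hand here)
DF <- data.frame(Category = factor(c("Four", "Four", "One", "One",
                                     "Three", "Three", "Two", "Two")),
                 City = factor(c("B", "C", "A", "D", "C", "D", "A", "B")),
                 Total = c(5, 3, 1, 10, 5, 8, 34, 2))

# ave() returns a vector the same length as Total, where each element is
# the sum of Total over that row's Category
group_total <- ave(DF$Total, DF$Category, FUN = sum)
group_total
# [1]  8  8 11 11 13 13 36 36

# Primary key: group total (descending); ties broken by Category,
# then by Total within the Category (descending)
sorted <- DF[order(-group_total, DF$Category, -DF$Total), ]
sorted
```

Because the group total is repeated on every row of its Category, sorting on it keeps each Category's rows together; the Category key only matters if two Categories happen to have the same total.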