R: Aggregating strings

Asked: 2016-05-12 07:47:23

Tags: r aggregate

My data frame ModelDF has columns holding both numeric and character values, for example:

Quantity        Type        Mode        Company
   1            Shoe        hello        Nike
   1            Shoe        hello        Nike
   2            Jeans       hello        Levis
   3            Shoe        hello        Nike
   1            Jeans       hello        Levis
   1            Shoe        hello        Adidas
   2            Jeans       hello        Spykar
   1            Shoe        ahola        Nike
   1            Jeans       ahola        Levis

I need to summarise it into this form:

Quantity        Type        Mode        Company
   5            Shoe        hello        Nike
   3            Jeans       hello        Levis
   1            Shoe        hello        Adidas
   2            Jeans       hello        Spykar
   1            Shoe        ahola        Nike
   1            Jeans       ahola        Levis

That is, wherever all the other columns are identical, I need to collapse the rows into one and sum Quantity.

I have tried aggregate, but it gives me wrong results because it does not handle the character values.

What are my options? Thanks.

2 answers:

Answer 0 (score: 0)

aggregate(Quantity ~ Type + Mode + Company, df, sum)
#   Type  Mode Company Quantity
#1  Shoe hello  Adidas        1
#2 Jeans ahola   Levis        1
#3 Jeans hello   Levis        3
#4  Shoe ahola    Nike        1
#5  Shoe hello    Nike        5
#6 Jeans hello  Spykar        2
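
The `df` used in this answer stores the grouping columns as factors (see the data at the end). As a minimal sketch, assuming the asker's columns are plain character vectors instead, the same formula call appears to group them just as well. The data frame `chr_df` below is a hypothetical rebuild of the question's table, not something taken from the original post:

```r
# Hypothetical rebuild of the question's data with character (not factor) columns
chr_df <- data.frame(
  Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L),
  Type     = c("Shoe", "Shoe", "Jeans", "Shoe", "Jeans",
               "Shoe", "Jeans", "Shoe", "Jeans"),
  Mode     = c("hello", "hello", "hello", "hello", "hello",
               "hello", "hello", "ahola", "ahola"),
  Company  = c("Nike", "Nike", "Levis", "Nike", "Levis",
               "Adidas", "Spykar", "Nike", "Levis"),
  stringsAsFactors = FALSE
)

# Only Quantity is summed; the character columns are used purely for grouping
res <- aggregate(Quantity ~ Type + Mode + Company, data = chr_df, FUN = sum)
print(res)
```

The result should have one row per distinct Type/Mode/Company combination, with the Quantity values of matching rows added together.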

You can also try the data.table option:

library(data.table)
setDT(df)[, .(Sum.Quantity = sum(Quantity)), by = list(Type, Mode, Company)]

#    Type  Mode Company Sum.Quantity
#1:  Shoe hello    Nike            5
#2: Jeans hello   Levis            3
#3:  Shoe hello  Adidas            1
#4: Jeans hello  Spykar            2
#5:  Shoe ahola    Nike            1
#6: Jeans ahola   Levis            1

Similarly, with dplyr:

library(dplyr)
df %>%
  group_by(Type, Mode, Company) %>%
  summarise(Quantity = sum(Quantity))

Data

dput(df)
structure(list(Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L
), Type = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Jeans", 
"Shoe"), class = "factor"), Mode = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L), .Label = c("ahola", "hello"), class = "factor"), 
    Company = structure(c(3L, 3L, 2L, 3L, 2L, 1L, 4L, 3L, 2L), .Label = c("Adidas", 
    "Levis", "Nike", "Spykar"), class = "factor")), .Names = c("Quantity", 
"Type", "Mode", "Company"), class = "data.frame", row.names = c(NA, 
-9L))

Answer 1 (score: 0)

You do not want to "aggregate strings"; you want to aggregate a numeric column by string variables. Like this:

R> xx = data.frame(a=sample(letters[1:3], 10, TRUE),
                   b=sample(LETTERS[1:3], 10, TRUE),
                   c=runif(10))
R> xx
   a b         c
1  b C 0.7094221
2  c B 0.2718095
3  c B 0.8844701
4  b C 0.9270141
5  b C 0.8243021
6  a A 0.3649902
7  a B 0.9763228
8  a A 0.8904676
9  b C 0.8640352
10 a A 0.7931683
R> aggregate(c ~ a + b, data=xx, FUN=sum)
  a b         c
1 a A 2.0486261
2 a B 0.9763228
3 c B 1.1562796
4 b C 3.3247736
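
To illustrate the distinction, here is a small sketch on made-up data (the `sales` data frame and its column names are hypothetical). The question does not show the actual call that failed, so this is only one plausible cause: handing the whole data frame to `aggregate` makes `sum` run on the character columns as well, which raises an error, whereas putting only the numeric column on the left-hand side of the formula works.

```r
# Hypothetical data, not taken from the question or the answers
sales <- data.frame(
  item  = c("a", "a", "b"),
  shop  = c("X", "X", "Y"),
  value = c(1, 2, 5),
  stringsAsFactors = FALSE
)

# Aggregating the whole data frame applies sum() to the character columns too,
# so the call fails; tryCatch turns the failure into a marker string
whole <- tryCatch(
  aggregate(sales, by = list(sales$item, sales$shop), FUN = sum),
  error = function(e) "error"
)

# Summing only the numeric column, grouped by the string columns, succeeds
by_group <- aggregate(value ~ item + shop, data = sales, FUN = sum)
print(by_group)
```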