My data frame ModelDF contains columns with both numeric and character values, for example:
Quantity Type Mode Company
1 Shoe hello Nike
1 Shoe hello Nike
2 Jeans hello Levis
3 Shoe hello Nike
1 Jeans hello Levis
1 Shoe hello Adidas
2 Jeans hello Spykar
1 Shoe ahola Nike
1 Jeans ahola Levis
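I need to aggregate it into this form:
Quantity Type Mode Company
5 Shoe hello Nike
3 Jeans hello Levis
1 Shoe hello Adidas
2 Jeans hello Spykar
1 Shoe ahola Nike
1 Jeans ahola Levis
That is, when all the other columns are identical, I need to collapse those rows into one and sum Quantity.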
I have tried using aggregate, but since it does not handle character values, it gives me wrong results.
What are my options? Thanks.
Answer 0 (score: 0)
aggregate(Quantity ~ Type + Mode + Company, df, sum)
# Type Mode Company Quantity
#1 Shoe hello Adidas 1
#2 Jeans ahola Levis 1
#3 Jeans hello Levis 3
#4 Shoe ahola Nike 1
#5 Shoe hello Nike 5
#6 Jeans hello Spykar 2
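On the worry that aggregate cannot cope with character values: here is a minimal sketch, assuming ModelDF stores Type, Mode and Company as plain character columns rather than factors. The data frame below is a hypothetical reconstruction from the sample rows in the question, not the asker's actual object. The `.` on the right-hand side of the formula stands for "every column other than Quantity", so the grouping columns do not have to be listed by name.

```r
# Hypothetical reconstruction of ModelDF with character (not factor) columns
ModelDF <- data.frame(
  Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L),
  Type     = c("Shoe", "Shoe", "Jeans", "Shoe", "Jeans",
               "Shoe", "Jeans", "Shoe", "Jeans"),
  Mode     = c(rep("hello", 7), rep("ahola", 2)),
  Company  = c("Nike", "Nike", "Levis", "Nike", "Levis",
               "Adidas", "Spykar", "Nike", "Levis"),
  stringsAsFactors = FALSE
)

# Sum Quantity within each distinct combination of all remaining columns
result <- aggregate(Quantity ~ ., data = ModelDF, FUN = sum)
print(result)
```

If this works on your data but your own call did not, the character columns were probably on the left-hand side of the formula (being summed) instead of on the right-hand side (being grouped by).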
You can also try the data.table option:
library(data.table)
setDT(df)[, .(Sum.Quantity = sum(Quantity)), by = list(Type, Mode, Company)]
# Type Mode Company Sum.Quantity
#1: Shoe hello Nike 5
#2: Jeans hello Levis 3
#3: Shoe hello Adidas 1
#4: Jeans hello Spykar 2
#5: Shoe ahola Nike 1
#6: Jeans ahola Levis 1
With dplyr:
library(dplyr)
df %>%
  group_by(Type, Mode, Company) %>%
  summarise(Quantity = sum(Quantity))
Data
dput(df)
structure(list(Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L
), Type = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Jeans",
"Shoe"), class = "factor"), Mode = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L), .Label = c("ahola", "hello"), class = "factor"),
Company = structure(c(3L, 3L, 2L, 3L, 2L, 1L, 4L, 3L, 2L), .Label = c("Adidas",
"Levis", "Nike", "Spykar"), class = "factor")), .Names = c("Quantity",
"Type", "Mode", "Company"), class = "data.frame", row.names = c(NA,
-9L))
Answer 1 (score: 0)
You do not want to "aggregate strings"; you want to aggregate a numeric variable by "string variables". Here:
R> xx = data.frame(a=sample(letters[1:3], 10, TRUE),
b=sample(LETTERS[1:3], 10, TRUE),
c=runif(10))
R> xx
a b c
1 b C 0.7094221
2 c B 0.2718095
3 c B 0.8844701
4 b C 0.9270141
5 b C 0.8243021
6 a A 0.3649902
7 a B 0.9763228
8 a A 0.8904676
9 b C 0.8640352
10 a A 0.7931683
R> aggregate(c ~ a + b, data=xx, FUN=sum)
a b c
1 a A 2.0486261
2 a B 0.9763228
3 c B 1.1562796
4 b C 3.3247736
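The same idea can be written without a formula, which may make the split between "what is summed" and "what is grouped by" more explicit. This is a small sketch on made-up, fixed values (not the random xx above), using the `x` and `by` arguments of base R's aggregate: `x` holds the numeric column and `by` holds the character columns.

```r
# Small illustrative data frame with fixed values (hypothetical, not the xx above)
xx <- data.frame(
  a = c("b", "c", "c", "b", "a"),
  b = c("C", "B", "B", "C", "A"),
  c = c(1, 2, 3, 4, 10),
  stringsAsFactors = FALSE
)

# x = the numeric column to sum; by = the string columns that define the groups
res <- aggregate(x = xx["c"], by = xx[c("a", "b")], FUN = sum)
print(res)
```

Passing `xx["c"]` (a one-column data frame) rather than `xx$c` keeps the column name `c` in the result.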