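I have an Excel file with many rows and columns (13232 rows and 18 columns). The last column holds a numeric value. I want to find the rows that are identical in every column except the last one and sum their last-column values. For example, if the input is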
+---------+---------+---------+---------+
| Column1 | Column2 | Column3 | Column4 |
+---------+---------+---------+---------+
| ABC | DEF | GHI | 5 |
| XYZ | PQR | LMN | 4 |
| ABC | DEF | GHI | 11 |
| Test1 | Test2 | Test3 | 12 |
| XYZ | PQR | LMN | 54 |
+---------+---------+---------+---------+
then the output should be:
+---------+---------+---------+---------+
| Column1 | Column2 | Column3 | Column4 |
+---------+---------+---------+---------+
| ABC | DEF | GHI | 16 |
| XYZ | PQR | LMN | 58 |
| Test1 | Test2 | Test3 | 12 |
+---------+---------+---------+---------+
How can I do this in R?
Answer:
You can use aggregate from base R:
aggregate(Column4~., df1, FUN=sum)
# Column1 Column2 Column3 Column4
#1 ABC DEF GHI 16
#2 XYZ PQR LMN 58
#3 Test1 Test2 Test3 12
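The question has 18 columns, so typing every key column is impractical. Column4~. already avoids that, but it still hard-codes the value column's name. The sketch below is one way to do the split purely by position, assuming (as the question states) that the value is always in the last column. It uses the data-frame method aggregate(x, by, FUN); the sample data is re-typed from the question, and the variable name last is illustrative only.

```r
# Sample data re-typed from the question; in practice df1 would be read from the Excel file.
df1 <- data.frame(
  Column1 = c("ABC", "XYZ", "ABC", "Test1", "XYZ"),
  Column2 = c("DEF", "PQR", "DEF", "Test2", "PQR"),
  Column3 = c("GHI", "LMN", "GHI", "Test3", "LMN"),
  Column4 = c(5L, 4L, 11L, 12L, 54L),
  stringsAsFactors = FALSE
)

# Assumption: the last column holds the values and every other column is part of the key.
last <- ncol(df1)

# x = data frame holding only the value column, by = data frame (a list) of key columns.
# The same call works unchanged for 17 key columns + 1 value column.
res <- aggregate(df1[last], by = df1[-last], FUN = sum)
res
```

Note that aggregate omits rows that have a missing value in any of the by columns, so rows with blank key cells in the spreadsheet would silently disappear from the result.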
Or, using data.table:
library(data.table)
setDT(df1)[, .(Column4 = sum(Column4)), by = names(df1)[1:3]]
# Column1 Column2 Column3 Column4
#1: ABC DEF GHI 16
#2: XYZ PQR LMN 58
#3: Test1 Test2 Test3 12
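With 18 columns, names(df1)[1:3] would have to become names(df1)[1:17]. A sketch that derives the key columns instead is shown below, again assuming the value column is the last one. The helper variables val and keys are illustrative names, not part of data.table. Restricting .SD to the value column with .SDcols lets lapply(.SD, sum) keep the column's original name.

```r
library(data.table)

# Sample data re-typed from the question.
dt <- data.table(
  Column1 = c("ABC", "XYZ", "ABC", "Test1", "XYZ"),
  Column2 = c("DEF", "PQR", "DEF", "Test2", "PQR"),
  Column3 = c("GHI", "LMN", "GHI", "Test3", "LMN"),
  Column4 = c(5L, 4L, 11L, 12L, 54L)
)

val  <- names(dt)[ncol(dt)]      # name of the last (value) column
keys <- setdiff(names(dt), val)  # every other column is part of the grouping key

# Sum the value column within each key combination; groups appear in order of first occurrence.
res <- dt[, lapply(.SD, sum), by = keys, .SDcols = val]
res
```

Because by (rather than keyby) is used, the groups keep their order of first appearance, matching the output above.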
Or, using sqldf:
library(sqldf)
sqldf('select Column1, Column2, Column3,
sum(Column4) as Column4
from df1
group by Column1, Column2, Column3')
# Column1 Column2 Column3 Column4
#1 ABC DEF GHI 16
#2 Test1 Test2 Test3 12
#3 XYZ PQR LMN 58
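Writing out 17 column names twice in the SQL string is error-prone. One possible approach, sketched below, builds the query from names(df1), assuming the last column is the value column. The quoting helper q and the variables val, keys and key_sql are hypothetical names introduced for this illustration. Double-quoting the identifiers is meant to keep the SQL valid if Excel headers contain spaces.

```r
library(sqldf)

# Sample data re-typed from the question.
df1 <- data.frame(
  Column1 = c("ABC", "XYZ", "ABC", "Test1", "XYZ"),
  Column2 = c("DEF", "PQR", "DEF", "Test2", "PQR"),
  Column3 = c("GHI", "LMN", "GHI", "Test3", "LMN"),
  Column4 = c(5L, 4L, 11L, 12L, 54L),
  stringsAsFactors = FALSE
)

val  <- names(df1)[ncol(df1)]      # last column = value to sum
keys <- setdiff(names(df1), val)   # all remaining columns = grouping key

# Hypothetical helper: wrap identifiers in double quotes for SQLite.
q <- function(x) paste0('"', x, '"')
key_sql <- paste(q(keys), collapse = ", ")

# Build "select <keys>, sum(<val>) as <val> from df1 group by <keys>".
query <- sprintf("select %s, sum(%s) as %s from df1 group by %s",
                 key_sql, q(val), q(val), key_sql)
res <- sqldf(query)
res
```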
Or, using dplyr:
library(dplyr)
df1 %>% group_by(Column1, Column2, Column3) %>%
summarize(Column4 = sum(Column4))
# Source: local data frame [3 x 4]
# Groups: Column1, Column2
# Column1 Column2 Column3 Column4
# 1 ABC DEF GHI 16
# 2 Test1 Test2 Test3 12
# 3 XYZ PQR LMN 58
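For the 18-column case, the grouping columns can be selected with tidyselect helpers instead of being listed by hand. The sketch below assumes dplyr 1.1.0 or later (for pick()) and that the value column is the last one; val and keys are illustrative names. Passing .groups = "drop" returns an ungrouped result, so the leftover grouping shown in the "Groups:" line above does not carry over.

```r
library(dplyr)

# Sample data re-typed from the question.
df1 <- data.frame(
  Column1 = c("ABC", "XYZ", "ABC", "Test1", "XYZ"),
  Column2 = c("DEF", "PQR", "DEF", "Test2", "PQR"),
  Column3 = c("GHI", "LMN", "GHI", "Test3", "LMN"),
  Column4 = c(5L, 4L, 11L, 12L, 54L),
  stringsAsFactors = FALSE
)

val  <- names(df1)[ncol(df1)]      # last column holds the values
keys <- setdiff(names(df1), val)   # all other columns form the key

res <- df1 %>%
  group_by(pick(all_of(keys))) %>%                        # group by every key column
  summarise(across(all_of(val), sum), .groups = "drop")   # sum the value column, keep its name
res
```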
Reproducible data:
df1 <-
structure(list(Column1 = structure(c(1L, 3L, 1L, 2L, 3L), .Label = c("ABC",
"Test1", "XYZ"), class = "factor"), Column2 = structure(c(1L,
2L, 1L, 3L, 2L), .Label = c("DEF", "PQR", "Test2"), class = "factor"),
Column3 = structure(c(1L, 2L, 1L, 3L, 2L), .Label = c("GHI",
"LMN", "Test3"), class = "factor"), Column4 = c(5L, 4L, 11L,
12L, 54L)), .Names = c("Column1", "Column2", "Column3", "Column4"
), class = "data.frame", row.names = c(NA, -5L))