Merge rows that have the same values in multiple columns

Time: 2015-07-09 17:56:38

Tags: r

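I have an Excel file with many rows and columns (13232 rows and 18 columns). The last column holds a numeric value. I want to find the rows that are identical in every column except the last, and add up their last-column values. For example, if the input is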

+---------+---------+---------+---------+
| Column1 | Column2 | Column3 | Column4 |
+---------+---------+---------+---------+
| ABC     | DEF     | GHI     |       5 |
| XYZ     | PQR     | LMN     |       4 |
| ABC     | DEF     | GHI     |      11 |
| Test1   | Test2   | Test3   |      12 |
| XYZ     | PQR     | LMN     |      54 |
+---------+---------+---------+---------+

then the output should be

+---------+---------+---------+---------+
| Column1 | Column2 | Column3 | Column4 |
+---------+---------+---------+---------+
| ABC     | DEF     | GHI     |      16 |
| XYZ     | PQR     | LMN     |      58 |
| Test1   | Test2   | Test3   |      12 |
+---------+---------+---------+---------+

How can I achieve this in R?

1 answer:

Answer 0: (score: 6)

You can use aggregate from base R:

 aggregate(Column4~., df1, FUN=sum)
 #    Column1 Column2 Column3 Column4
 #1     ABC     DEF     GHI      16
 #2     XYZ     PQR     LMN      58
 #3   Test1   Test2   Test3      12
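
The formula `Column4~.` names the value column explicitly. With 18 columns, it may be easier to pick the last column by position. The following is a minimal sketch, assuming the value column really is the last one and every other column is a grouping key; the names `value_col`, `key_cols` and `res` are made up for illustration, and the small `df1` stands in for the real data.

```r
# Stand-in for the real 18-column data (same values as the example in the question)
df1 <- data.frame(
  Column1 = c("ABC", "XYZ", "ABC", "Test1", "XYZ"),
  Column2 = c("DEF", "PQR", "DEF", "Test2", "PQR"),
  Column3 = c("GHI", "LMN", "GHI", "Test3", "LMN"),
  Column4 = c(5L, 4L, 11L, 12L, 54L)
)

# Assumption: the last column holds the values to be summed
value_col <- names(df1)[ncol(df1)]
# Every remaining column is treated as a grouping key
key_cols <- setdiff(names(df1), value_col)

# Sum the value column within each unique combination of the key columns
res <- aggregate(df1[value_col], by = df1[key_cols], FUN = sum)
print(res)
```

Note that aggregate drops groups whose key columns contain NA, so rows with missing keys would need separate handling.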

Or

 library(data.table)
 setDT(df1)[, list(Column4=sum(Column4)), by = c(names(df1)[1:3])]
 #     Column1 Column2 Column3 Column4
 #1:     ABC     DEF     GHI      16
 #2:     XYZ     PQR     LMN      58
 #3:   Test1   Test2   Test3      12

Or

 library(sqldf)
 sqldf('select Column1, Column2, Column3,
          sum(Column4) as Column4
          from df1 
          group by Column1, Column2, Column3')
 #   Column1 Column2 Column3 Column4
 #1     ABC     DEF     GHI      16
 #2   Test1   Test2   Test3      12
 #3     XYZ     PQR     LMN      58

Or

library(dplyr)
df1 %>% group_by(Column1, Column2, Column3) %>%
  summarize(Column4 = sum(Column4))
# Source: local data frame [3 x 4]
# Groups: Column1, Column2

#   Column1 Column2 Column3 Column4
# 1     ABC     DEF     GHI      16
# 2   Test1   Test2   Test3      12
# 3     XYZ     PQR     LMN      58

Reproducible data:

df1 <-
structure(list(Column1 = structure(c(1L, 3L, 1L, 2L, 3L), .Label = c("ABC", 
"Test1", "XYZ"), class = "factor"), Column2 = structure(c(1L, 
2L, 1L, 3L, 2L), .Label = c("DEF", "PQR", "Test2"), class = "factor"), 
    Column3 = structure(c(1L, 2L, 1L, 3L, 2L), .Label = c("GHI", 
    "LMN", "Test3"), class = "factor"), Column4 = c(5L, 4L, 11L, 
    12L, 54L)), .Names = c("Column1", "Column2", "Column3", "Column4"
), class = "data.frame", row.names = c(NA, -5L))
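
With 13232 rows the result cannot be checked by eye, so a few sanity checks may help. This is only a sketch in base R, assuming no missing values in the data (the formula method of aggregate omits rows containing NA, which would change the grand total); `key_cols` and `res` are illustrative names, and `df1` repeats the example values.

```r
# Same example values as above, rebuilt so this block runs on its own
df1 <- data.frame(
  Column1 = c("ABC", "XYZ", "ABC", "Test1", "XYZ"),
  Column2 = c("DEF", "PQR", "DEF", "Test2", "PQR"),
  Column3 = c("GHI", "LMN", "GHI", "Test3", "LMN"),
  Column4 = c(5L, 4L, 11L, 12L, 54L)
)

res <- aggregate(Column4 ~ ., data = df1, FUN = sum)
key_cols <- setdiff(names(df1), "Column4")

# 1. Merging rows must not change the grand total
stopifnot(sum(res$Column4) == sum(df1$Column4))

# 2. Each key combination appears exactly once in the result
stopifnot(anyDuplicated(res[key_cols]) == 0)

# 3. One output row per distinct key combination in the input
stopifnot(nrow(res) == nrow(unique(df1[key_cols])))
```

If any of these checks fails on the real file, missing values or stray whitespace in the key columns are the first things to look at.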