How do I append two datasets one below the other when their column names differ slightly?

Asked: 2018-05-08 10:53:38

Tags: r

Dataset 1:

ID Name     Territory   Sales
1  Richard  NY            59
8  Sam      California    44

Dataset 2:

Terr ID  Name   Comments
 LA   5   Rick    yes
 MH   11  Oly     no

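I want the final dataset to contain only the columns of the first dataset, with Territory and Terr treated as the same column, and without carrying over the Comments column.

The final data should look like this: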

ID Name     Territory  Sales
1  Richard  NY           59
8  Sam      California   44
5  Rick     LA           NA
11 Oly      MH           NA

Thanks in advance.

1 answer:

Answer 0 (score: 0)

A possible solution:

# create a named vector with names from 'set2' 
# with the positions of the matching columns in 'set1'
nms2 <- sort(unlist(sapply(names(set2), agrep, x = names(set1))))

# only keep the columns in 'set2' for which a match is found
# and give them the same names as in 'set1'
set2 <- setNames(set2[names(nms2)], names(set1[nms2]))

# bind the two datasets together

# option 1:
library(dplyr)
bind_rows(set1, set2)

# option 2:
library(data.table)
rbindlist(list(set1, set2), fill = TRUE)

which gives (dplyr output shown):

  ID    Name  Territory Sales
1  1 Richard         NY    59
2  8     Sam California    44
3  5    Rick         LA    NA
4 11     Oly         MH    NA
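
The matching step above relies on agrep, which matches approximately, so a column name could in principle match nothing, or match more than one name in 'set1'. Before renaming anything, it may be worth inspecting what each name matched. The following is a minimal sketch using only the column names from the question; the helper variables (set1_names, set2_names, matches, unmatched, ambiguous) are names made up for this illustration.

```r
# Column names of the two datasets from the question
set1_names <- c("ID", "Name", "Territory", "Sales")
set2_names <- c("Terr", "ID", "Name", "Comments")

# For each name in set2, the positions of approximately matching names in set1
# (a list, because a name may match zero, one, or several positions)
matches <- lapply(setNames(set2_names, set2_names), agrep, x = set1_names)

# Count the matches per name and flag the cases that need a manual decision
n_matches <- lengths(matches)
unmatched <- names(n_matches)[n_matches == 0]  # columns that would be dropped
ambiguous <- names(n_matches)[n_matches > 1]   # columns with several candidates

print(matches)
print(unmatched)
print(ambiguous)
```

If 'ambiguous' is not empty, the unlist() call in the solution would produce suffixed names such as "Terr1" and "Terr2", and the subsequent subsetting of 'set2' would fail, so an explicit mapping is the safer choice in that case.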

Data used:

set1 <- structure(list(ID = c(1L, 8L), 
                       Name = c("Richard", "Sam"),
                       Territory = c("NY", "California"),
                       Sales = c(59L, 44L)),
                  .Names = c("ID", "Name", "Territory", "Sales"), class = "data.frame", row.names = c(NA, -2L))
set2 <- structure(list(Terr = c("LA", "MH"),
                       ID = c(5L, 11L),
                       Name = c("Rick", "Oly"),
                       Comments = c("yes", "no")),
                  .Names = c("Terr", "ID", "Name", "Comments"), class = "data.frame", row.names = c(NA, -2L))
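
If fuzzy matching feels too implicit, the same result can be obtained in base R with an explicit mapping of column names. This is only a sketch under the assumption that the correspondence Terr -> Territory is known in advance; 'col_map', 'set2_renamed', 'missing_cols' and 'result' are hypothetical names introduced for this example, and the data is the same as above, rebuilt with data.frame() so the snippet runs on its own.

```r
# Same sample data as above, rebuilt so this snippet is self-contained
set1 <- data.frame(ID = c(1L, 8L),
                   Name = c("Richard", "Sam"),
                   Territory = c("NY", "California"),
                   Sales = c(59L, 44L),
                   stringsAsFactors = FALSE)
set2 <- data.frame(Terr = c("LA", "MH"),
                   ID = c(5L, 11L),
                   Name = c("Rick", "Oly"),
                   Comments = c("yes", "no"),
                   stringsAsFactors = FALSE)

# Explicit mapping (an assumption): name in set2 -> name in set1
col_map <- c(Terr = "Territory")

# Rename the mapped columns in a copy of set2
set2_renamed <- set2
idx <- match(names(col_map), names(set2_renamed))
names(set2_renamed)[idx] <- unname(col_map)

# Columns that exist only in set1 (here: Sales) are added as NA
missing_cols <- setdiff(names(set1), names(set2_renamed))
set2_renamed[missing_cols] <- NA

# Keep only set1's columns, in set1's order (this drops Comments), then stack
result <- rbind(set1, set2_renamed[names(set1)])
print(result)
```

Unlike bind_rows() or rbindlist(fill = TRUE), rbind() requires both data frames to have the same columns, which is why the missing Sales column is filled with NA before stacking.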