Dataset 1:
ID Name Territory Sales
1 Richard NY 59
8 Sam California 44
Dataset 2:
Terr ID Name Comments
LA 5 Rick yes
MH 11 Oly no
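I want the final dataset to contain only the columns of the first dataset. It should recognize that Terr is the same column as Territory, and it should not carry over the Comments column.
The final data should look like this: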
ID Name Territory Sales
1 Richard NY 59
8 Sam California 44
5 Rick LA NA
11 Oly MH NA
Thanks in advance.
Answer 0 (score: 0)
A possible solution:
# create a named vector with names from 'set2'
# with the positions of the matching columns in 'set1'
nms2 <- sort(unlist(sapply(names(set2), agrep, x = names(set1))))
# only keep the columns in 'set2' for which a match is found
# and give them the same names as in 'set1'
set2 <- setNames(set2[names(nms2)], names(set1[nms2]))
# bind the two datasets together
# option 1:
library(dplyr)
bind_rows(set1, set2)
# option 2:
library(data.table)
rbindlist(list(set1, set2), fill = TRUE)
which gives (dplyr output shown):
  ID    Name  Territory Sales
1  1 Richard         NY    59
2  8     Sam California    44
3  5    Rick         LA    NA
4 11     Oly         MH    NA
Data used:
set1 <- structure(list(ID = c(1L, 8L),
Name = c("Richard", "Sam"),
Territory = c("NY", "California"),
Sales = c(59L, 44L)),
.Names = c("ID", "Name", "Territory", "Sales"), class = "data.frame", row.names = c(NA, -2L))
set2 <- structure(list(Terr = c("LA", "MH"),
ID = c(5L, 11L),
Name = c("Rick", "Oly"),
Comments = c("yes", "no")),
.Names = c("Terr", "ID", "Name", "Comments"), class = "data.frame", row.names = c(NA, -2L))
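For comparison, here is a minimal base R sketch that does not rely on fuzzy matching. It assumes you are willing to spell out which column in set2 corresponds to which column in set1; the `rename_map` vector and the `aligned` and `result` names are hypothetical and introduced only for this illustration. An explicit mapping may be safer when column names are short or similar enough that approximate matching could pick the wrong column.

```r
# Toy data mirroring the question
set1 <- data.frame(ID = c(1L, 8L), Name = c("Richard", "Sam"),
                   Territory = c("NY", "California"), Sales = c(59L, 44L),
                   stringsAsFactors = FALSE)
set2 <- data.frame(Terr = c("LA", "MH"), ID = c(5L, 11L),
                   Name = c("Rick", "Oly"), Comments = c("yes", "no"),
                   stringsAsFactors = FALSE)

# Hypothetical lookup: name in set2 -> name in set1
rename_map <- c(Terr = "Territory")

# Rename the columns of set2 that appear in the lookup
aligned <- set2
hit <- names(aligned) %in% names(rename_map)
names(aligned)[hit] <- unname(rename_map[names(aligned)[hit]])

# Drop columns that set1 does not have (here: Comments)
aligned <- aligned[, intersect(names(set1), names(aligned)), drop = FALSE]

# Add columns that only set1 has (here: Sales), filled with NA
for (col in setdiff(names(set1), names(aligned))) {
  aligned[[col]] <- NA
}

# Stack the rows, using the column order of set1
result <- rbind(set1, aligned[names(set1)])
print(result)
```

Because the columns are aligned by name before `rbind`, the differing column order in set2 does not matter.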