Question

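I have two databases: an old one and an updated one. Both have the same structure and a unique ID. If a record has changed, the updated database contains a new record with the same ID and the new data. So after rbind(m1, m2) I have duplicate records. I can't simply drop duplicated IDs, because the data may have been updated. The only way to tell which record is newer is whether it comes from the old file or the updated file.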

How can I merge the two tables so that, when rows share an ID, the one from the newer file is kept?

I know I could add a column to both tables and then just use ifelse(), but I'm looking for something more elegant, preferably a one-liner.

Answer 1

It's hard to give an exact answer without sample data, but here is an approach you can adapt to your data.

#sample data
library( data.table )
dt1 <- data.table( id = 2:3, value = c(2,4))
dt2 <- data.table( id = 1:2, value = c(2,6))
#dt1
#    id value
# 1:  2     2
# 2:  3     4
#dt2
#    id value
# 1:  1     2
# 2:  2     6

# row-bind the tables, with the updated table last
DT <- rbindlist( list(dt1,dt2), use.names = TRUE )
#    id value
# 1:  2     2
# 2:  3     4
# 3:  1     2
# 4:  2     6

# drop duplicated ids, scanning from the bottom up,
# assuming the last table in the list contains the updated values
DT[ !duplicated(id, fromLast = TRUE), ]
#    id value
# 1:  3     4
# 2:  1     2
# 3:  2     6
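For comparison, here is a minimal base-R sketch of the same idea without data.table. It assumes, as above, that the updated table is bound last; the names `old`, `new`, `both`, and `result` are illustrative only.

```r
# Old and updated tables with the same columns (same values as dt1/dt2 above)
old <- data.frame(id = 2:3, value = c(2, 4))
new <- data.frame(id = 1:2, value = c(2, 6))

# Stack them with the updated table last
both <- rbind(old, new)

# fromLast = TRUE flags the earlier occurrences as duplicates,
# so the last (updated) row for each id is the one kept
result <- both[!duplicated(both$id, fromLast = TRUE), ]
result
```

This relies purely on row order, so it only works as long as the updated rows always come after the old ones.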

Answer 2

Say you have

old <- data.frame(id = c(1,2,3,4,5), val = c(21,22,23,24,25))
new <- data.frame(id = c(1,4), val = c(21,27))

So the record with ID 4 has changed in the new data set, while ID 1 is a pure duplicate.

You can use dplyr::anti_join() to find the old records whose ID does not appear in the new data set, and then simply rbind() the new records onto them.

library(dplyr)
joined <- rbind(anti_join(old, new, by = "id"), new)
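If you prefer to avoid the dplyr dependency, the same anti-join can be sketched in base R with `%in%`, using the `old` and `new` data frames from this answer:

```r
old <- data.frame(id = c(1, 2, 3, 4, 5), val = c(21, 22, 23, 24, 25))
new <- data.frame(id = c(1, 4), val = c(21, 27))

# Keep the old rows whose id does not appear in the new data
# (! binds looser than %in%, so this is !(old$id %in% new$id)),
# then append every row from the new data
joined <- rbind(old[!old$id %in% new$id, ], new)
joined
```

Unlike the row-order approach, this does not depend on how the tables are stacked: any ID present in `new` wins.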

Answer 3

You can use dplyr:

library(dplyr)

# full join on id, then take the new value where it exists, otherwise the old one
df_new %>%
  full_join(df_old, by = "id") %>%
  transmute(id = id, value = coalesce(value.x, value.y))

which returns

   id      value
1   1 0.03432355
2   2 0.28396359
3   3 0.01121692
4   4 0.57214035
5   5 0.67337745
6   6 0.67637187
7   7 0.69178855
8   8 0.83953140
9   9 0.55350251
10 10 0.27050363
11 11 0.28181032
12 12 0.84292569

given

df_new <- structure(list(id = 1:10, value = c(0.0343235526233912, 0.283963593421504, 
0.011216921498999, 0.572140350239351, 0.673377452883869, 0.676371874753386, 
0.691788548836485, 0.839531400706619, 0.553502510068938, 0.270503633422777
)), class = "data.frame", row.names = c(NA, -10L))

df_old <- structure(list(id = c(1, 4, 5, 3, 7, 9, 11, 12), value = c(0.111697669373825, 
0.389851713553071, 0.252179590053856, 0.91874519130215, 0.504653975600377, 
0.616259852424264, 0.281810319051147, 0.842925694771111)), class = "data.frame", row.names = c(NA, 
-8L))
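A rough base-R equivalent of the `full_join()` + `coalesce()` step, sketched with small made-up tables (hypothetical values, not the data above): `merge(..., all = TRUE)` plays the role of the full join, and the `ifelse()` line does what `coalesce()` does for a single column. Like `coalesce()`, it also falls back to the old value whenever the new value is `NA`, so a genuine update to `NA` would be lost.

```r
df_new <- data.frame(id = 1:3, value = c(10, 20, 30))
df_old <- data.frame(id = c(2L, 4L), value = c(99, 40))

# Full outer join on id; the clashing value columns get explicit suffixes
merged <- merge(df_new, df_old, by = "id", all = TRUE,
                suffixes = c(".new", ".old"))

# Take the new value when present, otherwise the old one
merged$value <- ifelse(is.na(merged$value.new), merged$value.old, merged$value.new)
result <- merged[, c("id", "value")]
result
```

With several value columns, this step (like the `coalesce()` call) has to be repeated for each column.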

R - Merge two data tables and remove duplicates from the old file?
