Matching a column between two different data.frames using three different fields

Time: 2016-03-18 12:05:02

Tags: r reference match

This problem has been driving me mad all morning.

I have two tables, "vessel" and "target":

   v_registry <- c("", "GBR000B11824", "GBR000B10110", "GBR000C17779", "", "GBR000C16255")
   v_pln      <- c("WH4", "", "BRD5", "B291", "LI8", "UL78")
   v_rss      <- c("C19926", "B11824", "", "C17779", "A16190", "C16255")
   v_asset    <- c(104892, 104902, 104905, 104916, 104919, 104920)
   vessel <- data.frame(v_registry, v_pln, v_rss, v_asset, stringsAsFactors = FALSE)

   t_registry <- c("GBR000C19926", "GBR000B11824", "", "", "GBR000A16190", "")
   t_pln      <- c("", "", "BRD5", "B291", "LI8", "")
   t_rss      <- c("C19926", "", "", "", "", "C16255")
   target <- data.frame(t_registry, t_pln, t_rss, stringsAsFactors = FALSE)

   # Shuffle the rows of target
   target <- target[sample(nrow(target)), ]

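The vessel table holds identification information about vessels. The real target table is very wide, with a lot of other data that is not needed for the example. What I want to achieve is to copy the asset ID column (`v_asset` in vessel, which is the only complete ID field) into the target table as `t_asset`. The problem is that neither table is complete, so I need to match on three different fields.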

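Below are a couple of attempts at doing this. The `sample` line is only there to shuffle the rows, because for some odd reason the code works when the rows are in order. The second attempt returns only logical values, and I have not managed to get the elements instead of the logicals.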

 #Attempt 1
 target$t_asset<-
 vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")|
                match(target$t_pln,vessel$v_pln,incomparables = "")|
                match(target$t_rss,vessel$v_rss,incomparables = "")]  

 #Attempt 2
 target$t_asset<-
 (vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")]|
  vessel$v_asset[match(target$t_pln,vessel$v_pln,incomparables = "")]|
  vessel$v_asset[match(target$t_rss,vessel$v_rss,incomparables = "")])   
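A possible explanation for both symptoms, offered as a sketch rather than a confirmed diagnosis: `|` is a logical operator, so it coerces the integer positions returned by `match` (and the asset numbers in Attempt 2) to `TRUE`/`FALSE`. In Attempt 1 the index therefore becomes a logical vector, and indexing with all `TRUE` simply returns `v_asset` in its original order, which only looks right when `target` happens to be in the same order as `vessel`. The toy vectors below are made up for illustration.

```r
# Toy asset IDs and two sets of match() positions (illustrative values only)
v_asset <- c(104892, 104902, 104905)
idx_a <- c(2L, NA, 3L)   # positions found via one key
idx_b <- c(NA, 1L, 3L)   # positions found via another key

# `|` coerces non-zero integers to TRUE, so the positions themselves are lost
combined <- idx_a | idx_b
print(combined)

# A logical index that is all TRUE returns every element in its original order
print(v_asset[combined])

# Taking the first non-NA position per element keeps the positions instead
picked <- ifelse(is.na(idx_a), idx_b, idx_a)
print(v_asset[picked])
```

The last step is the same idea as falling back from one key to the next, applied to positions rather than to logicals.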

The expected output is (the rows may appear in a different order because of the shuffle):

> target
    t_registry t_pln  t_rss t_asset
1 GBR000C19926       C19926  104892
2 GBR000B11824               104902
3               BRD5         104905
4               B291         104916
5 GBR000A16190   LI8         104919
6                    C16255  104920

Any ideas on how to solve this?

Cheers

3 Answers:

Answer 0: (score: 1)

#  Find which rows from vessel are the match for target
x <- mapply( match , MoreArgs=list(incomparables="") , target , vessel )

#  Remove the NAs and, in case more than one piece of information is
#  available (multiple matches), reduce to a single number
idx <- apply(x,1, function(x) unique( x[!is.na(x) ] ))

#  Use the matches to get the id field from vessel
target$t_asset <- vessel$v_asset[idx]
target
#    t_registry t_pln  t_rss t_asset
#3               BRD5         104905
#2 GBR000B11824               104902
#4               B291         104916
#1 GBR000C19926       C19926  104892
#6                    C16255  104920
#5 GBR000A16190   LI8         104919
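If a row has no match at all, or two keys point at different vessels, the `apply` step above returns a list rather than a vector, which may be one reason it breaks on larger data (an assumption, since the real data are not shown). A hedged variant that takes the first non-NA match per row, using a hypothetical helper `first_match` and restricting `vessel` to its three key columns so the columns pair up one to one:

```r
# Example data from the question
vessel <- data.frame(
  v_registry = c("", "GBR000B11824", "GBR000B10110", "GBR000C17779", "", "GBR000C16255"),
  v_pln      = c("WH4", "", "BRD5", "B291", "LI8", "UL78"),
  v_rss      = c("C19926", "B11824", "", "C17779", "A16190", "C16255"),
  v_asset    = c(104892, 104902, 104905, 104916, 104919, 104920),
  stringsAsFactors = FALSE
)
target <- data.frame(
  t_registry = c("GBR000C19926", "GBR000B11824", "", "", "GBR000A16190", ""),
  t_pln      = c("", "", "BRD5", "B291", "LI8", ""),
  t_rss      = c("C19926", "", "", "", "", "C16255"),
  stringsAsFactors = FALSE
)
# Fixed reordering instead of sample(), so the result is reproducible
target <- target[c(3, 2, 4, 1, 6, 5), ]

# One column of match positions per key; "" is never treated as a match
x <- mapply(match, target, vessel[1:3], MoreArgs = list(incomparables = ""))

# Hypothetical helper: first non-NA position in a row, or NA if none exists
first_match <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) > 0) v[1] else NA_integer_
}

idx <- apply(x, 1, first_match)
target$t_asset <- vessel$v_asset[idx]
print(target)
```

Because `first_match` always returns exactly one value, `idx` stays a plain vector even for rows with zero or several candidate matches; conflicting matches are resolved silently in favour of the leftmost key, which may or may not be what the real data need.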

Answer 1: (score: 1)

Using merge:

target$t_asset <- merge(target, vessel, by.x=1:3, by.y=1:3, all.y = T, sort = F)$v_asset

> target
    t_registry t_pln  t_rss t_asset
6                    C16255  104892
1 GBR000C19926       C19926  104902
3               BRD5         104905
2 GBR000B11824               104916
5 GBR000A16190   LI8         104919
4               B291         104920
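Note that this output does not agree with the expected output in the question: for example, `C16255` is paired with 104892 here but with 104920 there. A likely cause, sketched below under the assumption that the data are exactly as posted, is that merging on all three key columns at once requires all three to agree, and no pair of rows does. With `all.y = TRUE` the result then consists only of the unmatched `vessel` rows.

```r
# Example data from the question
vessel <- data.frame(
  v_registry = c("", "GBR000B11824", "GBR000B10110", "GBR000C17779", "", "GBR000C16255"),
  v_pln      = c("WH4", "", "BRD5", "B291", "LI8", "UL78"),
  v_rss      = c("C19926", "B11824", "", "C17779", "A16190", "C16255"),
  v_asset    = c(104892, 104902, 104905, 104916, 104919, 104920),
  stringsAsFactors = FALSE
)
target <- data.frame(
  t_registry = c("GBR000C19926", "GBR000B11824", "", "", "GBR000A16190", ""),
  t_pln      = c("", "", "BRD5", "B291", "LI8", ""),
  t_rss      = c("C19926", "", "", "", "", "C16255"),
  stringsAsFactors = FALSE
)

# Inner join on all three keys: a row pair must agree on every key to match
inner <- merge(target, vessel, by.x = 1:3, by.y = 1:3)
print(nrow(inner))

# Adding all.y = TRUE keeps the vessel rows that found no partner
outer_y <- merge(target, vessel, by.x = 1:3, by.y = 1:3, all.y = TRUE, sort = FALSE)
print(nrow(outer_y))
print(setdiff(vessel$v_asset, outer_y$v_asset))
```

If the inner join is empty, the `v_asset` column assigned to `target` is not aligned with the rows of `target` at all, so any agreement with the expected output would be coincidental.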

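Answer 2: (score: 0)

Both earlier answers solve the example as given. However, for some reason both produce errors when applied to the real dataset.

So in the end I came up with some code that I have tested and that gives the right answer on the real dataset. The code is not pretty, though, and I am sure it could be made more efficient.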

# Create three new columns, each with an independent match
target$t_asset_registry<-vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")] 
target$t_asset_pln<-vessel$v_asset[match(target$t_pln,vessel$v_pln,incomparables = "")]
target$t_asset_rss<-vessel$v_asset[match(target$t_rss,vessel$v_rss,incomparables = "")]    

# Nested ifelse to summarise the results: registry first, then pln, then rss
target$asset<-ifelse(is.na(target$t_asset_registry),
                ifelse(is.na(target$t_asset_pln),
                  ifelse(is.na(target$t_asset_rss),NA,target$t_asset_rss),
                  target$t_asset_pln),target$t_asset_registry) 

The output is:

> target
    t_registry t_pln  t_rss t_asset_registry t_asset_pln t_asset_rss  asset
4               B291                      NA      104916          NA 104916
3               BRD5                      NA      104905          NA 104905
6                    C16255               NA          NA      104920 104920
5 GBR000A16190   LI8                      NA      104919          NA 104919
1 GBR000C19926       C19926               NA          NA      104892 104892
2 GBR000B11824                        104902          NA          NA 104902

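The output shows clearly what I wanted to achieve. If anyone has a cleverer way of getting the same result, please post it.

Thanks for all the help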
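One more compact way to express the same fallback logic, offered as a sketch: build one lookup per key and fold them with `Reduce`, keeping the first non-NA value. The helper names `lookup_asset` and `first_non_na` are hypothetical, and the sketch assumes, as in the code above, that the keys are tried in the order registry, pln, rss.

```r
# Example data from the question
vessel <- data.frame(
  v_registry = c("", "GBR000B11824", "GBR000B10110", "GBR000C17779", "", "GBR000C16255"),
  v_pln      = c("WH4", "", "BRD5", "B291", "LI8", "UL78"),
  v_rss      = c("C19926", "B11824", "", "C17779", "A16190", "C16255"),
  v_asset    = c(104892, 104902, 104905, 104916, 104919, 104920),
  stringsAsFactors = FALSE
)
target <- data.frame(
  t_registry = c("GBR000C19926", "GBR000B11824", "", "", "GBR000A16190", ""),
  t_pln      = c("", "", "BRD5", "B291", "LI8", ""),
  t_rss      = c("C19926", "", "", "", "", "C16255"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: asset ID found via one key, NA where the key is "" or unmatched
lookup_asset <- function(key_t, key_v) {
  vessel$v_asset[match(key_t, key_v, incomparables = "")]
}

# Hypothetical helper: keep a where it is known, otherwise fall back to b
first_non_na <- function(a, b) ifelse(is.na(a), b, a)

candidates <- list(
  lookup_asset(target$t_registry, vessel$v_registry),
  lookup_asset(target$t_pln,      vessel$v_pln),
  lookup_asset(target$t_rss,      vessel$v_rss)
)

# Fold the candidates left to right, so earlier keys take priority
target$asset <- Reduce(first_non_na, candidates)
print(target)
```

This avoids the three intermediate columns and extends to further keys by adding entries to `candidates`.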