Merging two data.frames in R, keeping all matching rows

Date: 2013-02-13 22:01:26

Tags: r

I am struggling to merge two data.frames in which NA values occur in one df or the other.

sampleA <- structure(list(Nom_xp = "A1MRJ", Rep = 1L, GB05 = 102L, GB05.1 = 102L, 
    GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L, 
    GB27.1 = 169L, GB24 = 240L, GB24.1 = 242L, GB28 = NA_integer_, 
    GB28.1 = NA_integer_, GB15 = 142L, GB15.1 = 144L, GB02 = 197L, 
    GB02.1 = 197L, GB10 = 126L, GB10.1 = 134L, GB14 = 181L, GB14.1 = 193L), .Names = c("Nom_xp", 
"Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1", 
"GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15", 
"GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"
), row.names = 32L, class = "data.frame")


sampleB <- structure(list(Nom_xp = "A1MRJ", Rep = 2L, GB05 = NA, GB05.1 = NA, 
    GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L, 
    GB27.1 = 169L, GB24 = 240L, GB24.1 = 242L, GB28 = 390L, GB28.1 = 390L, 
    GB15 = 142L, GB15.1 = 144L, GB02 = 197L, GB02.1 = 197L, GB10 = 126L, 
    GB10.1 = 134L, GB14 = 181L, GB14.1 = 193L), .Names = c("Nom_xp", 
"Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1", 
"GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15", 
"GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"
), row.names = 33L, class = "data.frame")

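Desired output, as a data.frame: only one row per matching "Nom_xp", so that wherever a value is NA in one df but present in the other, the NA is replaced by the value from A or B.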

Nom_xp GB05 GB05.1 GB18 GB18.1 GB06 GB06.1 GB27 GB27.1 GB24 GB24.1 GB28 GB28.1 GB15 GB15.1 GB02 GB02.1 GB10 GB10.1 GB14 GB14.1
A1MRJ   102    102  177    177  240    240  169    169  240    242  390    390  142    144  197    197  126    134  181    193

I would have thought that

output <- merge(sampleA, sampleB, by = "Nom_xp", all.x = TRUE, all.y = TRUE)

library(plyr)  # join() comes from the plyr package
output <- join(sampleA, sampleB, by = "Nom_xp", match = "all")

would give me what I need, but so far no luck... What am I missing? The real data.frames have multiple rows.
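A minimal sketch of what the merge() attempt is missing, using small made-up data frames toyA and toyB that keep only a few of the question's columns (the "ASDF" values are borrowed from sampleC/sampleD in the second answer). merge() does match the rows on Nom_xp, but it does not combine the other shared columns: each one appears twice, suffixed .x and .y, so the NAs still have to be filled from the partner column afterwards. This assumes each Nom_xp occurs at most once in each data.frame; with repeated keys, merge() returns every pairing of the matching rows.

```r
# Made-up, trimmed versions of the question's data (only GB05 and GB28 kept)
toyA <- data.frame(Nom_xp = c("A1MRJ", "ASDF"),
                   GB05 = c(102L, NA), GB28 = c(NA, 234L),
                   stringsAsFactors = FALSE)
toyB <- data.frame(Nom_xp = c("ASDF", "A1MRJ"),   # rows deliberately in another order
                   GB05 = c(214L, NA), GB28 = c(234L, 390L),
                   stringsAsFactors = FALSE)

# Full outer join on the key: shared columns come back as GB05.x / GB05.y etc.
m <- merge(toyA, toyB, by = "Nom_xp", all = TRUE)

# Coalesce each .x/.y pair: keep toyA's value, fall back to toyB's where it is NA
vars <- setdiff(names(toyA), "Nom_xp")
output <- m["Nom_xp"]
for (v in vars) {
  x <- m[[paste0(v, ".x")]]
  y <- m[[paste0(v, ".y")]]
  output[[v]] <- ifelse(is.na(x), y, x)
}
output
```

Because merge() matches on the key, the row order of the two inputs does not matter here.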

2 answers:

Answer 0 (score: 1)

You only have one row? Then isn't this enough? It puts the result into sampleB:

sampleB[, is.na(sampleB)] <- sampleA[, is.na(sampleB)]

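No, I don't think you need apply, join or merge. Untested, but this should also work with several rows, provided both data.frames have the same columns and their rows are in the same order. The text column is left out because indexing a whole data.frame with a logical matrix goes through as.matrix(), which would turn the filled values into character:

num <- sapply(sampleB, function(x) is.numeric(x) || is.logical(x))  # skip text columns such as Nom_xp
sampleB[num][is.na(sampleB[num])] <- sampleA[num][is.na(sampleB[num])]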
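The indexing above relies on the rows of both data.frames being in the same order. A minimal sketch for when they might not be, assuming each Nom_xp appears once per data.frame: reorder one data.frame with match() on Nom_xp, then fill the NAs column by column, which also keeps each column's own type. toyA and toyB are made-up, trimmed versions of the question's data.

```r
# Made-up, trimmed data; toyB's rows are NOT in toyA's order
toyA <- data.frame(Nom_xp = c("A1MRJ", "ASDF"), GB05 = c(102L, NA), GB28 = c(NA, 234L),
                   stringsAsFactors = FALSE)
toyB <- data.frame(Nom_xp = c("ASDF", "A1MRJ"), GB05 = c(214L, NA), GB28 = c(234L, 390L),
                   stringsAsFactors = FALSE)

# Reorder toyA's rows to follow toyB's Nom_xp values
# (a Nom_xp missing from toyA gives an all-NA row, which leaves toyB unchanged)
toyA_aligned <- toyA[match(toyB$Nom_xp, toyA$Nom_xp), ]

# Fill NAs column by column so each column keeps its type
for (col in setdiff(names(toyB), "Nom_xp")) {
  na <- is.na(toyB[[col]])
  toyB[[col]][na] <- toyA_aligned[[col]][na]
}
toyB
```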

Answer 1 (score: 0)

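I'm not quite sure what your whole data set looks like, but I suspect you may have several samples with the same "Nom_xp", not just 2? And that you may have all the data in one big data frame?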

If so, this code might be a good start (maybe someone can help rewrite it more efficiently?). Anyway:

sampleA <- structure(list(Nom_xp = "A1MRJ", Rep = 1L, GB05 = 102L, GB05.1 = 102L, 
                          GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L, 
                          GB27.1 = 169L, GB24 = 240L, GB24.1 = 242L, GB28 = NA_integer_, 
                          GB28.1 = NA_integer_, GB15 = 142L, GB15.1 = 144L, GB02 = 197L, 
                          GB02.1 = 197L, GB10 = 126L, GB10.1 = 134L, GB14 = 181L, GB14.1 = 193L), .Names = c("Nom_xp", "Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1","GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15","GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"), row.names = 32L, class = "data.frame")

sampleB <- structure(list(Nom_xp = "A1MRJ", Rep = 2L, GB05 = NA, GB05.1 = NA, 
                          GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L, 
                          GB27.1 = 169L, GB24 = 240L, GB24.1 = 242L, GB28 = 390L, GB28.1 = 390L, 
                          GB15 = 142L, GB15.1 = 144L, GB02 = 197L, GB02.1 = 197L, GB10 = 126L, 
                          GB10.1 = 134L, GB14 = 181L, GB14.1 = 193L), .Names = c("Nom_xp","Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1", "GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15", "GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"  ), row.names = 33L, class = "data.frame")

sampleC <- structure(list(Nom_xp = "ASDF", Rep = 2L, GB05 = NA, GB05.1 = NA, 
                          GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 12349L, 
                          GB27.1 = 3, GB24 = 234112, GB24.1 = 242L, GB28 = 234, GB28.1 = 390L, 
                          GB15 = NA, GB15.1 = 144L, GB02 = 197L, GB02.1 = 197L, GB10 = 126L, 
                          GB10.1 = 134L, GB14 = NA, GB14.1 = 193L), .Names = c("Nom_xp", "Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1", "GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15", "GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"), row.names = 34L, class = "data.frame")

sampleD <- structure(list(Nom_xp = "ASDF", Rep = 2L, GB05 = 214, GB05.1 = 34, 
                          GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L, 
                          GB27.1 = 3, GB24 = NA, GB24.1 = 242L, GB28 = 234, GB28.1 = 390L, 
                          GB15 = 56, GB15.1 = 144L, GB02 = 197L, GB02.1 = 197L, GB10 = 15466L, 
                          GB10.1 = 134L, GB14 = 34, GB14.1 = 193L), .Names = c("Nom_xp", "Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1", "GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15", "GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"), row.names = 35L, class = "data.frame")

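cdat <- rbind(sampleA, sampleB, sampleC, sampleD)  # simulating your data set (?)
dcols <- ncol(cdat)
ids <- unique(cdat$Nom_xp)

# One output row per Nom_xp; the matrix ends up as character because Nom_xp is text
mat <- matrix(nrow = length(ids), ncol = dcols)
colnames(mat) <- colnames(cdat)
for (j in seq_along(ids))
{
  g <- which(cdat$Nom_xp == ids[j])                # rows with exactly this Nom_xp (grep() would also match partial names)
  mat[j, 1] <- ids[j]                              # fill in "Nom_xp"
  mat[j, 2] <- paste(cdat$Rep[g], collapse = " ")  # fill in "Rep": all replicate numbers, space-separated
  mat[j, 3:dcols] <- apply(cdat[g, 3:dcols], 2,    # mean of each column, ignoring NAs
                           function(x) mean(x, na.rm = TRUE))
}

# Back to a data.frame with numeric marker columns
out <- as.data.frame(mat, stringsAsFactors = FALSE)
out[3:dcols] <- lapply(out[3:dcols], as.numeric)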
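The per-Nom_xp loop above can also be written as a single call to base R's aggregate(). This is a swapped-in alternative, not the answerer's code, sketched on a made-up four-row cdat whose values are taken from sampleA to sampleD (only two marker columns kept). Averaging blends conflicting readings: ASDF's GB27 (12349 and 169) becomes 6259. The hypothetical helper first_non_na keeps the first non-NA value instead, which is closer to the NA-filling the question asks for.

```r
# Made-up cdat: two rows per Nom_xp, values from sampleA..sampleD
cdat <- data.frame(Nom_xp = c("A1MRJ", "A1MRJ", "ASDF", "ASDF"),
                   Rep    = c(1L, 2L, 2L, 2L),
                   GB05   = c(102L, NA, NA, 214L),
                   GB27   = c(169L, 169L, 12349L, 169L),
                   stringsAsFactors = FALSE)
markers <- setdiff(names(cdat), c("Nom_xp", "Rep"))

# One row per Nom_xp, each marker averaged with NAs ignored (same rule as the loop)
out <- aggregate(cdat[markers], by = list(Nom_xp = cdat$Nom_xp),
                 FUN = mean, na.rm = TRUE)

# Hypothetical helper: first non-NA value of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]
out_first <- aggregate(cdat[markers], by = list(Nom_xp = cdat$Nom_xp),
                       FUN = first_non_na)
out
out_first
```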