Merging messy data frames in R

Time: 2016-10-28 11:15:40

Tags: r merge

I have two data frames:

df1=data.frame(Col1=c('2','4','CN','CANADA',NA),Col2=c('s1','s2','s3','s4','s5'))
> df1
Col1 Col2
1      2   s1
2      4   s2
3     CN   s3
4 CANADA   s4
5   <NA>   s5
df2=data.frame(index=1:5,code=c('AB','CA','US','CN','UK'),name=c('ALBERTA','CANADA','USA','CHINA','UK'),REGION=c('NA','NA','NA','FE','EU'))
> df2
 index code    name REGION
1     1   AB ALBERTA     NA
2     2   CA  CANADA     NA
3     3   US     USA     NA
4     4   CN   CHINA     FE
5     5   UK      UK     EU

I want to get:

df3=data.frame(df1,code=c('CA','CN','CN','CA',NA),name=c('CANADA','CHINA','CHINA','CANADA',NA),REGION=c('NA','FE','FE','NA',NA))
    Col1 Col2 code   name REGION
1      2   s1   CA CANADA     NA
2      4   s2   CN  CHINA     FE
3     CN   s3   CN  CHINA     FE
4 CANADA   s4   CA CANADA     NA
5   <NA>   s5 <NA>   <NA>   <NA>

I tried looking the rows up by value:

df1$code=df2[df2$index[df1$Col1],2]

but that fills in the wrong values. I also tried merging twice:

m1=merge(df1,df2,by.x='Col1',by.y='index',all.x=TRUE)
m2=merge(m1,df2,by.x='Col1',by.y='name',all.x=1)

I'm sure I'm missing something here. Thanks for your help.

2 answers:

Answer 0: (score: 1)

Maybe not a very elegant solution, but it works for this example:

ind <- sapply(df1$Col1, function(x)which(df2[,c("index", "code", "name")] == as.character(x),arr.ind = T)[1])
cbind(df1, df2[ind,])
      Col1 Col2 index code   name REGION
2        2   s1     2   CA CANADA     NA
4        4   s2     4   CN  CHINA     FE
4.1     CN   s3     4   CN  CHINA     FE
2.1 CANADA   s4     2   CA CANADA     NA
NA    <NA>   s5    NA <NA>   <NA>   <NA>
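
A variation on the same idea, as a minimal sketch rather than a definitive solution: stack every possible key from df2 (index, code, and name) into one lookup vector and resolve each Col1 value with match(). The names keys, hit, and df3 are illustrative only, and the sketch assumes the columns are character vectors (stringsAsFactors = FALSE), not factors. It also drops the index column so that the result has the columns asked for in the question.

```r
# Example data from the question, with strings kept as character vectors.
df1 <- data.frame(Col1 = c("2", "4", "CN", "CANADA", NA),
                  Col2 = c("s1", "s2", "s3", "s4", "s5"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(index = 1:5,
                  code = c("AB", "CA", "US", "CN", "UK"),
                  name = c("ALBERTA", "CANADA", "USA", "CHINA", "UK"),
                  REGION = c("NA", "NA", "NA", "FE", "EU"),
                  stringsAsFactors = FALSE)

# Long lookup table: each index, code, and name points at its row in df2.
keys <- data.frame(key = c(as.character(df2$index), df2$code, df2$name),
                   row = rep(seq_len(nrow(df2)), times = 3),
                   stringsAsFactors = FALSE)

# match() returns the first matching position, or NA when nothing matches.
hit <- keys$row[match(df1$Col1, keys$key)]   # 2, 4, 4, 2, NA for this data

# Indexing df2 with an NA row number yields a row of NAs, as desired for s5.
df3 <- cbind(df1, df2[hit, c("code", "name", "REGION")])
rownames(df3) <- NULL
```

Because match() keeps only the first hit, a value that appears under more than one key type (for example 'UK', which is both a code and a name here) resolves to whichever entry comes first in keys. In this data both point at the same row, but that may not hold in general.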

Answer 1: (score: -1)

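As far as I can tell, Col1 of df1 contains mixed information. So my approach is to separate the different data types first; after that, merging should be easy.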

chr <- as.character(df1$Col1) 

index_df1 <- chr
index_df1[!grepl("^[0-9]*$", chr)] <- NA
index_df1 <- as.numeric(index_df1)

code_df1 <- chr
code_df1[!grepl("^[A-Z]{2}$", chr)] <- NA

name_df1 <- chr
name_df1[!grepl("^[A-Z]{3,}$", chr)] <- NA

df1 <- data.frame(df1, index_df1, code_df1, name_df1)
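
The answer stops before the actual merge. One possible way to finish it, sketched here under some assumptions: look up each typed key in the corresponding df2 column with match() (used here instead of three separate merge() calls, which would produce duplicated columns) and keep the first non-NA hit per row. The names index_key, code_key, name_key, and hit_row are hypothetical, the digit pattern uses + rather than * so that empty strings are not treated as numbers, and the columns are assumed to be character vectors.

```r
# Example data from the question, with strings kept as character vectors.
df1 <- data.frame(Col1 = c("2", "4", "CN", "CANADA", NA),
                  Col2 = c("s1", "s2", "s3", "s4", "s5"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(index = 1:5,
                  code = c("AB", "CA", "US", "CN", "UK"),
                  name = c("ALBERTA", "CANADA", "USA", "CHINA", "UK"),
                  REGION = c("NA", "NA", "NA", "FE", "EU"),
                  stringsAsFactors = FALSE)

chr <- df1$Col1

# Split Col1 into typed keys; entries that do not fit a pattern become NA.
index_key <- as.integer(ifelse(grepl("^[0-9]+$", chr), chr, NA))
code_key  <- ifelse(grepl("^[A-Z]{2}$", chr), chr, NA)
name_key  <- ifelse(grepl("^[A-Z]{3,}$", chr), chr, NA)

# Row of df2 found by each key type (NA where that key is missing).
by_index <- match(index_key, df2$index)
by_code  <- match(code_key, df2$code)
by_name  <- match(name_key, df2$name)

# Take the first non-NA match per row: index, then code, then name.
hit_row <- ifelse(!is.na(by_index), by_index,
                  ifelse(!is.na(by_code), by_code, by_name))

# An NA row number yields a row of NAs, which is what row s5 should get.
df3 <- cbind(df1, df2[hit_row, c("code", "name", "REGION")])
rownames(df3) <- NULL
```

Splitting by pattern first makes the matching rule explicit: a two-letter value is only ever compared against code, never against name, which avoids accidental cross-column matches.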