New column in dataframe based on match between two columns

时间:2017-06-19 13:58:47

标签: r dataframe match

I am trying to create a new column in an existing dataframe (df1) by comparing the values of column x to the values of column y in a different dataframe (df2).

The result should look like df_end. Where there is a match, the value of column x should be returned. Where there is no match, NA should be returned.

df1 <- data.frame(x = c("blue2", "blue6", "green9", "green7"))
df2 <- data.frame(y = c("blue2", "green9"))

df_end <- data.frame(x = c("blue2", "blue6", "green9", "green7"),
                     match = c("blue2", NA, "green9", NA))

I have experimented with merge, match and if,else statements, but I can't figure it out. Someone has a piece of advice for me?

#Attempt 1: Merge
df1$match <- merge(df1, df2, by.x = x, all = TRUE)

This does not work, because df1 and df2 are of different length.

1 个答案:

答案 0 :(得分:1)

我做了以下事情:

df1 <- data.frame(x = c("blue2", "blue6", "green9", "green7"))
df2 <- data.frame(y = c("blue2", "green9"))

end <- sapply(df1$x, function(x) { # for each value in df1$x
  j <- which(df2$y == x) # check if df2$y has a match
  ifelse(length(j) > 0, j, NA) # if there is, give the location in the vector
}) # if not give NA

cbind(df1,df2, match = df2$y[end]) # subset the df2 with the location to get the characters

#       x      y  match
#1  blue2  blue2  blue2
#2  blue6 green9   <NA>
#3 green9  blue2 green9
#4 green7 green9   <NA>

修改 见sotos&#39;评论最佳答案:df2$y[match(df1$x, df2$y)]