Merge data frames that do not have all variables in common

Posted: 2014-06-11 08:39:44

Tags: r merge dataframe

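I want to merge two large data frames that do not have all of their variables in common. I have tried merge, but I can't get the result I want. The rows of df1 whose Colour and Flavour are NA should take those values from the df2 row with the same ID, and Price should be kept from df1.

An example: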

# Data frame to merge 1
ID <- c("1", "2", "3", "4", "5")
Colour <- c("Red", "Red", "Red", NA, NA)
Flavour <- c("Sweet", "Sweet", "Sweet", NA, NA)
Price <- c(5, 10, 15, 20, 25)
df1 <- data.frame(ID, Colour, Flavour, Price)
rm(ID, Colour, Flavour, Price)

# Data frame to merge 2
ID <- c("4", "5")
Colour <- c("Green", "Green")
Flavour <- c("Bitter", "Bitter")
df2 <- data.frame(ID, Colour, Flavour)
rm(ID, Colour, Flavour)

# What I'd like to get
ID <- c("1", "2", "3", "4", "5")
Colour <- c("Red", "Red", "Red", "Green", "Green")
Flavour <- c("Sweet", "Sweet", "Sweet", "Bitter", "Bitter")
Price <- c(5, 10, 15, 20, 25)
RESULT <- data.frame(ID, Colour, Flavour, Price)
rm(ID, Colour, Flavour, Price)

Any help is much appreciated!

3 answers:

Answer 0 (score: 1)

If what you describe above is exactly what you need, you may not need a merge at all. Does this work?

# Data frame to merge 1
df1 <- data.frame(ID=c("1", "2", "3", "4", "5"),
                  Colour=c("Red", "Red", "Red", NA, NA),
                  Flavour=c("Sweet", "Sweet", "Sweet", NA, NA),
                  Price=c(5, 10, 15, 20, 25),
                  stringsAsFactors=FALSE)

# Data frame to merge 2 (columns renamed to ID2, Colour2, Flavour2)
df2 <- data.frame(ID2=c("4", "5"),
                  Colour2=c("Green", "Green"),
                  Flavour2=c("Bitter", "Bitter"),
                  stringsAsFactors=FALSE)

# Assumes the rows of df2 are in the same order as the matching rows of df1.
# If they are not, sort both data frames by ID first.
df1[df1[["ID"]] %in% df2[["ID2"]],
    c("Colour", "Flavour")] <- df2[c("Colour2", "Flavour2")]

The idea is simply to copy the values from df2 into df1 wherever df1 needs them.
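The assignment above relies on the rows of df2 being in the same order as the matching rows of df1. A minimal sketch that removes that assumption uses base R's match() to look up the df1 row for each ID in df2. The names idx and keep are illustrative, and df2 is deliberately listed in reverse order here:

```r
# Rebuild the two example data frames from this answer
df1 <- data.frame(ID = c("1", "2", "3", "4", "5"),
                  Colour = c("Red", "Red", "Red", NA, NA),
                  Flavour = c("Sweet", "Sweet", "Sweet", NA, NA),
                  Price = c(5, 10, 15, 20, 25),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID2 = c("5", "4"),          # same values, reversed order
                  Colour2 = c("Green", "Green"),
                  Flavour2 = c("Bitter", "Bitter"),
                  stringsAsFactors = FALSE)

# Row of df1 that each row of df2 refers to (NA if the ID is not in df1)
idx <- match(df2$ID2, df1$ID)
keep <- !is.na(idx)

# Write each df2 row into the df1 row with the same ID
df1[idx[keep], c("Colour", "Flavour")] <- df2[keep, c("Colour2", "Flavour2")]
print(df1)
```

Because each df2 row is written to the df1 row with the same ID, no sorting is needed, and any df2 IDs that do not occur in df1 are skipped.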

Answer 1 (score: 1)

I would do it like this (install the gtools package first):

library(gtools)
df_new <- smartbind(df1,df2)

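You will get seven rows, i.e. the rows of df1 followed by the rows of df2. To remove the unneeded rows and replace the NAs, I use this trick: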

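df_new <- df_new[-1]  # remove the ID column

# Rows 4:5 (from df1) have NA Colour/Flavour; rows 6:7 (from df2) hold the values.
# These row indices are specific to this example.
df_new[4:5,][is.na(df_new[4:5,])] <- df_new[6:7,][!is.na(df_new[6:7,])]

# Drop the df2 rows, which are incomplete because they have no Price
df_new <- df_new[complete.cases(df_new),]

df_new$ID <- seq_len(nrow(df_new))  # add the ID column back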

Answer 2 (score: 0)

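Unfortunately, merge doesn't work well with this structure: the rows whose Colour and Flavour are NA don't match anything in df2, so it adds extra rows instead of filling them in.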
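A base-R sketch of this behaviour, and of one way around it with merge itself, rebuilding the question's df1 and df2 with stringsAsFactors = FALSE. By default merge joins on every shared column (ID, Colour, Flavour), so (4, NA, NA) and (4, Green, Bitter) count as different keys and the outer join has 7 rows. Joining on ID alone and then filling each NA from the df2 copy of the column is one workaround; the ".df2" suffix and the names bad, m and result are choices made for this illustration:

```r
df1 <- data.frame(ID = c("1", "2", "3", "4", "5"),
                  Colour = c("Red", "Red", "Red", NA, NA),
                  Flavour = c("Sweet", "Sweet", "Sweet", NA, NA),
                  Price = c(5, 10, 15, 20, 25),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("4", "5"),
                  Colour = c("Green", "Green"),
                  Flavour = c("Bitter", "Bitter"),
                  stringsAsFactors = FALSE)

# Default merge joins on all shared columns; with all = TRUE the
# NA rows of df1 and the rows of df2 are kept as separate rows
bad <- merge(df1, df2, all = TRUE)
nrow(bad)  # 7

# Join on ID only, keeping every row of df1; df2's columns get a suffix
m <- merge(df1, df2, by = "ID", all.x = TRUE, suffixes = c("", ".df2"))
for (col in c("Colour", "Flavour")) {
  fill <- is.na(m[[col]])                        # cells df1 left empty
  m[[col]][fill] <- m[[paste0(col, ".df2")]][fill]
}
result <- m[c("ID", "Colour", "Flavour", "Price")]
print(result)
```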

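I've retracted my vote to close this as a duplicate, because the question is actually slightly different.

We can use most of the approach @joran provided here, with one small change: because your data.frames have different sets of columns, you need rbind.fill instead of rbind.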

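library(plyr)
# Stack the rows; rbind.fill pads the Price column that df2 lacks with NA
ab <- rbind.fill(df1, df2)
# Keep only the non-NA value(s) of a column
colFun <- function(x){x[which(!is.na(x))]}
# For each ID, collapse every column to its non-NA value
# (assumes exactly one non-NA value per column within each ID)
ddply(ab, .(ID), function(x){ colwise(colFun)(x) })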

  ID Colour Flavour Price
1  1    Red   Sweet     5
2  2    Red   Sweet    10
3  3    Red   Sweet    15
4  4  Green  Bitter    20
5  5  Green  Bitter    25
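If plyr is not available, the same stack-then-collapse idea can be sketched in base R. This is an alternative, not @joran's code: give df2 the missing Price column, rbind the two, split by ID and keep the first non-NA value of every column. Like colFun, it assumes each ID has at most one non-NA value per column; first_non_na and df2_full are hypothetical helper names:

```r
df1 <- data.frame(ID = c("1", "2", "3", "4", "5"),
                  Colour = c("Red", "Red", "Red", NA, NA),
                  Flavour = c("Sweet", "Sweet", "Sweet", NA, NA),
                  Price = c(5, 10, 15, 20, 25),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("4", "5"),
                  Colour = c("Green", "Green"),
                  Flavour = c("Bitter", "Bitter"),
                  stringsAsFactors = FALSE)

# rbind() needs the same columns in both, so add the missing Price as NA
df2_full <- df2
df2_full$Price <- NA_real_
stacked <- rbind(df1, df2_full)

# First non-NA element of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

# Collapse each ID group to a single row, then stack the groups again
pieces <- lapply(split(stacked, stacked$ID), function(g)
  data.frame(lapply(g, first_non_na), stringsAsFactors = FALSE))
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

split() orders the groups by the sorted values of ID, so the rows come out in ID order.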