Left join of 3 data frames

Date: 2017-10-06 13:54:29

Tags: r

I have 3 data frames with different numbers of rows:

df1
T1      T2     T3
1       Joe    TTT
2       PP     YYY
3       JJ     QQQ
5       UU     OOO
6       OO     GGG

df2
X1      X2 
1       09/20/2017
2       08/02/2015
3       05/02/2000
8       06/03/1999

df3
L1       L2
1        New
6        Notsure
9        Also

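The final data frame should be a left join of all 3 that keeps only the rows of df1. The matching columns are T1, X1 and L1, which hold the same key under different header names, and each data frame has a different number of rows. I have not been able to find a solution that fits this case. On SO I only found solutions that work for 2 data frames, or for 3 data frames with the same rows or the same column names.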

    T1      T2     T3         X2            L2 
    1       Joe    TTT        09/20/2017    New
    2       PP     YYY        08/02/2015    NA
    3       JJ     QQQ        05/02/2000    NA
    5       UU     OOO        NA            NA
    6       OO     GGG        NA            Notsure

I am relatively new to R and could not find the R code for this.

4 answers:

Answer 0 (score: 4)

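The idea is to put your data frames in a list, rename the first column of each so they all share one key name, and use Reduce to merge them, i.e.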

Reduce(function(...) merge(..., by = 'Var1', all.x = TRUE), 
    lapply( mget(ls(pattern = 'df[0-9]+')), function(i) {names(i)[1] <- 'Var1'; i}))

which gives,

  Var1  T2  T3         X2      L2
1    1 Joe TTT 09/20/2017     New
2    2  PP YYY 08/02/2015    <NA>
3    3  JJ QQQ 05/02/2000    <NA>
4    5  UU OOO       <NA>    <NA>
5    6  OO GGG       <NA> Notsure
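
Because mget(ls(pattern = ...)) picks up every object in the workspace whose name matches the pattern, it may be safer to list the data frames by hand. The following is a minimal sketch of the same Reduce idea under that assumption; the data are rebuilt from the question so the block runs on its own, and the names dfs, key and result are chosen here for illustration only.

```r
# Rebuild the question's data so the sketch is self-contained.
df1 <- data.frame(T1 = c(1, 2, 3, 5, 6),
                  T2 = c("Joe", "PP", "JJ", "UU", "OO"),
                  T3 = c("TTT", "YYY", "QQQ", "OOO", "GGG"))
df2 <- data.frame(X1 = c(1, 2, 3, 8),
                  X2 = c("09/20/2017", "08/02/2015", "05/02/2000", "06/03/1999"))
df3 <- data.frame(L1 = c(1, 6, 9),
                  L2 = c("New", "Notsure", "Also"))

# Explicit list: df1 comes first, so its rows are the ones that are kept.
dfs <- list(df1, df2, df3)

# Assumption: the key is the first column of every data frame.
# Rename each key column to df1's key name so merge() can match on it.
key <- names(df1)[1]
dfs <- lapply(dfs, function(d) { names(d)[1] <- key; d })

# Fold the list from the left; all.x = TRUE makes every step a left join.
result <- Reduce(function(x, y) merge(x, y, by = key, all.x = TRUE), dfs)
print(result)
```

Keeping df1's own key name (T1) instead of a placeholder such as Var1 means the result matches the header asked for in the question.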

Answer 1 (score: 2)

Using tidyverse functions, you can try:

library(dplyr)

# by = c("left key" = "right key") matches columns with different names
df1 %>%
  left_join(df2, by = c("T1" = "X1")) %>%
  left_join(df3, by = c("T1" = "L1"))

which gives:

  T1  T2  T3         X2      L2
1  1 Joe TTT 09/20/2017     New
2  2  PP YYY 08/02/2015    <NA>
3  3  JJ QQQ 05/02/2000    <NA>
4  5  UU OOO       <NA>    <NA>
5  6  OO GGG       <NA> Notsure

Answer 2 (score: 1)

1) sqldf

library(sqldf)
sqldf("select df1.*, X2, L2 
       from df1 
       left join df2 on T1 = X1 
       left join df3 on T1 = L1")

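1a) Although slightly longer, this variation can make the code easier to read later because it makes explicit which data frame each column comes from. If the data frame names were long you might want to use aliases, e.g. from df1 as a, but we do not bother here since they are short.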

sqldf("select df1.*, df2.X2, df3.L2 
       from df1 
       left join df2 on df1.T1 = df2.X1 
       left join df3 on df1.T1 = df3.L1")
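
For longer data frame names, the aliasing mentioned in 1a) could look roughly like the sketch below. The alias letters a, b and c are arbitrary choices made here, the data are rebuilt from the question so the block stands alone, and it assumes the sqldf package is installed.

```r
library(sqldf)

# Rebuild the question's data so the sketch is self-contained.
df1 <- data.frame(T1 = c(1, 2, 3, 5, 6),
                  T2 = c("Joe", "PP", "JJ", "UU", "OO"),
                  T3 = c("TTT", "YYY", "QQQ", "OOO", "GGG"))
df2 <- data.frame(X1 = c(1, 2, 3, 8),
                  X2 = c("09/20/2017", "08/02/2015", "05/02/2000", "06/03/1999"))
df3 <- data.frame(L1 = c(1, 6, 9),
                  L2 = c("New", "Notsure", "Also"))

# Same query as 1a), with a short alias standing in for each data frame.
res <- sqldf("select a.*, b.X2, c.L2
              from df1 as a
              left join df2 as b on a.T1 = b.X1
              left join df3 as c on a.T1 = c.L1")
print(res)
```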

2) merge  Use repeated merges. No packages are used.

Merge <- function(x, y) merge(x, y, by = 1, all.x = TRUE)
Merge(Merge(df1, df2), df3)

2a) This can also be written with a magrittr pipe like this:

library(magrittr)
df1 %>% Merge(df2) %>% Merge(df3)

2b) Using Reduce we can perform the repeated merge like this:

Reduce(Merge, list(df1, df2, df3))
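
Since by = 1 matches on column position rather than on name, a variant that spells out the key columns may be easier to follow when the key is not the first column. This is only a sketch: left_by is a hypothetical helper name introduced here, and the data are rebuilt from the question so the block runs on its own.

```r
# Rebuild the question's data so the sketch is self-contained.
df1 <- data.frame(T1 = c(1, 2, 3, 5, 6),
                  T2 = c("Joe", "PP", "JJ", "UU", "OO"),
                  T3 = c("TTT", "YYY", "QQQ", "OOO", "GGG"))
df2 <- data.frame(X1 = c(1, 2, 3, 8),
                  X2 = c("09/20/2017", "08/02/2015", "05/02/2000", "06/03/1999"))
df3 <- data.frame(L1 = c(1, 6, 9),
                  L2 = c("New", "Notsure", "Also"))

# Hypothetical helper: left join x and y on differently named key columns.
# merge() keeps the key under its name in x (kx) in the result.
left_by <- function(x, y, kx, ky) merge(x, y, by.x = kx, by.y = ky, all.x = TRUE)

step1 <- left_by(df1, df2, "T1", "X1")   # df1 plus X2
final <- left_by(step1, df3, "T1", "L1") # plus L2
print(final)
```

Naming the keys costs a few more characters than by = 1, but the call keeps working if the columns are later reordered.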

Note: The input in reproducible form is:

Lines1 <- "
T1      T2     T3
1       Joe    TTT
2       PP     YYY
3       JJ     QQQ
5       UU     OOO
6       OO     GGG"

Lines2 <- "
X1      X2 
1       09/20/2017
2       08/02/2015
3       05/02/2000
8       06/03/1999"

Lines3 <- "
L1       L2
1        New
6        Notsure
9        Also"

df1 <- read.table(text = Lines1, header = TRUE)
df2 <- read.table(text = Lines2, header = TRUE)
df3 <- read.table(text = Lines3, header = TRUE)

Answer 3 (score: 0)

Use left_join() like this:

library(dplyr)

df1 = data.frame(X = c("a", "b", "c"), var1 = c(1, 2, 3))
df2 = data.frame(V = c("a", "b", "c"), var2 = c(5, NA, NA))
df3 = data.frame(Y = c("a", "b", "c"), var3 = c("name", NA, "age"))

# rename the key columns so all three data frames share the name X
df2 = df2 %>% rename(X = V)
df3 = df3 %>% rename(X = Y)

df = left_join(df1, df2, by = "X") %>% 
    left_join(., df3, by = "X")

> df
  X var1 var2 var3
1 a    1    5 name
2 b    2   NA <NA>
3 c    3   NA  age