How to merge these data frames

Time: 2019-03-19 12:19:52

Tags: r merge dplyr

I have two data frames that I need to merge.

df1 looks like this:

COUNTRY   YEAR   TRADE   
Spain     2016   276   
Germany   2016   323      
France    2016   392
Spain     2017   456   
Germany   2017   564      
France    2017   359
Spain     2015   767   
Germany   2015   868      
France    2015   969

df2 looks like this:

COUNTRY   GDP2016   GDP2017 GDP2015
Spain      1111       999    444
Germany    2222       888    555  
France     3333       777    666

With only two GDP columns I can do this:

df3 <- merge(df1,df2, by = "COUNTRY")

df3 <- df3 %>% mutate(GDP = ifelse(YEAR == 2016, GDP2016, GDP2017))
df3 <- subset(df3, select = -c(GDP2016, GDP2017))

However, with three GDP columns I need a different approach. What I would like to get is:

COUNTRY   YEAR   TRADE    GDP 
Spain     2016   276      1111
Germany   2016   323      2222   
France    2016   392      3333
Spain     2017   456      999
Germany   2017   564      888      
France    2017   359      777
Spain     2015   767      444
Germany   2015   868      555      
France    2015   969      666

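Any help would be appreciated!

4 answers:

Answer 0: (score: 0)

You have to melt df2 so that it has the same format as df1. Then create a new YEAR column with gsub by removing the "GDP" part of each column name and keeping only the year.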

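library(reshape2)  # assumption: melt() comes from reshape2 (the answer does not say)

# Reshape df2 to long format: one row per COUNTRY and GDP column
df2_melt <- melt(df2, id.vars="COUNTRY")
# Strip the "GDP" prefix so that only the year remains
df2_melt$YEAR <- gsub(pattern = "GDP",replacement = "",x = df2_melt$variable)
colnames(df2_melt)[colnames(df2_melt)=="value"] <- "GDP"

df3 <- merge(df1,df2_melt, by = c("COUNTRY","YEAR"))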

  COUNTRY YEAR TRADE variable  GDP
1  France 2016   392  GDP2016 3333
2  France 2017   359  GDP2017  777
3 Germany 2016   323  GDP2016 2222
4 Germany 2017   564  GDP2017  888
5   Spain 2016   276  GDP2016 1111
6   Spain 2017   456  GDP2017  999

Data

df1 <- read.table(text="COUNTRY   YEAR   TRADE   
Spain     2016   276   
Germany   2016   323      
France    2016   392
Spain     2017   456   
Germany   2017   564      
France    2017   359
Spain     2015   767   
Germany   2015   868      
France    2015   969",header=TRUE, stringsAsFactors=FALSE)

df2 <- read.table(text="COUNTRY   GDP2016   GDP2017 GDP2018
Spain      1111       999    444
Germany    2222       888    555  
France     3333       777    6669",header=TRUE, stringsAsFactors=FALSE)
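
The melt-then-merge idea described above can also be sketched in base R alone, without a reshaping package. This is a minimal sketch, not the answerer's code: it assumes the data from the question (df2 with a GDP2015 column rather than GDP2018), and the names gdp_cols and df2_long are made up for illustration. Using all.x = TRUE keeps every row of df1 even when no GDP value matches.

```r
# Data as posted in the question (assumption: GDP2015, not GDP2018)
df1 <- data.frame(
  COUNTRY = rep(c("Spain", "Germany", "France"), times = 3),
  YEAR    = rep(c(2016L, 2017L, 2015L), each = 3),
  TRADE   = c(276, 323, 392, 456, 564, 359, 767, 868, 969),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  COUNTRY = c("Spain", "Germany", "France"),
  GDP2016 = c(1111, 2222, 3333),
  GDP2017 = c(999, 888, 777),
  GDP2015 = c(444, 555, 666),
  stringsAsFactors = FALSE
)

# Names of the wide GDP columns, e.g. "GDP2016"
gdp_cols <- grep("^GDP[0-9]{4}$", names(df2), value = TRUE)

# Stack the GDP columns: one row per COUNTRY/YEAR pair.
# unlist() walks the columns in order, so COUNTRY repeats once per column
# and each year repeats once per country.
df2_long <- data.frame(
  COUNTRY = rep(df2$COUNTRY, times = length(gdp_cols)),
  YEAR    = rep(as.integer(sub("^GDP", "", gdp_cols)), each = nrow(df2)),
  GDP     = unlist(df2[gdp_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Left join on both keys; rows of df1 without a match would get NA for GDP
df3 <- merge(df1, df2_long, by = c("COUNTRY", "YEAR"), all.x = TRUE)
```

Note that merge() sorts the result by the key columns, so the row order differs from df1.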

Answer 1: (score: 0)

You can do this:

library(tidyverse)

df1 %>%
  left_join(df2 %>%
              gather(YEAR, GDP, -COUNTRY) %>%
              mutate(YEAR = as.integer(sub("GDP", "", YEAR))),
            by = c("COUNTRY", "YEAR"))
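
In newer versions of tidyr (1.0.0 and later), gather() is superseded by pivot_longer(). The following is a sketch of the same join written that way, assuming the question's data and a tidyr version that provides pivot_longer(); df2_long is an illustrative name, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Data as posted in the question
df1 <- data.frame(
  COUNTRY = rep(c("Spain", "Germany", "France"), times = 3),
  YEAR    = rep(c(2016L, 2017L, 2015L), each = 3),
  TRADE   = c(276, 323, 392, 456, 564, 359, 767, 868, 969),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  COUNTRY = c("Spain", "Germany", "France"),
  GDP2016 = c(1111, 2222, 3333),
  GDP2017 = c(999, 888, 777),
  GDP2015 = c(444, 555, 666),
  stringsAsFactors = FALSE
)

# Wide to long: the "GDP" prefix is dropped from the column names,
# and what remains (the year) goes into YEAR
df2_long <- df2 %>%
  pivot_longer(
    cols         = starts_with("GDP"),
    names_to     = "YEAR",
    names_prefix = "GDP",
    values_to    = "GDP"
  ) %>%
  mutate(YEAR = as.integer(YEAR))  # match the type of df1$YEAR

# Keep every row of df1 and attach the matching GDP value
df3 <- left_join(df1, df2_long, by = c("COUNTRY", "YEAR"))
```

Unlike merge(), left_join() keeps the row order of df1.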

Answer 2: (score: 0)

The problem is that df2 is not in a structure that is easy to join, so I would use tidyr to change the structure:

library(dplyr)
library(tidyr)

df3 <-
  df1 %>% 
  left_join(df2 %>% 
               gather(YEAR, GDP, -COUNTRY) %>% 
               mutate(YEAR = as.numeric(substr(YEAR, 4, 7))), 
             by = c("COUNTRY", "YEAR"))

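Note that this does not give the expected answer because the years differ: df1 contains 2015, but df2 contains GDP2018 data instead, so the 2015 rows end up with NA for GDP.

Data used: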
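
One way to see which rows are affected by this year mismatch is an anti_join() on the same keys. This is a rough sketch using the GDP2018 version of df2 from this answer; the names df2_long and unmatched are made up for illustration.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  COUNTRY = rep(c("Spain", "Germany", "France"), times = 3),
  YEAR    = rep(c(2016, 2017, 2015), each = 3),
  TRADE   = c(276, 323, 392, 456, 564, 359, 767, 868, 969),
  stringsAsFactors = FALSE
)
# df2 as used in this answer: GDP2018 instead of GDP2015
df2 <- data.frame(
  COUNTRY = c("Spain", "Germany", "France"),
  GDP2016 = c(1111, 2222, 3333),
  GDP2017 = c(999, 888, 777),
  GDP2018 = c(444, 555, 666),
  stringsAsFactors = FALSE
)

# Same reshaping as in the answer: characters 4 to 7 of "GDP2016" are the year
df2_long <- df2 %>%
  gather(YEAR, GDP, -COUNTRY) %>%
  mutate(YEAR = as.numeric(substr(YEAR, 4, 7)))

df3 <- left_join(df1, df2_long, by = c("COUNTRY", "YEAR"))

# Rows of df1 that have no partner in df2_long (here: the 2015 rows)
unmatched <- anti_join(df1, df2_long, by = c("COUNTRY", "YEAR"))
```

The rows in unmatched are exactly the ones whose GDP is NA in df3.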

df1 <- tibble::tribble(
   ~COUNTRY, ~YEAR, ~TRADE,
    "Spain",  2016,    276,
  "Germany",  2016,    323,
   "France",  2016,    392,
    "Spain",  2017,    456,
  "Germany",  2017,    564,
   "France",  2017,    359,
    "Spain",  2015,    767,
  "Germany",  2015,    868,
   "France",  2015,    969
  )

df2 <- tibble::tribble(
   ~COUNTRY, ~GDP2016, ~GDP2017, ~GDP2018,
    "Spain",     1111,      999,      444,
  "Germany",     2222,      888,      555,
   "France",     3333,      777,      666
  )

Answer 3: (score: 0)

data.table

Sample data

library( data.table )
df1 <- fread("COUNTRY   YEAR   TRADE   
Spain     2016   276   
             Germany   2016   323      
             France    2016   392
             Spain     2017   456   
             Germany   2017   564      
             France    2017   359
             Spain     2015   767   
             Germany   2015   868      
             France    2015   969")

df2 <- fread("COUNTRY   GDP2016   GDP2017 GDP2015
Spain      1111       999    444
             Germany    2222       888    555  
             France     3333       777    666")

Code

#first melt and modify df2
df3 <- melt(df2, id.vars = "COUNTRY", variable.name = "YEAR")[, YEAR := as.numeric(gsub("[^0-9]", "", YEAR))]
#then join
df1[ df3, GDP := i.value, on = .(COUNTRY, YEAR) ][]

#or use as oneliner
df1[ melt(df2, id.vars = "COUNTRY", variable.name = "YEAR")[, YEAR := as.numeric(gsub("[^0-9]", "", YEAR))], GDP := i.value, on = .(COUNTRY, YEAR) ][]

Output

#    COUNTRY YEAR TRADE  GDP
# 1:   Spain 2016   276 1111
# 2: Germany 2016   323 2222
# 3:  France 2016   392 3333
# 4:   Spain 2017   456  999
# 5: Germany 2017   564  888
# 6:  France 2017   359  777
# 7:   Spain 2015   767  444
# 8: Germany 2015   868  555
# 9:  France 2015   969  666