Join data.frames from two lists based on a regular expression

Time: 2019-07-09 12:46:32

Tags: r

This is what my workspace looks like:

list.u = list(list.1 = replicate(n = 10,
                     expr = {data.frame(Var1 = as.factor(paste0("X", c(1:10))), 
                                        Var2 = as.factor(paste0("X", c(11:20))), 
                                        value=rnorm(10))},
                     simplify = F),
            list.2 = replicate(n = 10,
                      expr = {data.frame(Var1 = as.factor(paste0("X", c(1:10))), 
                                         Var2 = as.factor(paste0("X", c(11:20))), 
                                         value=rnorm(10))},
                      simplify = F))

list2env(list.u , .GlobalEnv )

names(list.1) <- paste0(LETTERS[1:10],"_NTI")
names(list.2) <- sample(paste0(LETTERS[1:10],"_RC")) # not the same order

###if meaningful can again be possibly converted to 
###list.u <- list(list.1, list.2) 

What I want to achieve is to join each pair of corresponding data.frames, matched on the string found before _NTI and _RC respectively:

library(dplyr)
df.A <- list.1$A_NTI %>% right_join(list.2$A_RC, by=c("Var1","Var2"))
df.B <- list.1$B_NTI %>% right_join(list.2$B_RC, by=c("Var1","Var2"))
df.C <- list.1$C_NTI %>% right_join(list.2$C_RC, by=c("Var1","Var2"))

and so on, for every pair of matching elements of list.1 and list.2.

How can I do this?

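2 answers:

Answer 0: (score: 3)

You can first use a simple regular expression to match the names, reorder the data frames in one list so that they line up with the other, and then merge them one by one, i.e.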

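# match() returns, for each prefix in list.1, its position in list.2,
# so the index is applied to list.2 to line it up with list.1
list.2 <- list.2[match(sub('_.*', '', names(list.1)), sub('_.*', '', names(list.2)))]
Map(function(i, j) merge(i, j, by = c('Var1', 'Var2'), all.y = TRUE), list.1, list.2)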

which gives,

$A_NTI
   Var1 Var2      value.x    value.y
1    X1  X11  1.111072143  0.9893348
2   X10  X20  0.205016698 -1.0370611
3    X2  X12 -1.153484350 -0.1581219
4    X3  X13 -0.136188465 -0.8258913
5    X4  X14  0.845438616  1.0676754
6    X5  X15 -0.090040790 -0.6626899
7    X6  X16 -0.003032729  0.4220376
8    X7  X17  0.132374562 -0.5993826
9    X8  X18 -0.049654084  0.1161918
10   X9  X19  0.408352891 -0.4193510

$B_NTI
   Var1 Var2     value.x    value.y
1    X1  X11 -1.54096443  1.6954890
2   X10  X20  0.08418433 -1.1082467
3    X2  X12  0.77535586  0.9035127
4    X3  X13 -1.82040060  0.1870822
5    X4  X14 -1.00129026 -1.6371800
6    X5  X15  0.32455294  0.4544704
7    X6  X16  0.25704291 -0.1451332
8    X7  X17  0.61232730  2.1936744
9    X8  X18  0.43594609 -2.3836932
10   X9  X19 -0.23466536  1.3418739

$C_NTI
   Var1 Var2     value.x     value.y
1    X1  X11 -0.02400835  0.03265689
2   X10  X20 -1.78936480  1.55964999
....

...
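To make the name matching easier to follow, here is a minimal sketch on toy name vectors. The vectors n1 and n2 and the helper names k1, k2 and idx are hypothetical and only stand in for names(list.1) and names(list.2).

```r
# Toy names (hypothetical); the second vector is deliberately out of order
n1 <- c("A_NTI", "B_NTI", "C_NTI")
n2 <- c("C_RC", "A_RC", "B_RC")

# Strip everything from the first underscore onward to get the shared key
k1 <- sub("_.*", "", n1)
k2 <- sub("_.*", "", n2)

# For each key in k1, find its position in k2
idx <- match(k1, k2)

# Indexing n2 by idx puts it in the same key order as n1
n2[idx]
```

Note the direction: match(k1, k2) yields positions in the second vector, so it is the second vector (or list) that gets indexed by the result.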

Note: merge(..., all.y = TRUE) is the base R equivalent of dplyr::right_join.
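As a small illustration of what all.y = TRUE does, the sketch below merges two toy data frames in base R. The data frames x and y are hypothetical; y contains a key (X3, X13) that x lacks, and x contains one (X1, X11) that y lacks.

```r
# Toy data frames (hypothetical)
x <- data.frame(Var1 = c("X1", "X2"), Var2 = c("X11", "X12"), value = c(1, 2))
y <- data.frame(Var1 = c("X2", "X3"), Var2 = c("X12", "X13"), value = c(20, 30))

# all.y = TRUE keeps every row of y; rows of x with no match in y are dropped,
# and unmatched y rows get NA in the columns that come from x
res <- merge(x, y, by = c("Var1", "Var2"), all.y = TRUE)
res
```

Giving by explicitly matters here: when by is omitted, merge joins on every column name the two data frames share, which would include value.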

Answer 1: (score: 2)

stopifnot(length(list.1) == length(list.2))
stopifnot(length(setdiff(substr(names(list.1), 1, 1), substr(names(list.2), 1, 1))) == 0)

It seems that sorting each list alphabetically by name before merging would work here.

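# Join on Var1 and Var2 only; without `by`, merge would also join on `value`
Map(function(x, y) merge(x, y, by = c("Var1", "Var2"), all.y = TRUE),
    list.1[order(names(list.1))], list.2[order(names(list.2))])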
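Sorting works when both lists carry the same prefixes and those prefixes sort identically. A variant that does not rely on sort order is to rename each list by its prefix and index both by the shared keys. This is only a sketch under assumptions: l1, l2, make_df, keys and joined are hypothetical names, and the toy lists stand in for list.1 and list.2.

```r
# Toy lists (hypothetical); the second one is out of order
make_df <- function(v) {
  data.frame(Var1 = c("X1", "X2"), Var2 = c("X11", "X12"), value = v)
}
l1 <- list(A_NTI = make_df(c(1, 2)), B_NTI = make_df(c(3, 4)))
l2 <- list(B_RC = make_df(c(30, 40)), A_RC = make_df(c(10, 20)))

# Rename each list by its prefix so both share the same keys
names(l1) <- sub("_NTI$", "", names(l1))
names(l2) <- sub("_RC$", "", names(l2))

# Keep only keys present in both lists, then merge pairwise by key
keys <- intersect(names(l1), names(l2))
joined <- Map(function(x, y) merge(x, y, by = c("Var1", "Var2"), all.y = TRUE),
              l1[keys], l2[keys])
names(joined)
```

Indexing by keys also drops any element that has no counterpart in the other list, instead of silently pairing it with the wrong data frame.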