Question

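I have a large data frame that I want to fill in with the results of SQL queries against a number of databases, "filling in data holes" so to speak. The wrinkle: I don't know in advance how many holes will be filled (the queries group by year, so I might get back a data frame with one year or with many years).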

I'm having a hard time figuring out how to accomplish this. I've been trying to use the dplyr package:

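left_join either adds the same column twice (if I specify by=) or drops the new data (if I don't specify by=, it joins on both shared columns)
bind_cols doesn't work
bind_rows adds a duplicate row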

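How can I get the new data to fill in the holes? (By the way, I'm not married to dplyr... I just don't want to iterate over every element of the new data frame.)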

The code is as follows:

library(dplyr)

TargetDF <- structure(
  list(Ind = c(5, 6, 7), `2015 Act` = c(7870L, NA, NA)),
  .Names = c("Ind", "2015 Act"),
  class = c("tbl_df", "data.frame"),
  row.names = c(NA, -3L)
)

tempDF <- structure(
  list(Ind = 6, `2015 Act` = 49782L, `2016 Act` = 323L),
  .Names = c("Ind", "2015 Act", "2016 Act"),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -1L)
)

left_join(TargetDF, tempDF, by = "Ind")
## gives duplicate columns

left_join(TargetDF, tempDF)
## loses the new "2015 Act" data for Ind 6

bind_cols(TargetDF, tempDF)
## doesn't work

bind_rows(TargetDF, tempDF)
## doubles Ind 6 (there are other columns not included here, which is why I can't use !is.na() to eliminate the duplicate Ind 6)

Answer 1

One possible approach is to take the first non-NA value of each column within each Ind group; if a group has none, keep (generate) NA:

full_join(TargetDF, tempDF) %>% 
  group_by(Ind) %>% 
  summarise_each(funs(.[!is.na(.)][1L]))

# Source: local data frame [3 x 3]
# 
#     Ind 2015 Act 2016 Act
#   (dbl)    (int)    (int)
# 1     5     7870       NA
# 2     6    49782      323
# 3     7       NA       NA
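
In more recent dplyr releases, summarise_each() and funs() are deprecated. A minimal sketch of the same idea using across() might look like the following; it assumes dplyr 1.0 or later, rebuilds the question's sample data with tibble() for brevity, and uses a helper name (first_non_na) that is made up for illustration.

```r
library(dplyr)

# Sample data from the question, rebuilt with tibble() for brevity
TargetDF <- tibble(Ind = c(5, 6, 7), `2015 Act` = c(7870L, NA, NA))
tempDF   <- tibble(Ind = 6, `2015 Act` = 49782L, `2016 Act` = 323L)

# Hypothetical helper: first non-NA value of a vector, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1L]

# Join on every shared column (stated explicitly here), then collapse
# each Ind group to a single row by taking the first non-NA per column
result <- full_join(TargetDF, tempDF, by = c("Ind", "2015 Act")) %>%
  group_by(Ind) %>%
  summarise(across(everything(), first_non_na))

result
```

If a group holds two different non-NA values for the same column, only the first is kept; since full_join() lists the rows of its first argument first, the values already in TargetDF would take precedence.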

Answer 2

You can use my package safejoin to do a left join and handle the conflicts with dplyr::coalesce:

# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)
safe_left_join(TargetDF, tempDF, by = "Ind", conflict = coalesce)
# # tibble [3 x 3]
#     Ind `2015 Act` `2016 Act`
#   <dbl>      <int>      <int>
# 1     5       7870         NA
# 2     6      49782        323
# 3     7         NA         NA
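
If installing an extra package is not an option, roughly the same result can be sketched with plain dplyr: left-join on Ind, then coalesce each pair of suffixed columns. This is only an illustration of the idea under the question's sample data, not how safejoin is implemented; the variable names (shared, joined, result) are made up, and the loop runs over columns, not over individual elements.

```r
library(dplyr)

# Sample data from the question, rebuilt with tibble() for brevity
TargetDF <- tibble(Ind = c(5, 6, 7), `2015 Act` = c(7870L, NA, NA))
tempDF   <- tibble(Ind = 6, `2015 Act` = 49782L, `2016 Act` = 323L)

# Columns present in both data frames, other than the key
shared <- setdiff(intersect(names(TargetDF), names(tempDF)), "Ind")

# Shared non-key columns come back as "<name>.x" and "<name>.y"
joined <- left_join(TargetDF, tempDF, by = "Ind", suffix = c(".x", ".y"))

for (col in shared) {
  x_col <- paste0(col, ".x")
  y_col <- paste0(col, ".y")
  # Keep the existing value where there is one, else take the new one
  joined[[col]] <- coalesce(joined[[x_col]], joined[[y_col]])
  joined[[x_col]] <- NULL
  joined[[y_col]] <- NULL
}

# Restore a predictable column order: target columns first, then new ones
result <- joined[union(names(TargetDF), names(tempDF))]
result
```

Because the shared columns are discovered at run time, the sketch should not need to know in advance how many year columns tempDF brings back.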

Merge two data frames with common rows and columns (filling in)
