Collapse two rows with NA values into one row

Time: 2021-07-01 14:13:04

Tags: r dplyr purrr

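I want to collapse the first two rows of a data frame into a single row. Both rows contain NA values. I am working with many data frames of the same structure, so I would like a dplyr solution that I can use with purrr.

What is the best way to do this?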

# A tibble: 3 x 12
     id `Title:`  `COVID-19 Vaccinat… ...3  ...4  ...5  ...6  ...7   ...8  ...9 
  <int> <chr>     <chr>               <chr> <chr> <chr> <chr> <chr>  <chr> <chr>
1     1 Region C… Region Name (admin… LTLA… LTLA… MSOA… MSOA… Numbe… NA    NA   
2     2 NA        NA                  NA    NA    NA    NA    Under… 65-69 70-74
3     3 E12000004 East Midlands       E060… Derby E020… Alle… 913    390   427  
# … with 2 more variables: ...10 <chr>, ...11 <chr>

Reproducible example:

df <- structure(list(id = 1:3, `Title:` = c("Region Code (Administrative)", 
NA, "E12000004"), `COVID-19 Vaccinations By Middle Layer Super Output Area (MSOA) of Residence and Age Group` = c("Region Name (administrative)", 
NA, "East Midlands"), ...3 = c("LTLA Code", NA, "E06000015"), 
    ...4 = c("LTLA Name", NA, "Derby"), ...5 = c("MSOA Code", 
    NA, "E02002796"), ...6 = c("MSOA Name", NA, "Allestree North"
    ), ...7 = c("Number of people vaccinated with at least 1 dose", 
    "Under 65", "913"), ...8 = c(NA, "65-69", "390"), ...9 = c(NA, 
    "70-74", "427"), ...10 = c(NA, "75-79", "352"), ...11 = c(NA, 
    "80+", "456")), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

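2 answers:

Answer 0: (score: 1)

Would this work:

library(dplyr)  # coalesce()
library(purrr)  # map()

# Two copies of the example data, standing in for many data frames
df1 <- df
mylist <- list(df1, df)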
mylist
[[1]]
# A tibble: 3 x 12
     id `Title:`             `COVID-19 Vaccinations By Middle Layer Super Output Area (MS~ ...3     ...4    ...5    ...6       ...7                           ...8  ...9  ...10 ...11
  <int> <chr>                <chr>                                                         <chr>    <chr>   <chr>   <chr>      <chr>                          <chr> <chr> <chr> <chr>
1     1 Region Code (Admini~ Region Name (administrative)                                  LTLA Co~ LTLA N~ MSOA C~ MSOA Name  Number of people vaccinated w~ NA    NA    NA    NA   
2     2 NA                   NA                                                            NA       NA      NA      NA         Under 65                       65-69 70-74 75-79 80+  
3     3 E12000004            East Midlands                                                 E060000~ Derby   E02002~ Allestree~ 913                            390   427   352   456  

[[2]]
# A tibble: 3 x 12
     id `Title:`             `COVID-19 Vaccinations By Middle Layer Super Output Area (MS~ ...3     ...4    ...5    ...6       ...7                           ...8  ...9  ...10 ...11
  <int> <chr>                <chr>                                                         <chr>    <chr>   <chr>   <chr>      <chr>                          <chr> <chr> <chr> <chr>
1     1 Region Code (Admini~ Region Name (administrative)                                  LTLA Co~ LTLA N~ MSOA C~ MSOA Name  Number of people vaccinated w~ NA    NA    NA    NA   
2     2 NA                   NA                                                            NA       NA      NA      NA         Under 65                       65-69 70-74 75-79 80+  
3     3 E12000004            East Midlands                                                 E060000~ Derby   E02002~ Allestree~ 913                            390   427   352   456  

# For each data frame: fill the NAs in row 1 from row 2, then append row 3
map(mylist, function(x) as.data.frame(rbind(coalesce(x[1,], x[2,]), x[3,])))
[[1]]
  id                       Title: COVID-19 Vaccinations By Middle Layer Super Output Area (MSOA) of Residence and Age Group      ...3      ...4      ...5            ...6
1  1 Region Code (Administrative)                                                              Region Name (administrative) LTLA Code LTLA Name MSOA Code       MSOA Name
2  3                    E12000004                                                                             East Midlands E06000015     Derby E02002796 Allestree North
                                              ...7  ...8  ...9 ...10 ...11
1 Number of people vaccinated with at least 1 dose 65-69 70-74 75-79   80+
2                                              913   390   427   352   456

[[2]]
  id                       Title: COVID-19 Vaccinations By Middle Layer Super Output Area (MSOA) of Residence and Age Group      ...3      ...4      ...5            ...6
1  1 Region Code (Administrative)                                                              Region Name (administrative) LTLA Code LTLA Name MSOA Code       MSOA Name
2  3                    E12000004                                                                             East Midlands E06000015     Derby E02002796 Allestree North
                                              ...7  ...8  ...9 ...10 ...11
1 Number of people vaccinated with at least 1 dose 65-69 70-74 75-79   80+
2                                              913   390   427   352   456
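A variation on the same idea, offered as a sketch rather than a definitive fix: as far as I know, how `coalesce()` treats whole data-frame rows can differ between dplyr versions, so it may be safer to coalesce column by column. The helper name `merge_first_two` and the small `toy` data frame are made up for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: merge rows 1 and 2 column by column, keep the rest.
merge_first_two <- function(x) {
  # Walk over the columns of the two one-row slices in parallel;
  # coalesce() keeps the row-1 value unless it is NA.
  merged <- map2(as.list(x[1, ]), as.list(x[2, ]), coalesce)
  bind_rows(as_tibble(merged), x[-(1:2), ])
}

# Made-up data with the same kind of split header as `df`
toy <- tibble(
  id = 1:3,
  a  = c("Code", NA, "E1"),
  b  = c(NA, "65-69", "390")
)

out <- map(list(toy, toy), merge_first_two)
out[[1]]
```

Because each column is handled on its own, this does not depend on how `coalesce()` handles data-frame inputs.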

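Answer 1: (score: 1)

You can pick the first non-NA value in each column across the first 2 rows, then bind that result to the remaining rows of the data frame.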

library(dplyr)

df %>%
  slice(1:2) %>%
  summarise(across(everything(), ~ na.omit(.x)[1])) %>%
  bind_rows(df %>% slice(-(1:2)))

#     id `Title:`    `COVID-19 Vaccinations By… ...4   ...5   ...6  ...7   ...8       ...9  ...10 ...11 ...12
#  <int> <chr>       <chr>                      <chr>  <chr>  <chr> <chr>  <chr>      <chr> <chr> <chr> <chr>
#1     1 Region Cod… Region Name (administrati… LTLA … LTLA … MSOA… MSOA … Number of… 65-69 70-74 75-79 80+  
#2     3 E12000004   East Midlands              E0600… Derby  E020… Alles… 913        390   427   352   456  
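Since the question asks for something that can be dropped into purrr, the same steps can be wrapped in a function and mapped over a list of data frames. This is a minimal sketch: the name `collapse_header_rows`, its `n` argument, and the `toy_df` data are assumptions for illustration, not part of the original answer.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: collapse the first `n` rows into one row by taking
# the first non-NA value in each column, then append the remaining rows.
collapse_header_rows <- function(data, n = 2) {
  header <- data %>%
    slice(seq_len(n)) %>%
    summarise(across(everything(), ~ na.omit(.x)[1]))
  bind_rows(header, slice(data, -seq_len(n)))
}

# Made-up data frame standing in for files with the same structure as `df`
toy_df <- tibble(
  id = 1:3,
  a  = c("Code", NA, "E1"),
  b  = c(NA, "65-69", "390")
)

dfs <- list(first = toy_df, second = toy_df)
result <- map(dfs, collapse_header_rows)
result$first
```

A column that is NA in all of the first `n` rows stays NA in the collapsed row, because indexing an empty vector with `[1]` returns NA.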