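I want to collapse the first two rows of a data frame into one; they contain NA values. I'm working with many data frames that share the same structure, so I'd like a dplyr solution that I can drop into purrr.
What's the best way to do this?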
# A tibble: 3 x 12
id `Title:` `COVID-19 Vaccinat… ...3 ...4 ...5 ...6 ...7 ...8 ...9
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Region C… Region Name (admin… LTLA… LTLA… MSOA… MSOA… Numbe… NA NA
2 2 NA NA NA NA NA NA Under… 65-69 70-74
3 3 E12000004 East Midlands E060… Derby E020… Alle… 913 390 427
# … with 2 more variables: ...10 <chr>, ...11 <chr>
Reproducible example:
df <- structure(list(id = 1:3, `Title:` = c("Region Code (Administrative)",
NA, "E12000004"), `COVID-19 Vaccinations By Middle Layer Super Output Area (MSOA) of Residence and Age Group` = c("Region Name (administrative)",
NA, "East Midlands"), ...3 = c("LTLA Code", NA, "E06000015"),
...4 = c("LTLA Name", NA, "Derby"), ...5 = c("MSOA Code",
NA, "E02002796"), ...6 = c("MSOA Name", NA, "Allestree North"
), ...7 = c("Number of people vaccinated with at least 1 dose",
"Under 65", "913"), ...8 = c(NA, "65-69", "390"), ...9 = c(NA,
"70-74", "427"), ...10 = c(NA, "75-79", "352"), ...11 = c(NA,
"80+", "456")), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
Answer 0 (score: 1)
Does this work?
library(dplyr)
library(purrr)
# Simulate a list of identically structured data frames
df1 <- df
mylist <- list(df1, df)
mylist
[[1]]
# A tibble: 3 x 12
id `Title:` `COVID-19 Vaccinations By Middle Layer Super Output Area (MS~ ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Region Code (Admini~ Region Name (administrative) LTLA Co~ LTLA N~ MSOA C~ MSOA Name Number of people vaccinated w~ NA NA NA NA
2 2 NA NA NA NA NA NA Under 65 65-69 70-74 75-79 80+
3 3 E12000004 East Midlands E060000~ Derby E02002~ Allestree~ 913 390 427 352 456
[[2]]
# A tibble: 3 x 12
id `Title:` `COVID-19 Vaccinations By Middle Layer Super Output Area (MS~ ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Region Code (Admini~ Region Name (administrative) LTLA Co~ LTLA N~ MSOA C~ MSOA Name Number of people vaccinated w~ NA NA NA NA
2 2 NA NA NA NA NA NA Under 65 65-69 70-74 75-79 80+
3 3 E12000004 East Midlands E060000~ Derby E02002~ Allestree~ 913 390 427 352 456
# Fill the NAs in row 1 from row 2 with coalesce(), then append row 3
map(mylist, function(x) as.data.frame(rbind(coalesce(x[1,], x[2,]), x[3,])))
[[1]]
id Title: COVID-19 Vaccinations By Middle Layer Super Output Area (MSOA) of Residence and Age Group ...3 ...4 ...5 ...6
1 1 Region Code (Administrative) Region Name (administrative) LTLA Code LTLA Name MSOA Code MSOA Name
2 3 E12000004 East Midlands E06000015 Derby E02002796 Allestree North
...7 ...8 ...9 ...10 ...11
1 Number of people vaccinated with at least 1 dose 65-69 70-74 75-79 80+
2 913 390 427 352 456
[[2]]
id Title: COVID-19 Vaccinations By Middle Layer Super Output Area (MSOA) of Residence and Age Group ...3 ...4 ...5 ...6
1 1 Region Code (Administrative) Region Name (administrative) LTLA Code LTLA Name MSOA Code MSOA Name
2 3 E12000004 East Midlands E06000015 Derby E02002796 Allestree North
...7 ...8 ...9 ...10 ...11
1 Number of people vaccinated with at least 1 dose 65-69 70-74 75-79 80+
2 913 390 427 352 456
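Because the code above re-attaches only `x[3,]`, any rows after the third would be dropped. A minimal sketch of a more general version, assuming every data frame has exactly two header rows at the top. `collapse_first_two()` is a hypothetical helper name, and `toy` is made-up data in the same shape as the question's. It applies `coalesce()` column by column, so the result does not depend on how `coalesce()` treats whole data-frame rows.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: merge rows 1 and 2 column by column (row 1 wins
# unless it is NA), then append every remaining row, not just row 3.
collapse_first_two <- function(x) {
  # map2() walks the columns of the two one-row tibbles in parallel and
  # keeps row 1's column names; coalesce() returns the first non-NA value
  merged <- as_tibble(map2(x[1, ], x[2, ], coalesce))
  bind_rows(merged, slice(x, -(1:2)))
}

# Made-up example data with two header rows
toy <- tibble(
  id = 1:3,
  a  = c("Region Code", NA, "E12000004"),
  b  = c(NA, "Under 65", "913")
)

collapse_first_two(toy)
```

The same helper can then be applied to a whole list with `map(mylist, collapse_first_two)`.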
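Answer 1 (score: 1)
You can take the first non-NA value in each column across the first 2 rows, then bind that row onto the remaining rows of the data frame.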
library(dplyr)
df %>%
  slice(1:2) %>%
  # first non-NA value in each column across rows 1-2
  summarise(across(everything(), ~ na.omit(.x)[1])) %>%
  bind_rows(df %>% slice(-(1:2)))
# id `Title:` `COVID-19 Vaccinations By… ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
# <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 Region Cod… Region Name (administrati… LTLA … LTLA … MSOA… MSOA … Number of… 65-69 70-74 75-79 80+
#2 3 E12000004 East Midlands E0600… Derby E020… Alles… 913 390 427 352 456
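The question asks for something that can be used with purrr, so the pipeline can be wrapped in a function and passed to `map()`. This is a sketch that assumes every data frame has the same number of header rows. `collapse_header()`, its `n` argument and the two toy tibbles are hypothetical illustrations and are not part of the answer.

```r
library(dplyr)
library(purrr)

# Hypothetical wrapper around the pipeline above; n = number of header rows
collapse_header <- function(x, n = 2) {
  x %>%
    slice(seq_len(n)) %>%
    # first non-NA value in each column across the first n rows
    summarise(across(everything(), ~ na.omit(.x)[1])) %>%
    # refer to x (the argument), not a global df, so map() works
    bind_rows(slice(x, -seq_len(n)))
}

# Made-up list of data frames with the same two-row header layout
dfs <- list(
  tibble(id = 1:3, a = c("Code", NA, "E1"), b = c(NA, "65-69", "390")),
  tibble(id = 1:4, a = c(NA, "Name", "E2", "E3"), b = c("Total", NA, "10", "20"))
)

collapsed <- map(dfs, collapse_header)
```

Inside the function, `slice(x, ...)` replaces the original `df %>% slice(...)`. This keeps each call tied to whichever data frame `map()` passes in rather than to a global `df`.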