My imported data looks like this:
ID col1 col2 col3 col4
1  a    e    i    r
             j    s
             k    t
2  b    f    l    u
             m    v
             n    w
             o    x
3  c    g    p    y
4  d    h    q    z
and I would like it transformed so that there is one row per unique ID, i.e.:
ID col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
1  a    e    i    r    j    s    k    t
2  b    f    l    u    m    v    n    w    o    x
3  c    g    p    y
4  d    h    q    z
The data in an easy-to-use form:
df <- data.frame(ID = c(1, NA, NA, 2, NA, NA, NA, 3, 4),
                 col1 = c('a', NA, NA, 'b', NA, NA, NA, 'c', 'd'),
                 col2 = c('e', NA, NA, 'f', NA, NA, NA, 'g', 'h'),
                 col3 = letters[9:17],
                 col4 = letters[18:26])
Answer 0 (score: 3)
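With the caveat that long form is almost always more useful in cases like this, there are two options: collapse everything into a string and separate it, or reshape to long form, renumber the values within each ID, and spread back to wide. The reshape route: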
library(tidyverse)
df <- data.frame(ID = c(1, NA, NA, 2, NA, NA, NA, 3, 4),
                 col1 = c('a', NA, NA, 'b', NA, NA, NA, 'c', 'd'),
                 col2 = c('e', NA, NA, 'f', NA, NA, NA, 'g', 'h'),
                 col3 = letters[9:17],
                 col4 = letters[18:26])
df %>% fill(ID) %>%
  gather(var, val, -ID) %>%
  drop_na(val) %>%
  group_by(ID) %>%
  mutate(var = sprintf('col%02d', row_number())) %>%
  spread(var, val)
#> # A tibble: 4 × 11
#> # Groups: ID [4]
#> ID col01 col02 col03 col04 col05 col06 col07 col08 col09 col10
#> * <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 a e i j k r s t <NA> <NA>
#> 2 2 b f l m n o u v w x
#> 3 3 c g p y <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 4 d h q z <NA> <NA> <NA> <NA> <NA> <NA>
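As a hedged aside: `gather()` and `spread()` are superseded in current tidyr by `pivot_longer()` and `pivot_wider()`. The sketch below is an assumed equivalent, not the answerer's code; the names `wide`, `var`, and `val` are illustrative. Because `pivot_longer()` emits values row by row, the letters within each ID come out in the row-wise order the question asks for (a e i r j s k t) rather than the column-wise order shown in the output above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c(1, NA, NA, 2, NA, NA, NA, 3, 4),
                 col1 = c('a', NA, NA, 'b', NA, NA, NA, 'c', 'd'),
                 col2 = c('e', NA, NA, 'f', NA, NA, NA, 'g', 'h'),
                 col3 = letters[9:17],
                 col4 = letters[18:26],
                 stringsAsFactors = FALSE)

wide <- df %>%
  fill(ID) %>%                                   # carry each ID down over the NA rows
  pivot_longer(-ID, names_to = "var", values_to = "val",
               values_drop_na = TRUE) %>%        # long form, row by row, NAs dropped
  group_by(ID) %>%
  mutate(var = sprintf("col%02d", row_number())) %>%  # renumber within each ID
  ungroup() %>%
  pivot_wider(names_from = var, values_from = val)    # back to one row per ID

wide
```

Zero-padding the column names (`col01`, `col02`, ...) is kept from the original so the columns sort correctly if they are ever reordered by name.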
Answer 1 (score: 0)
A tidyverse solution:
library(dplyr)

df %>%
  mutate(ID = zoo::na.locf(ID)) %>%   # carry each ID down over the NA rows
  mutate(row = row_number()) %>%      # remember the original row order
  tidyr::gather(col, val, col1:col4) %>%
  filter(!is.na(val)) %>%
  arrange(ID, row, col) %>%           # row-wise order within each ID
  select(-row) %>%
  group_by(ID) %>%
  mutate(col = row_number()) %>%
  mutate(col = paste0('col', stringr::str_pad(col, side = 'left', pad = '0', width = 2))) %>%
  tidyr::spread(col, val)
Answer 2 (score: 0)
Here is a solution using a combination of dplyr and tidyr, plus a little base R:
library(dplyr)
library(tidyr)
df <- fill(df, ID, .direction = 'down')
numCols <- max(sapply(split(df, df$ID), function(x) sum(!is.na(x[, -1]))))
df %>%
  group_by(ID) %>%
  do(summarise(., l = paste(unlist(.[, -1])[!is.na(unlist(.[, -1]))], collapse = ' '))) %>%
  separate(l, into = paste0('col', 1:numCols), sep = ' ')
The output is as follows:
ID col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
* <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 a e i j k r s t <NA> <NA>
2 2 b f l m n o u v w x
3 3 c g p y <NA> <NA> <NA> <NA> <NA> <NA>
4 4 d h q z <NA> <NA> <NA> <NA> <NA> <NA>
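The output above lists each ID's values column by column (a e i j k r s t), while the question asks for them row by row (a e i r j s k t). The following is a minimal sketch of the same collapse-and-separate idea with the values read row-wise. It assumes the values contain no spaces, since a space is the separator, and the names `collapsed`, `num_cols`, and `wide` are illustrative rather than taken from the answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c(1, NA, NA, 2, NA, NA, NA, 3, 4),
                 col1 = c('a', NA, NA, 'b', NA, NA, NA, 'c', 'd'),
                 col2 = c('e', NA, NA, 'f', NA, NA, NA, 'g', 'h'),
                 col3 = letters[9:17],
                 col4 = letters[18:26],
                 stringsAsFactors = FALSE)

collapsed <- df %>%
  fill(ID, .direction = "down") %>%
  group_by(ID) %>%
  # t() puts each original row into a column, so c() reads the values row by row
  summarise(l = paste(na.omit(c(t(cbind(col1, col2, col3, col4)))), collapse = " "))

# The ID with the most values decides how many output columns are needed
num_cols <- max(lengths(strsplit(collapsed$l, " ")))

# fill = "right" pads shorter rows with NA instead of warning
wide <- separate(collapsed, l, into = paste0("col", seq_len(num_cols)),
                 sep = " ", fill = "right")

wide
```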
Answer 3 (score: 0)
Base R isn't half bad sometimes:
tmp <- na.omit(data.frame(id=cummax(replace(df$ID, is.na(df$ID), 0)), col=unlist(df[-1]) ))
reshape(transform(tmp, time=ave(id,id,FUN=seq_along)), direction="wide", idvar="id", sep="")
# id col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
#col11 1 a e i j k r s t <NA> <NA>
#col14 2 b f l m n o u v w x
#col18 3 c g p y <NA> <NA> <NA> <NA> <NA> <NA>
#col19 4 d h q z <NA> <NA> <NA> <NA> <NA> <NA>
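`unlist(df[-1])` walks the data column by column, which is why the output above reads a e i j k r s t. If the row-wise order from the question is wanted, one possible variation (a sketch, not the answerer's code) is to transpose before flattening. Like the original, it assumes the IDs are positive and non-decreasing so that `cummax()` can fill the gaps; `ids`, `vals`, `keep`, and `wide` are illustrative names.

```r
df <- data.frame(ID = c(1, NA, NA, 2, NA, NA, NA, 3, 4),
                 col1 = c('a', NA, NA, 'b', NA, NA, NA, 'c', 'd'),
                 col2 = c('e', NA, NA, 'f', NA, NA, NA, 'g', 'h'),
                 col3 = letters[9:17],
                 col4 = letters[18:26],
                 stringsAsFactors = FALSE)

# Fill the missing IDs, then repeat each one once per value column
ids  <- rep(cummax(replace(df$ID, is.na(df$ID), 0)), each = ncol(df) - 1)
# Transposing first makes as.vector() read the values row by row
vals <- as.vector(t(as.matrix(df[-1])))

keep <- !is.na(vals)
tmp  <- data.frame(id = ids[keep], col = vals[keep], stringsAsFactors = FALSE)
tmp$time <- ave(tmp$id, tmp$id, FUN = seq_along)  # position within each id

wide <- reshape(tmp, direction = "wide", idvar = "id", timevar = "time", sep = "")
wide
```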