Split a string into a data frame with one column per character

Time: 2021-07-28 17:31:09

Tags: r

Given the following dataset:

df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = sapply(1:3, function(x) paste(sample(c("A","C","T","G"), 8, replace = TRUE), collapse = "")))

I want to split df$sequence into 8 additional columns, each holding one character of the string, in the correct order.

I know how to split a character vector, but the result ends up in a list:

library(stringr)
list1 <- str_extract_all(df$sequence,boundary("character"))
[[1]]
[1] "A" "T" "C" "G" "T" "G" "A" "A"

[[2]]
[1] "T" "C" "C" "T" "A" "T" "A" "T"

[[3]]
[1] "C" "G" "T" "T" "A" "A" "G" "G"

str(list1)
List of 3
 $ : chr [1:8] "A" "T" "C" "G" ...
 $ : chr [1:8] "T" "C" "C" "T" ...
 $ : chr [1:8] "C" "G" "T" "T" ...

How can I convert this list to a data frame, or is there a simpler way?

Edit:

I could do:

df$pos1 <- sapply(list1, function(x) x[1])
df$pos2 <- sapply(list1, function(x) x[2])

But I suspect there is a better solution.

2 answers:

Answer 0: (score: 1)

Using base R:

> data.frame(do.call(rbind, strsplit(df$sequence, "")))
  X1 X2 X3 X4 X5 X6 X7 X8
1  T  A  A  T  C  A  A  A
2  T  T  A  A  A  T  G  G
3  C  G  A  A  T  C  C  T
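
To get the 8 additional columns the question asks for, the split characters can be bound back onto the original data frame. The following is a minimal sketch, not part of the original answer: the fixed sequences and the names chars and result are made up for illustration, and the pos1..pos8 column names follow the naming used in the question's edit. The as.character() call is a precaution for R versions before 4.0, where data.frame() turns strings into factors by default and strsplit() would reject them.

```r
# Fixed example sequences (hypothetical values, chosen so the result is reproducible)
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = c("ATCGTGAA", "TCCTATAT", "CGTTAAGG"))

# Split each sequence into single characters; rbind stacks them into a 3 x 8 matrix
chars <- do.call(rbind, strsplit(as.character(df$sequence), ""))

# Name the columns pos1..pos8, then append them to the original columns
colnames(chars) <- paste0("pos", seq_len(ncol(chars)))
result <- data.frame(df, chars, stringsAsFactors = FALSE)

print(result)
```

This relies on every sequence having the same length; with sequences of different lengths, rbind would recycle the shorter vectors and emit a warning instead of padding with NA.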

Answer 1: (score: 1)

We can use a regex approach to insert a delimiter between the characters and then read the result with read.csv:

read.csv(text = gsub("(?<=.)(?=.)", ",", df$sequence, perl = TRUE), 
       header = FALSE, colClasses = "character")
  V1 V2 V3 V4 V5 V6 V7 V8
1  A  C  A  A  C  C  C  A
2  G  T  A  G  T  C  C  C
3  C  T  G  G  G  C  G  A
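
To make the two steps of this approach easier to follow, the sketch below shows the intermediate comma-delimited strings before they are parsed. It uses fixed sequences made up for illustration (the names seqs, delimited, and chars are not from the original answer). The pattern "(?<=.)(?=.)" matches the empty position between any two adjacent characters, so gsub inserts a comma there. Setting colClasses = "character" matters here because read.csv would otherwise read a column consisting only of "T" as logical TRUE.

```r
# Fixed example sequences (hypothetical values for illustration)
seqs <- c("ATCGTGAA", "TCCTATAT", "CGTTAAGG")

# Step 1: insert a comma at every position that has a character on both sides
delimited <- gsub("(?<=.)(?=.)", ",", seqs, perl = TRUE)
print(delimited)

# Step 2: parse the delimited strings as CSV rows, keeping every column as character
chars <- read.csv(text = delimited, header = FALSE, colClasses = "character")
print(chars)
```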