Given the following data set:
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = sapply(1:3, function(x) paste(sample(c("A", "C", "T", "G"), 8, replace = TRUE), collapse = "")),
                 stringsAsFactors = FALSE)  # keeps sequence as character on R < 4.0
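I want to split df$sequence into 8 additional columns, each holding one character of the string, in the original order.
I know how to split a character vector, but the result ends up in a list: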
library(stringr)
list1 <- str_extract_all(df$sequence,boundary("character"))
[[1]]
[1] "A" "T" "C" "G" "T" "G" "A" "A"
[[2]]
[1] "T" "C" "C" "T" "A" "T" "A" "T"
[[3]]
[1] "C" "G" "T" "T" "A" "A" "G" "G"
str(list1)
List of 3
$ : chr [1:8] "A" "T" "C" "G" ...
$ : chr [1:8] "T" "C" "C" "T" ...
$ : chr [1:8] "C" "G" "T" "T" ...
How can I convert this list to a data frame, or is there a simpler way?
Edit:
I could do:
df$pos1 <- sapply(list1, function(x) x[1])
df$pos2 <- sapply(list1, function(x) x[2])
but I suppose there is a better solution.
Answer 0 (score: 1)
Using base R:
> data.frame(do.call(rbind, strsplit(df$sequence, "")))
X1 X2 X3 X4 X5 X6 X7 X8
1 T A A T C A A A
2 T T A A A T G G
3 C G A A T C C T
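To get the "8 additional columns" the question asks for, the split matrix can be named and bound back onto the original data frame. The following is only a sketch: it uses the three fixed sequences printed in the question instead of random ones, and the names chars and df_wide are made up for illustration.

```r
# Fixed sequences taken from the question's printed output, so the result is reproducible
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = c("ATCGTGAA", "TCCTATAT", "CGTTAAGG"),
                 stringsAsFactors = FALSE)

# strsplit() returns one character vector per row; rbind stacks them into a matrix
chars <- do.call(rbind, strsplit(df$sequence, ""))

# Name the columns pos1..pos8, matching the naming used in the question's edit
colnames(chars) <- paste0("pos", seq_len(ncol(chars)))

# Bind the new columns onto the original data frame (hypothetical name df_wide)
df_wide <- cbind(df, as.data.frame(chars, stringsAsFactors = FALSE))
```

This relies on all sequences having the same length; with ragged lengths, rbind would recycle the shorter vectors and give misleading columns.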
Answer 1 (score: 1)
We can use a regex approach to insert a delimiter between characters and then read the result with read.csv:
read.csv(text = gsub("(?<=.)(?=.)", ",", df$sequence, perl = TRUE),
header = FALSE, colClasses = "character")
V1 V2 V3 V4 V5 V6 V7 V8
1 A C A A C C C A
2 G T A G T C C C
3 C T G G G C G A
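The pattern "(?<=.)(?=.)" matches the empty position that has a character both before and after it, so gsub inserts a comma between every pair of adjacent characters but not at the start or end. A small sketch of the two steps separately, again using the fixed sequences from the question (the names seqs, delimited and cols are illustrative only):

```r
# Fixed sequences taken from the question's printed output
seqs <- c("ATCGTGAA", "TCCTATAT", "CGTTAAGG")

# Step 1: insert a comma at every zero-width position between two characters
delimited <- gsub("(?<=.)(?=.)", ",", seqs, perl = TRUE)

# Step 2: parse each delimited string as one CSV row
cols <- read.csv(text = delimited, header = FALSE, colClasses = "character")
```

The colClasses = "character" argument matters here: without it, read.csv would interpret a column consisting only of "T" values as logical TRUE.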