Splitting a string by index

Time: 2019-05-08 13:18:05

Tags: r

I have a data frame with one column of alphanumeric values. I want to split each value at fixed positions and store the pieces in separate new columns, as in the example below.

str <- c("1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077",
         "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077")

Desired output:


AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077

2 answers:

Answer 0 (score: 2)

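Use one line of the sample output to find the field widths. The widths vector starts with 4 because the sample output appears to omit the first 4 characters of the input. Then pass the widths to read.fwf. If you really do not want the first 4 characters of the input to appear in the output, replace the read.fwf line with read.fwf(textConnection(str), widths)[-1]. No packages are used.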

sample.out <- "AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077"
widths <- c(4, sapply(read.table(text = sample.out, as.is = TRUE), nchar))

read.fwf(textConnection(str), widths)

giving:

    V1   V2       V3   V4     V5   V6 V7                                      V8
1 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
2 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
3 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
4 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
5 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
6 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
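To make the width calculation concrete, here is a minimal base-R sketch that prints the widths and performs the same split with substring instead of read.fwf. It works on a single row, and the names x, starts, ends and fields are illustrative, not part of the answer.

```r
# One input row and one line of the desired output, copied from the question.
x <- "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077"
sample.out <- "AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077"

# Width of each space-separated field in the sample output, preceded by 4
# for the leading "1001" that the sample output omits.
widths <- c(4, nchar(strsplit(sample.out, " ", fixed = TRUE)[[1]]))
print(widths)

# Each field ends at the running total of the widths and starts
# right after the previous field ends.
ends <- cumsum(widths)
starts <- ends - widths + 1

# substring() is vectorised over the start and end positions.
fields <- substring(x, starts, ends)
print(fields)
```

Unlike read.fwf, substring leaves every piece as a character string, so values such as 111000 and 99 are not converted to numbers.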

Answer 1 (score: 0)

One option is to use separate from the tidyverse, passing the character positions at which to cut:

library(tidyverse)
tibble(col1 = str) %>%
  separate(col1, into = paste0("col", 0:7),
           sep = c(4, 8, 16, 20, 26, 30, 32)) %>%
  select(-1)
# A tibble: 6 x 7
#  col1  col2     col3  col4   col5  col6  col7                                   
#  <chr> <chr>    <chr> <chr>  <chr> <chr> <chr>                                  
#1 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#2 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#3 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#4 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#5 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#6 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
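The positions given to separate are cumulative: each one is the running total of the field widths up to that cut. A small base-R sketch of that arithmetic, assuming the widths read off the sample output (field_widths and cut_points are illustrative names):

```r
# Widths of the first seven fields: the leading "1001" plus the six
# fixed-width fields that follow it. The last field takes whatever remains.
field_widths <- c(4, 4, 8, 4, 6, 4, 2)

# Running totals give the character positions after which to cut.
cut_points <- cumsum(field_widths)
print(cut_points)  # 4 8 16 20 26 30 32

# Going the other way, successive differences recover the widths.
recovered <- diff(c(0, cut_points))
print(recovered)   # 4 4 8 4 6 4 2
```

Seven cut points produce eight pieces, which is why into needs eight names (col0 to col7) before the first piece is dropped.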

Another option, using only base R with no packages, is to insert a delimiter at the required positions and then read the result with read.csv:

read.csv(text = sub("^.{4}(.{4})(.{8})(.{4})(.{6})(.{4})(.{2})(.*)",
                    "\\1,\\2,\\3,\\4,\\5,\\6,\\7", str),
         header = FALSE, stringsAsFactors = FALSE)
#   V1       V2   V3     V4   V5 V6                                      V7
#1 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#2 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#3 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#4 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#5 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#6 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
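If the widths change, the pattern and replacement need not be typed by hand. The following is a sketch, not part of the answer, that builds both from a widths vector; skip, widths, pattern and replacement are illustrative names, and it assumes no field contains a comma.

```r
# One input row, copied from the question.
x <- "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077"

skip <- 4                       # leading characters to discard
widths <- c(4, 8, 4, 6, 4, 2)   # fixed-width fields to capture

# One capture group per fixed-width field, then a final group for the rest.
pattern <- paste0("^.{", skip, "}",
                  paste0("(.{", widths, "})", collapse = ""),
                  "(.*)")

# Back-references \1 to \7 joined by commas.
replacement <- paste0("\\", seq_len(length(widths) + 1), collapse = ",")

# Insert the delimiters, then let read.csv split on them.
delimited <- sub(pattern, replacement, x)
print(delimited)
result <- read.csv(text = delimited, header = FALSE, stringsAsFactors = FALSE)
print(result)
```

As with the hand-written version, read.csv guesses column types, so all-digit fields such as 111000 and 99 come back as integers.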