Extract numbers from a string and replicate their associated values

Posted: 2015-11-14 15:10:29

Tags: r

I currently have this code:

args <- commandArgs(TRUE)
args[1] <- "H2SO4"

components <- gsub('([[:upper:]])', ' \\1', args[1])
components <- c(unlist(strsplit(components, " ")))[-1]

For the input H2SO4, this produces the vector:

[1] "H2" "S"  "O4"

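Now, how can I separate the letters from the numbers and repeat each letter as many times as the number that follows it? The output should look like this: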

[1] "H" "H" "S" "O" "O" "O" "O"
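The splitting step in the question's code can also be written as a single regmatches() call. This is only a sketch. split_formula is a hypothetical helper name, and the sketch assumes that each element symbol is an upper-case letter optionally followed by lower-case letters and digits. Under that assumption, two-letter symbols such as Fe stay together.

```r
# Hypothetical helper: split a chemical formula into element tokens.
# Assumes simple formulas (no parentheses) where each token is an upper-case
# letter, optional lower-case letters, and an optional count.
split_formula <- function(formula) {
  regmatches(formula,
             gregexpr("[[:upper:]][[:lower:]]*[[:digit:]]*", formula))[[1]]
}

split_formula("H2SO4")  # "H2" "S" "O4"
split_formula("Fe2O3")  # "Fe2" "O3"
```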

4 answers:

Answer 0 (score: 4)

We can use gsubfn to replicate each character according to the numeric part of the string, and then use str_extract_all to extract the characters.

Data:

str1 <- 'H2SO4'
str2 <- 'C4H10O'
str3 <- 'Fe3'
str4 <- 'Fe2O3'

library(gsubfn)
library(stringr)
str_extract_all(gsubfn('(\\D)(\\d+)', ~rep(x,y), str1),'[A-Z]')[[1]]
#[1] "H" "H" "S" "O" "O" "O" "O"

str_extract_all(gsubfn('(\\D)(\\d+)', ~rep(x,y), str2),'[A-Z]')[[1]]
#[1] "C" "C" "C" "C" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "O"

# multi-letter symbols such as Fe: match an upper-case letter plus optional lower-case letters
str_extract_all(gsubfn('([A-Z][a-z]*)(\\d+)', ~rep(x,y), 
            str3), '[A-Z][a-z]*')[[1]]
#[1] "Fe" "Fe" "Fe"

str_extract_all(gsubfn('([A-Z][a-z]*)(\\d+)', ~rep(x,y), 
            str4), '[A-Z][a-z]*')[[1]]
#[1] "Fe" "Fe" "O"  "O"  "O" 
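You may prefer not to depend on gsubfn and stringr. In that case, the same expansion can be sketched in base R. expand_formula is a hypothetical name. The sketch assumes simple formulas without parentheses and treats a missing count as 1:

```r
# Hypothetical helper: expand a formula into one symbol per atom, base R only.
expand_formula <- function(formula) {
  # tokens such as "C4", "H10", "O": symbol followed by optional digits
  tokens <- regmatches(formula,
                       gregexpr("[[:upper:]][[:lower:]]*[[:digit:]]*", formula))[[1]]
  symbols <- sub("[[:digit:]]+$", "", tokens)             # "H10" -> "H"
  counts <- as.integer(sub("^[[:alpha:]]+", "", tokens))  # "H10" -> 10, "O" -> NA
  counts[is.na(counts)] <- 1L                             # no digits means one atom
  rep(symbols, times = counts)
}

expand_formula("H2SO4")  # "H" "H" "S" "O" "O" "O" "O"
expand_formula("Fe2O3")  # "Fe" "Fe" "O" "O" "O"
```

The approach splits into tokens first and then calls rep(). This avoids building an intermediate string, and multi-letter symbols need no second pattern.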

Answer 1 (score: 4)

This is exactly what the function inverse.rle does. You only need to give it input in the right format:

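# counts: strip the upper-case letter; "S" leaves "", which becomes NA and then 1
repetitions = as.numeric(gsub('[[:upper:]]', '', components))
repetitions[is.na(repetitions)] = 1

# values: keep only the upper-case letter of each component
rle = list(lengths = repetitions, values = gsub('[^[:upper:]]', '', components))
inverse.rle(rle)
#[1] "H" "H" "S" "O" "O" "O" "O"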
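The patterns above break on two-letter symbols. For "Fe2", stripping upper-case letters leaves "e2". That string becomes NA, so the count falls back to 1, and the value becomes "F". Below is a minimal variant that strips digits or non-digits instead. It is sketched on the assumption that counts are plain digits, and it keeps the question's split and the rle-style list:

```r
# Variant of the inverse.rle() approach: strip digits / non-digits instead of
# upper-case letters, so two-letter symbols such as "Fe" stay whole.
formula <- "Fe2O3"
components <- unlist(strsplit(gsub("([[:upper:]])", " \\1", formula), " "))[-1]

repetitions <- as.numeric(gsub("[^[:digit:]]", "", components))  # "Fe2" -> 2
repetitions[is.na(repetitions)] <- 1                             # no digits -> 1

runs <- list(lengths = repetitions,
             values = gsub("[[:digit:]]", "", components))       # "Fe2" -> "Fe"
inverse.rle(runs)  # "Fe" "Fe" "O" "O" "O"
```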

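Of course, you could also reinvent what inverse.rle does. That isn't hard, as another answer shows. However, it is usually a good idea to compose existing tools rather than rebuild them. (To be clear: I don't recommend my answer over akrun's, which is more concise, more direct and more efficient. Still, it is good to be aware of readily available tools.)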
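There is very little to reinvent. In base R, inverse.rle() checks that its list has matching lengths and values, and then calls rep.int(values, lengths). Calling rep() on the same list therefore gives an identical result. In the sketch below, runs is just an illustrative name:

```r
# An rle-style list for H2SO4: lengths are the counts, values the symbols.
runs <- list(lengths = c(2, 1, 4), values = c("H", "S", "O"))

via_rle <- inverse.rle(runs)
via_rep <- rep(runs$values, times = runs$lengths)
identical(via_rle, via_rep)  # TRUE
```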

Answer 2 (score: 1)

Another attempt:

args <- "H2SO10"

components <- gsub('([[:upper:]])', ' \\1', args)
components <- c(unlist(strsplit(components, " ")))[-1]

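# split e.g. "O10" into "O" and "10" at a zero-width lookahead, then repeat the
# letter; components without a digit are returned unchanged
f <- function(x)
  if (length(y <- strsplit(x, '(?=\\D\\d+)', perl = TRUE)[[1]]) > 1)
    rep(y[1], as.numeric(y[2])) else x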

f(components[1])
# [1] "H" "H"

unlist(Vectorize(f, USE.NAMES = FALSE)(components))
# [1] "H" "H" "S" "O" "O" "O" "O" "O" "O" "O" "O" "O" "O"
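A variant that does not depend on how strsplit() handles zero-width lookahead matches captures the symbol and the optional count in one regexec() call. expand_token is a hypothetical helper. The sketch assumes every component is a symbol (an upper-case letter plus optional lower-case letters) followed by optional digits:

```r
# Hypothetical helper: expand one component such as "O10" or "S".
expand_token <- function(token) {
  parts <- regmatches(token,
                      regexec("^([[:upper:]][[:lower:]]*)([[:digit:]]*)$", token))[[1]]
  # parts[1] is the whole match, parts[2] the symbol, parts[3] the digits ("" if none)
  count <- if (nzchar(parts[3])) as.integer(parts[3]) else 1L
  rep(parts[2], count)
}

components <- c("H2", "S", "O10")
# lapply() returns a list of variable-length vectors; unlist() flattens it
unlist(lapply(components, expand_token))  # "H" "H" "S" followed by ten "O"
```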

Answer 3 (score: 0)

Here is a dplyr approach:

library(stringi)
library(tidyr)
library(plyr)
library(dplyr)

chemicals = tibble(chemical = c("H2SO4", "C4H10O"))

elements = 
  chemicals %>%
  mutate(element_number = 
           chemical %>%
           stri_replace_all_regex("([A-Z])", 
                                  " $1") %>%
           stri_split_fixed(" ") ) %>%
  unnest(element_number) %>%
  filter(element_number != "") %>%
  mutate(element = 
           element_number %>%
           stri_replace_all_regex("[0-9]", ""),
         number = 
           element_number %>%
           stri_replace_all_regex("[^0-9]", "") %>%
           as.numeric %>%
           mapvalues(NA, 1)) %>%
  select(-element_number)

long_elements = 
  elements %>%
  rowwise %>%
  mutate(result = 
           element %>%
           rep(number) %>%
           list) %>%
  unnest(result)
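The rowwise() / rep() / unnest() step can also be done in base R by repeating row indices. The small elements table below is typed in by hand to mimic the shape of the intermediate result above (columns chemical, element, number). It is an illustration, not output of the pipeline:

```r
# Hand-written stand-in for the intermediate `elements` table (H2SO4 only).
elements <- data.frame(
  chemical = c("H2SO4", "H2SO4", "H2SO4"),
  element = c("H", "S", "O"),
  number = c(2, 1, 4),
  stringsAsFactors = FALSE
)

# Repeat each row `number` times by indexing rows with rep(), keeping two columns.
long_elements <- elements[rep(seq_len(nrow(elements)), times = elements$number),
                          c("chemical", "element")]
rownames(long_elements) <- NULL  # drop row names such as "1.1" created by the repeats
long_elements$element            # "H" "H" "S" "O" "O" "O" "O"
```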