Question

I have a string like the one below:

x <- "Supplier will initially respond to High Priority incidents.  Supplier will subsequently update EY every 60 minutes or at an interval EY specifies. Reporting and Response times will be capture in ServiceNow which, save in respect of manifest error, will be conclusive proof of the time period taken."

I want to extract the 2 words that come after "every".

How can I achieve this in R?

Answer 1

We can use str_extract_all with a regex lookbehind ((?<=every\\s)) followed by a pattern that matches two words:

library(stringr) #corrected the package here
unlist(str_extract_all(x, "(?<=every\\s)(\\w+\\s+\\w+)"))
#[1] "60 minutes"

Or using base R:

regmatches(x, gregexpr("(?<=every\\s)(\\w+\\s+\\w+)", x, perl = TRUE))[[1]]
#[1] "60 minutes"
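
If the keyword or the number of words needs to vary, the same lookbehind idea can be wrapped in a small function. This is only a sketch: words_after is a hypothetical helper name, and it assumes the keyword is a plain word with no regex metacharacters.

```r
# Hypothetical helper (not from stringr or base R): returns the n words
# that follow each occurrence of `keyword` in `text`.
# Assumes `keyword` is a plain word without regex metacharacters.
words_after <- function(text, keyword, n = 2) {
  # For keyword = "every", n = 2 this builds:
  #   (?<=\bevery\s)\w+(?:\s+\w+){1}
  pattern <- sprintf("(?<=\\b%s\\s)\\w+(?:\\s+\\w+){%d}", keyword, n - 1)
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

words_after("update every 60 minutes and check every other day", "every")
#[1] "60 minutes" "other day"
```

When the keyword does not occur, the function returns character(0).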

Answer 2

In base R, we can split the string into words, find the index of the word every, and then select the two words that follow that index.

wordsplit <- unlist(strsplit(x, " ", fixed = TRUE))
indx <- grep("\\bevery\\b", wordsplit)
wordsplit[(indx+1):(indx +2)]
#[1] "60"      "minutes"

Or, as @DavidArenburg suggested, we can also use match instead of grep:

 wordsplit[match("every", wordsplit) + 1:2]
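
Both split-based versions above assume that every occurs exactly once: match returns only the first position, and the (indx+1):(indx+2) range is not meaningful when indx has more than one element. A minimal sketch for the multi-occurrence case follows; next_words is a hypothetical helper name, and splitting on a single space is kept from the answer above.

```r
# Hypothetical helper: for each occurrence of `keyword`, return the next n words.
next_words <- function(text, keyword, n = 2) {
  wordsplit <- unlist(strsplit(text, " ", fixed = TRUE))
  indx <- which(wordsplit == keyword)          # all positions of the keyword
  lapply(indx, function(i) wordsplit[i + seq_len(n)])
}

next_words("a every b c d every e f", "every")
#[[1]]
#[1] "b" "c"
#
#[[2]]
#[1] "e" "f"
```

If the keyword appears within the last n words, the missing positions come back as NA.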

How to extract a substring in R