Question

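Say I have a string:

"database service crashed due to monkeys in the circuit board and this is a serious problem."

How can I extract, for example, the phrase that follows 'due to',

so that I get this:

monkeys in the circuit board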

Answer 1

How about this quick-and-dirty approach?

v <- "database service crashed due to monkeys in the circuit board and this is a serious problem."
unlist(strsplit(unlist(strsplit(v, "due to"))[2], " "))[2:6]
[1] "monkeys" "in"      "the"     "circuit" "board"
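The same idea can be wrapped in a small function so the marker phrase and the word count are not hard-coded. This is only a sketch: `words_after` is a hypothetical name, not something from the answer above, and it assumes the words of interest are separated by whitespace and that only the text following the first occurrence of the marker matters.

```r
# Hypothetical helper (not part of the original answer): return the n
# whitespace-separated words that follow the first occurrence of `marker`.
words_after <- function(x, marker, n = 5) {
  # fixed = TRUE treats `marker` as a literal string, not a regex
  parts <- strsplit(x, marker, fixed = TRUE)[[1]]
  if (length(parts) < 2) return(character(0))  # marker not found
  # trim the leading space, then split the remainder on whitespace
  words <- strsplit(trimws(parts[2]), "\\s+")[[1]]
  head(words, n)
}

v <- "database service crashed due to monkeys in the circuit board and this is a serious problem."
words_after(v, "due to")
# [1] "monkeys" "in"      "the"     "circuit" "board"
```

Trimming before splitting avoids the empty first element that made the `[2:6]` index necessary above.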

Answer 2

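It is not clear whether you want a single string as output or one string per word. Assuming you want a single string and x is the input string, this sub call will do it: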

s <- sub(".*due to ((\\w+ ){4}\\w+).*", "\\1", x)

giving:

> s
[1] "monkeys in the circuit board"

Here is the regular expression on its own (the original answer linked a Debuggex visualization of it):

.*due to ((\w+ ){4}\w+).*

If you want the separate words, then

strsplit(s, " ")[[1]]

giving:

[1] "monkeys" "in"      "the"     "circuit" "board"
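One thing to watch with the sub approach: when the pattern does not match, sub returns the input string unchanged rather than an empty result. A possible variant, sketched below, uses regexpr with regmatches so that a missing marker yields character(0). The function name `extract_after` is made up for illustration, and the sketch assumes the marker contains no regex metacharacters and is followed by a single space.

```r
# Hypothetical helper: extract the n words after `marker` as one string,
# returning character(0) when there is no match.
extract_after <- function(x, marker = "due to", n = 5) {
  # (?<=...) is a PCRE lookbehind, so the marker itself is not returned;
  # n words means (n - 1) "word + space" groups followed by a final word
  pattern <- sprintf("(?<=%s )(\\w+ ){%d}\\w+", marker, as.integer(n - 1))
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

x <- "database service crashed due to monkeys in the circuit board and this is a serious problem."
extract_after(x)
# [1] "monkeys in the circuit board"

extract_after("nothing relevant here")
# character(0)
```

Whether an empty result or the unchanged input is preferable depends on what the calling code expects.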

Answer 3

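Here is another approach. It improves on RStudent's answer by extracting the five significant words that follow "due to", but it produces odd, stem-like results because every match is cut to exactly five characters. I suspect that can be fixed as well. Of course, the two lines could be combined.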

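library(stringr)  # provides str_split() and str_extract_all()
text <- "database service crashed due to monkeys in the circuit board and this is a serious problem."
text.short <- unlist(str_split(text, "due to"))  # element 2 is the text after "due to"
five <- str_extract_all(text.short[2], "(\\w){5}")  # runs of exactly five word characters
five[[1]]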

[1] "monke" "circu" "board" "serio" "probl"
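The truncation comes from the `{5}` quantifier, which matches exactly five word characters and then stops. One possible fix, offered as a sketch rather than as the answerer's intended solution, is to ask for five or more characters with `{5,}`. The version below uses base R only (regmatches and gregexpr) and assumes that "significant" simply means "at least five characters long"; with stringr, `str_extract_all(text.short[2], "\\w{5,}")` should behave the same way.

```r
text <- "database service crashed due to monkeys in the circuit board and this is a serious problem."

# Drop everything up to and including the first "due to"
# (the lazy .*? stops at the first occurrence).
after <- sub(".*?due to", "", text, perl = TRUE)

# \\w{5,} matches whole runs of five or more word characters,
# so words are no longer cut off after the fifth letter.
long_words <- regmatches(after, gregexpr("\\w{5,}", after))[[1]]
long_words
# [1] "monkeys" "circuit" "board"   "serious" "problem"
```

Note that this selects words by length, so short words such as "in" and "the" are skipped and the result is not limited to the five words immediately after "due to".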

Extracting the text that follows a specific word in R
