Question

My input vector is as follows:

input <- c("fdsfs iwantthis (1,1,1,1) fdsaaa   iwantthisaswell (2,3,4,5)", "fdsfs thistoo (1,1,1,1)")

I want to use a regular expression to extract the following:

> output
[1] "iwantthis iwantthisaswell" "thistoo"

I have managed to extract each word that comes before an opening parenthesis. I tried this to get just the first word:

> gsub(".*?[[:space:]](.*?)[[:space:]]\\(.*", "\\1", input)
[1] "iwantthis" "thistoo"

But I cannot get it to work for multiple occurrences:

> gsub(".*?[[:space:]](.*?)[[:space:]]\\(.*?[[:space:]](.*?)[[:space:]]\\(.*", "\\1 \\2", input)
[1] "iwantthis iwantthisaswell" "fdsfs thistoo (1,1,1,1)"

The closest I have managed is the following:

library(stringr)
> str_extract_all(input, "(\\S*)\\s\\(")
[[1]]
[1] "iwantthis ("       "iwantthisaswell ("

[[2]]
[1] "thistoo ("

I am sure I am missing something in my regex (I am not that good at them), but what?

Answer 1

You can use

> sapply(str_extract_all(input, "\\S+(?=\\s*\\()"), paste, collapse=" ")
[1] "iwantthis iwantthisaswell" "thistoo"

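See the regex demo. \\S+(?=\\s*\\() extracts every run of 1+ non-whitespace characters that is followed by 0+ whitespace characters and then a ( character. sapply with paste then joins the matches found in each element with a space (collapse=" ").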

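Pattern details

\S+ - 1 or more non-whitespace characters
(?=\s*\() - a positive lookahead ((?=...)) that requires 0+ whitespace characters (\s*) followed by a ( character (\() immediately to the right of the current position.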
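To make the effect of the lookahead concrete, here is a small sketch that compares a consuming pattern with the lookahead version. It uses base R (regmatches and gregexpr with perl = TRUE) instead of stringr so that it runs without extra packages; the variable names consuming and lookahead are only illustrative.

```r
input <- c("fdsfs iwantthis (1,1,1,1) fdsaaa   iwantthisaswell (2,3,4,5)",
           "fdsfs thistoo (1,1,1,1)")

# Consuming pattern: the whitespace and "(" become part of each match.
consuming <- regmatches(input, gregexpr("\\S+\\s*\\(", input, perl = TRUE))

# Lookahead pattern: the whitespace and "(" are only checked, not consumed.
lookahead <- regmatches(input, gregexpr("\\S+(?=\\s*\\()", input, perl = TRUE))

print(consuming[[1]])  # "iwantthis ("  "iwantthisaswell ("
print(lookahead[[1]])  # "iwantthis"    "iwantthisaswell"
```

The only difference between the two patterns is whether the trailing part sits inside (?=...), which is why the question's str_extract_all attempt returned the " (" along with each word.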

Answer 2

Here is an option using base R:

unlist(regmatches(input, gregexpr("\\w+(?= \\()", input, perl = TRUE)))
#[1] "iwantthis"       "iwantthisaswell" "thistoo"
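Note that unlist flattens the matches into a single vector, whereas the question asks for one string per input element. A minimal sketch of one way to get there, assuming the same pattern and replacing unlist with vapply and paste (the names matches and output are illustrative):

```r
input <- c("fdsfs iwantthis (1,1,1,1) fdsaaa   iwantthisaswell (2,3,4,5)",
           "fdsfs thistoo (1,1,1,1)")

# regmatches returns a list with one character vector per input element.
matches <- regmatches(input, gregexpr("\\w+(?= \\()", input, perl = TRUE))

# Join the matches of each element with a single space.
output <- vapply(matches, paste, character(1), collapse = " ")
print(output)
#[1] "iwantthis iwantthisaswell" "thistoo"
```

Because this pattern uses \w+ and a single literal space, it only matches words made of letters, digits, and underscores that are separated from the ( by exactly one space.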

Answer 3

This works in R:

gsub('\\w.+? ([^\\s]+) \\(.+?\\)','\\1', input, perl=TRUE)

Result:

[1] "iwantthis iwantthisaswell" "thistoo"

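Updated to work for the general case. For example, "i_wantthisaswell2" is now found by searching for non-whitespace characters between the other parts of the match.

Using general-case input drawn from the other suggestions: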

general_cases <- c("fdsfs iwantthis (1,1,1,1) fdsaaa   iwantthisaswell (2,3,4,5)", 
                   "fdsfs thistoo (1,1,1,1) ",
                   "GaGa iwant_this (1,1,1,1)", 
                   "lal2!@#$%^&*()_+a i_wantthisaswell2 (2,3,4,5)")
gsub('\\w.+? ([^\\s]+) \\(.+?\\)','\\1', general_cases, perl=TRUE)

Result:

[1] "iwantthis iwantthisaswell" "thistoo "                 
[3] "iwant_this"                "i_wantthisaswell2"
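Since gsub only replaces the matched parts, any text outside a match is left in place, which is where the trailing space in "thistoo " comes from. As a sketch, assuming that stray leading or trailing whitespace is unwanted, the result can be wrapped in trimws from base R (the name cleaned is illustrative):

```r
general_cases <- c("fdsfs iwantthis (1,1,1,1) fdsaaa   iwantthisaswell (2,3,4,5)",
                   "fdsfs thistoo (1,1,1,1) ",
                   "GaGa iwant_this (1,1,1,1)",
                   "lal2!@#$%^&*()_+a i_wantthisaswell2 (2,3,4,5)")

# Same substitution as above, then strip whitespace left over outside the matches.
cleaned <- trimws(gsub('\\w.+? ([^\\s]+) \\(.+?\\)', '\\1', general_cases, perl = TRUE))
print(cleaned)
```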
