Question

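I have retrieved an .XML file from PubMed. For each line of the file, I want to search for two different strings/words. When one of them is found, I want to retrieve the matched string together with the 'n' characters before it and the 'n' characters after the end of the match.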

For example, suppose the string to search for is "string" in the following line, and I want 10 characters before and after the matched string.

"The rest of the string is actually really useful"

I should get:

"st of the string is actual"

Answer 1

You can just "pad" your regular expression to tell it to grab up to 10 characters before and after the match:

x <- "The rest of the string is actually really useful"
stringr::str_extract(x, ".{0,10}string.{0,10}")
# [1] "st of the string is actual"

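. stands for any character, and {0,10} means match up to 10 characters (so if you were searching for "rest", which does not have a full 10 characters to its left, it would still match).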
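To make the {0,10} behaviour concrete, here is a small sketch using the same sentence. The helper context_pattern is a hypothetical name introduced only for this illustration (it is not part of stringr), and it assumes the search word contains no regex metacharacters.

```r
x <- "The rest of the string is actually really useful"

# "rest" has only 4 characters to its left, so {0,10} takes what is available.
stringr::str_extract(x, ".{0,10}rest.{0,10}")
# [1] "The rest of the st"

# An exact count, {10}, finds nothing: 10 characters cannot precede "rest".
stringr::str_extract(x, ".{10}rest.{10}")
# [1] NA

# Hypothetical helper: build the padded pattern for any plain word and any n.
# Assumption: `word` has no regex metacharacters that would need escaping.
context_pattern <- function(word, n = 10) {
  paste0(".{0,", n, "}", word, ".{0,", n, "}")
}

stringr::str_extract(x, context_pattern("string", 5))
# [1] " the string is a"
```

Building the pattern with paste0 keeps 'n' adjustable without rewriting the regular expression by hand.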

Answer 2

You can use regmatches from base R:

x <- "The rest of the string is actually really useful"
regmatches(x, regexpr(".{1,10}string.{1,10}", x))
# [1] "st of the string is actual"
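The question asks about two different words on every line of a file, so the same idea can be extended roughly as follows. This is only a sketch under stated assumptions: the input lines and the second word "useful" are made up for illustration, and perl = TRUE is a choice made here, not something the answer requires.

```r
# Hypothetical input; for a real file one might use
# txt <- readLines("pubmed_result.xml"), where the file name is made up.
txt <- c(
  "The rest of the string is actually really useful",
  "No keyword on this line",
  "A useful tip"
)

# Two search words joined with |, each padded with up to 10 characters.
pattern <- ".{0,10}(string|useful).{0,10}"

# gregexpr finds every match per line; regmatches returns one vector per line.
hits <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))
hits[[1]]
# [1] "st of the string is actual" "ly really useful"
hits[[2]]
# character(0)
hits[[3]]
# [1] "A useful tip"

# With {1,10}, a word at the very start of a line is missed,
# because at least one character must come before it.
regmatches("string theory", regexpr(".{1,10}string.{1,10}", "string theory"))
# character(0)
```

Matches found this way do not overlap, so a second keyword that falls inside the trailing context of the first is swallowed by that match and not reported separately.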