Question

I'm trying to find the best way to extract multiple URLs from a (very long) string.

Here is the sample text:

miserie <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"

Edit: updated example:

miserie <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

This is the result I want:

[1] "/Home/123/home-name/Specs"
[2] "/Home/456/home-name/Specs"

Essentially, I need a robust solution that extracts every path starting with "/Home" and ending with "/Specs".

I tried the following pattern:

pat <- ".*(/Home/.*/Specs).*"

along with the following functions:

library(stringr)  # provides str_match_all
str_match_all(miserie, pat)
gsub(x = miserie, pattern = pat, replacement = "\\1")

The first one returns this result:

[[1]]
     [,1]                                                                                                                     
[1,] "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"
     [,2]                       
[1,] "/Home/456/home-name/Specs"

The second one returns only the last URL:

[1] "/Home/456/home-name/Specs"

Any suggestions?

Answer 1

We can try using gregexpr and regmatches with the following regex pattern:

(?<!\\S)/Home(/[^/\\s]+)*/Specs

Sample script:

miserie <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"
regmatches(miserie, gregexpr("(?<!\\S)/Home(/[^/\\s]+)*/Specs", miserie, perl=TRUE))

[[1]]
[1] "/Home/123/home-name/Specs" "/Home/456/home-name/Specs"

Here is an explanation of the regex pattern:

(?<!\\S)       assert that what precedes is either whitespace or
               the start of the string
/Home          match /Home
(/[^/\\s]+)*   match zero or more intermediate path components
/Specs         match the closing /Specs
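
As a rough check, the same pattern can be run against the updated example from the question. This is only a sketch: the helper name extract_spec_paths is made up for illustration and is not part of the answer, and it assumes the input is a single string.

```r
# Hypothetical helper (not from the answer): wraps the pattern for reuse.
# Assumes x is a single character string.
extract_spec_paths <- function(x) {
  pat <- "(?<!\\S)/Home(/[^/\\s]+)*/Specs"
  # gregexpr locates every match; regmatches returns the matched substrings.
  regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]
}

# Updated example string from the question.
updated <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

paths <- extract_spec_paths(updated)
print(paths)
# [1] "/Home/123/home-name/Specs" "/Home/456/home-name/Specs"
```

The leading "/Home/homes?query=123" is skipped because each path component must be free of whitespace, so the pattern cannot bridge the space in front of the next path to reach "/Specs".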

Answer 2

You can use:

stringr::str_match_all(miserie,".*?(/Home/.*?/Specs).*?")[[1]][,2]
#[1] "/Home/123/home-name/Specs" "/Home/456/home-name/Specs"

Adding ? after a quantifier makes it lazy, so it matches as few characters as possible.
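
To make the greedy versus lazy difference concrete, here is a small base R sketch. The helper all_matches and the \\S*? variant are illustrative assumptions, not part of the answer. It also suggests that on the updated example from the question the lazy .*? can still run across whitespace, which restricting the middle part to non-whitespace characters appears to avoid.

```r
# Hypothetical helper: return every match of pat in a single string x.
all_matches <- function(pat, x) regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]

original <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"
updated <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

# Greedy: .* runs from the first /Home/ to the last /Specs (one long match).
greedy <- all_matches("/Home/.*/Specs", original)

# Lazy: .*? stops at the nearest /Specs, giving the two expected paths.
lazy <- all_matches("/Home/.*?/Specs", original)

# On the updated example, the lazy match starting at "/Home/homes?query=123"
# crosses spaces until it reaches the first /Specs.
lazy_updated <- all_matches("/Home/.*?/Specs", updated)

# Assumed variant: allow only non-whitespace between /Home/ and /Specs.
strict_updated <- all_matches("/Home/\\S*?/Specs", updated)

print(length(greedy))
print(lazy)
print(lazy_updated[1])
print(strict_updated)
```

Unlike the lookbehind pattern in Answer 1, this variant does not require whitespace before /Home, so it would also match a /Home/.../Specs segment embedded in a longer URL.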
