Question

I'm using a URL regex (and use it often). It finds all sorts of URL formats and protocols. That said, if everything were dandy in Dandyland, I wouldn't be posting here.

I've hit a hiccup with the regex below, which is the one I'm currently using.

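When I search a string for URLs, anything like example...see is treated as a URL. There can be any number of periods, but only the 2 or 3 letters after the last period are pulled out (for example, this...shouldn't yields this...sho).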

Any ideas on how to fix this?

Example:

$string = "Here's a url, hello.com. But this...shouldn't show.";

$url_regex = "/((https?|ftp)\:\/\/)?([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?([a-z0-9-.]*)\.([a-z]{2,3})(\:[0-9]{2,5})?(\/([a-z0-9+\$_\-~@\(\)\%]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@&#%=+\/\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?/i";

preg_match_all($url_regex, $string, $urls);

return $urls;
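To see where this...sho comes from, the host portion of the pattern can be run on its own. This is a minimal sketch: $host_part is simply that fragment copied out of $url_regex above, and the test sentence is taken from the example string.

```php
<?php
// Host portion of the question's $url_regex, copied unchanged.
// [a-z0-9-.]* also accepts ".", so it can consume "this.." before the
// required \. and the 2-3 letter group.
$host_part = '/([a-z0-9-.]*)\.([a-z]{2,3})/i';

preg_match($host_part, "But this...shouldn't show.", $m);

// $m[0] is the whole match, $m[1] the run before the last period,
// $m[2] the letters after it.
print_r($m);
```

Here $m[0] is this...sho, $m[1] is this.. and $m[2] is sho: the greedy character class backs off just far enough to leave one period for \. to match.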

Answer 1

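The problem is that you included the period in the allowed characters ([a-z0-9-.]*), which means several consecutive periods can match. When you are searching inside running text, \b (a word boundary) also matters, so that matches start and end at the edges of words.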

\b((https?|ftp)\:\/\/)?([a-z0-9+!*(),;?&=\$_-]+(\:[a-z0-9+!*(),;?&=\$_-]+)?@)?([a-z0-9-]*)\.([a-z]{2,})(\:[0-9]{2,5})?(\/[a-z0-9+\$_\-~@\(\)\%]*)*\/?(\?[a-z+&\$_-][a-z0-9;:@&#%=+\/\$_-]*)?(#[a-z_-][a-z0-9+\$_-]*)?\b

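Edit: updated the answer so that it ignores matches such as example.c (the part after the last period must now be at least two letters).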
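The point about consecutive periods can be taken a step further by requiring every dot-separated label to be non-empty, so "..." can never match while hosts with subdomains still match whole. The following is a minimal sketch under that assumption: $host_regex and the helper find_hosts() are hypothetical names, and the pattern deliberately leaves out the userinfo, port, path, query and fragment parts of the full regex.

```php
<?php
// Hypothetical simplified pattern: optional scheme, then one or more
// non-empty labels of [a-z0-9-] each followed by a single ".", then a
// TLD of at least two letters. No label can be empty, so "..." fails.
// Userinfo, port, path, query and fragment are intentionally omitted.
$host_regex = '/\b(?:(?:https?|ftp):\/\/)?(?:[a-z0-9-]+\.)+[a-z]{2,}\b/i';

// Hypothetical helper: return only the full matches.
function find_hosts(string $regex, string $text): array
{
    preg_match_all($regex, $text, $m);
    return $m[0];
}

print_r(find_hosts($host_regex, "Here's a url, hello.com. But this...shouldn't show."));
print_r(find_hosts($host_regex, "Visit www.example.com or ftp://files.example.org today."));
print_r(find_hosts($host_regex, "See example.c now."));
```

With this pattern the first call finds only hello.com, the second finds www.example.com and ftp://files.example.org, and the third finds nothing, because the final label needs at least two letters.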

Answer 2

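The following code fixes the problem: the host part is now ([a-z0-9-]+?), which no longer accepts periods. I tested it on my end.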

$string = "Here's a url, hello.com. But this...shouldn't show.";

$url_regex = "/((https?|ftp)\:\/\/)?([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?([a-z0-9-]+?)\.([a-z]{2,3})(\:[0-9]{2,5})?(\/([a-z0-9+\$_\-~@\(\)\%]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@&#%=+\/\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?/i";

preg_match_all($url_regex, $string, $urls);
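It is worth checking this pattern on more than the original example, for instance on a host with a subdomain. The sketch below reuses Answer 2's regex unchanged; the second test string is an assumption added for illustration. Because the host is a single lazy label ([a-z0-9-]+?) followed by exactly one period and a 2 to 3 letter group, www.example.com is split rather than matched whole.

```php
<?php
// Answer 2's pattern, copied unchanged.
$url_regex = "/((https?|ftp)\:\/\/)?([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?([a-z0-9-]+?)\.([a-z]{2,3})(\:[0-9]{2,5})?(\/([a-z0-9+\$_\-~@\(\)\%]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@&#%=+\/\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?/i";

// The question's string: the consecutive periods are no longer matched.
preg_match_all($url_regex, "Here's a url, hello.com. But this...shouldn't show.", $m1);
print_r($m1[0]);

// Illustrative string with a subdomain: one label plus a 2-3 letter
// group cannot cover "www.example.com", so it is split in two.
preg_match_all($url_regex, "Visit www.example.com today.", $m2);
print_r($m2[0]);
```

$m1[0] holds only hello.com, while $m2[0] holds www.exa and mple.com, so this version trades the consecutive-period bug for broken multi-label hosts.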

Answer 3

This version matches URLs in a string that begin with an explicit scheme (http, https, ftp or file):

$string = "this is my website http://example.com and this is my friend website https://pqr.com etc, this...shouldn't show";

$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';

preg_match_all($regex, $string, $matches);

print_r($matches[0]);
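This pattern takes a different approach: it only matches text that starts with a scheme followed by ://. The sketch below runs it against both this answer's string and the question's original string to show that trade-off; the variable names $with_scheme, $without_scheme, $a and $b are illustrative.

```php
<?php
// Answer 3's pattern, copied unchanged: the scheme and "://" are mandatory.
$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';

$with_scheme = "this is my website http://example.com and this is my friend website https://pqr.com etc, this...shouldn't show";
$without_scheme = "Here's a url, hello.com. But this...shouldn't show.";

preg_match_all($regex, $with_scheme, $a);
preg_match_all($regex, $without_scheme, $b);

print_r($a[0]); // URLs that carry a scheme
print_r($b[0]); // bare hosts such as hello.com are not found
```

$a[0] contains http://example.com and https://pqr.com, while $b[0] is empty, because hello.com in the question's string has no scheme.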

URL regex issue, PHP
