Question

I want to create a regex for URLs so that I can extract all links from an input string. The regex should recognize URLs in the following formats:

http(s)://www.webpage.com
http(s)://webpage.com
www.webpage.com

and also more complicated URLs such as: http://www.google.pl/#sclient=psy&hl=pl&site=&source=hp&q=regex+url&pbx=1&oq=regex+url&aq=f&aqi=g1&aql=&gs_sm=e&gs_upl=1582l3020l0l3199l9l6l0l0l0l0l255l1104l0.2.3l5l0&bav=on.2,or.r_gc.r_pw.&fp=30a1604d4180f481&biw=1680&bih=935

I have the following one:

((www\.|https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)

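But it does not recognize the following pattern: www.webpage.com. Can someone help me create a suitable regex?

Edit: It should find a proper link and, in addition, place a hyperlink at the proper index, like this: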

private readonly Regex RE_URL = new Regex(@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", RegexOptions.Multiline);

foreach (Match match in (RE_URL.Matches(new_text)))
{
    // Copy raw string from the last position up to the match
    if (match.Index != last_pos)
    {
        var raw_text = new_text.Substring(last_pos, match.Index - last_pos);
        text_block.Inlines.Add(new Run(raw_text));
    }

    // Create a hyperlink for the match
    var link = new Hyperlink(new Run(match.Value))
    {
        NavigateUri = new Uri(match.Value)
    };
    link.Click += OnUrlClick;

    text_block.Inlines.Add(link);

    // Update the last matched position
    last_pos = match.Index + match.Length;
}

Answer 1

I don't know why your match result contains only http://, but I cleaned up your regex:

((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:#@%/;$()~_?\+,\-=\\.&]+)

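(?:) is a non-capturing group, which means only one capturing group is left, and it contains the complete matched string.

[\w\d:#@%/;$()~_?\+,\-=\\.&] In this list I added a comma (otherwise your long example does not match), escaped the - (you had created a character range), and unescaped the . (escaping it is not needed inside a character class).

See it here on Regexr, a useful tool for testing regular expressions.

But URL matching is not a simple task; please see this question here.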
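
As a rough illustration of how the cleaned-up pattern could be used from C#, here is a minimal sketch. The class and method names (UrlFinder, FindUrls, ToAbsoluteUri) are made up for this example and are not from the question. It also assumes that a bare www. match should be treated as an http link, because new Uri(string) expects an absolute URI and throws a UriFormatException for a string such as www.webpage.com that has no scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFinder
{
    // The cleaned-up pattern from this answer, unchanged.
    private static readonly Regex UrlRegex = new Regex(
        @"((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:#@%/;$()~_?\+,\-=\\.&]+)");

    // Returns every substring of the text that the pattern matches, in order.
    public static List<string> FindUrls(string text)
    {
        var urls = new List<string>();
        foreach (Match match in UrlRegex.Matches(text))
        {
            urls.Add(match.Value);
        }
        return urls;
    }

    // Assumption for this sketch: a match that starts with "www." is meant
    // as an http link, so the scheme is prepended before building the Uri.
    public static Uri ToAbsoluteUri(string url)
    {
        var absolute = url.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? "http://" + url
            : url;
        return new Uri(absolute);
    }
}
```

In the question's loop, NavigateUri could then be set from UrlFinder.ToAbsoluteUri(match.Value) instead of new Uri(match.Value). Note that the pattern will also pick up trailing punctuation that is in the character class, such as a full stop at the end of a sentence, so some trimming may still be needed.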

Answer 2

I have just written a blog post about recognizing URLs in most commonly used formats, such as:

www.google.com
http://www.google.com
mailto:somebody@google.com
somebody@google.com
www.url-with-querystring.com/?url=has-querystring

The regular expression used is

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/

However, I recommend going to http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the to see a complete working example along with an explanation of the regular expression, in case you need to extend or tweak it.

Answer 3

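The regex you gave does not work for www. addresses because it expects a URI scheme (the bit before the URL, such as http://). The 'www.' part of your regex does not work because it would only match www.:// (which is meaningless).

Try something like this: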

(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+)|(www\.)[\w\d:#@%/;$()~_?\+-=\\\.&]*)

This will match anything that has a valid URI scheme, or anything that begins with "www.".
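
One thing to watch with the pattern above: as written, the character class sits only in the www\. branch of the alternation, so a URL that starts with a scheme is matched just up to the slashes (for example http://). The sketch below compares the pattern as written with a variant in which the two prefixes are wrapped in their own group, so the character class follows either one. The variant and the names (GroupedUrlPattern, AsWritten, Grouped, FirstMatch) are illustrative assumptions, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class GroupedUrlPattern
{
    // The pattern from this answer, as written.
    public static readonly Regex AsWritten = new Regex(
        @"(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+)|(www\.)[\w\d:#@%/;$()~_?\+-=\\\.&]*)");

    // Variant: the scheme prefix and the www. prefix are alternatives inside
    // one non-capturing group, and the character class comes after that group.
    public static readonly Regex Grouped = new Regex(
        @"((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)+|www\.)[\w\d:#@%/;$()~_?\+-=\\\.&]*)");

    // Returns the text of the first match, or an empty string if none.
    public static string FirstMatch(Regex regex, string text)
    {
        return regex.Match(text).Value;
    }
}
```

With the input http://webpage.com/a, FirstMatch with AsWritten returns http:// while FirstMatch with Grouped returns the whole URL; both return www.webpage.com for that input.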
