"http* " regex does not match URLs

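Time: 2014-09-29 17:09:40

Tags: c# regex

I have the following code that uses the C# Regex class to find every "http://....." in my input. Here is my code, but it finds nothing. Can you tell me what I'm missing?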

 Match m = Regex.Match(input, "http* ");
 while (m.Success)
 {
   Console.WriteLine("'{0}' found at index {1}.",
     m.Value, m.Index);
   m = m.NextMatch();
 }

Here is my input text (wrapped for readability):

I recently moved and have a buI recently moved and have a bunch of stuff for sale.
Most prices are based on my research from CL and ebay. Let me know or make an offer
if you like something from the list. Thanks. IKEA RAMBERG bed frame and Sultan
mattress - $150 http://seattle.craigslist.org/est/fuo/4688883554.html Sanus Platinum
Foundations TV Stand - $75 http://seattle.craigslist.org/est/fuo/4687613962.html
Staples Mission Coffee table and 2 sets of nesting/side tables - $90
http://seattle.craigslist.org/est/fuo/4687499215.html
Like new Hoover SteamVac Carpet Cleaner with Clean Surge, F5914900 - $100
http://seattle.craigslist.org/est/hsh/4687474666.html Hauppauge WinTV-HVR-1600
ATSC/NTSC/QAM Tuner Video Card + Remote - $35
http://seattle.craigslist.org/est/sop/4687372003.html Computer with core 2 quad, 2GB
RAM, nforce MB, 1.5TB HDD and more - $200 http://seattle.craigslist.org/est/sys/4687362266.html
LINKSYS CM100 Cable Modem (works with Comcast) - $15
http://seattle.craigslist.org/est/ele/4687639722.html Various computer parts for sale - $1 I
recently moved and have a buI recently moved and have a bunch of stuff for sale. Most prices
are based on my research from CL and ebay. Let me know or make an offer if you like something
from the list. Thanks. IKEA RAMBERG bed frame and Sultan mattress - $150
http://seattle.craigslist.org/est/fuo/4688883554.html Sanus Platinum Foundations TV Stand - $75
http://seattle.craigslist.org/est/fuo/4687613962.html Staples Mission Coffee table and 2 sets of
nesting/side tables - $90 http://seattle.craigslist.org/est/fuo/4687499215.html Like new Hoover
SteamVac Carpet Cleaner with Clean Surge, F5914900 - $100
http://seattle.craigslist.org/est/hsh/4687474666.html Hauppauge WinTV-HVR-1600 ATSC/NTSC/QAM
Tuner Video Card + Remote - $35 http://seattle.craigslist.org/est/sop/4687372003.html Computer
with core 2 quad, 2GB RAM, nforce MB, 1.5TB HDD and more - $200
http://seattle.craigslist.org/est/sys/4687362266.html LINKSYS CM100 Cable Modem (works with
Comcast) - $15 http://seattle.craigslist.org/est/ele/4687639722.html Various computer parts for
sale - $1 "

4 answers:

Answer 0: (score: 1)

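The problem is that you put an asterisk (*) after the p in "http* ", so the strings you can possibly match look like this:

htt  http  httpp  httppp  httpppp

and so on, each followed by a space. Since there is no space right after http in the input string, the expression never gets a match.

This expression should match:

Match m = Regex.Match(input, "http\\S* ");

(\S means "any non-whitespace character".)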
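To see the difference for yourself, here is a minimal sketch that runs both patterns over a shortened stand-in for the question's input. The class name StarDemo and the helper FindAll are made up for this illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StarDemo
{
    // Hypothetical helper: collects the text of every match of `pattern` in `input`.
    public static List<string> FindAll(string input, string pattern)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Shortened stand-in for the input text from the question.
        string input = "mattress - $150 http://seattle.craigslist.org/est/fuo/4688883554.html Sanus";

        // "http* " needs a space directly after htt, http, httpp, ...
        // The input has "http:" instead, so nothing is found.
        Console.WriteLine("http* matches: {0}", FindAll(input, "http* ").Count);

        // "http\\S* " consumes the run of non-whitespace after "http", then the space.
        foreach (string hit in FindAll(input, "http\\S* "))
        {
            Console.WriteLine("'{0}'", hit);
        }
    }
}
```

Note that the second pattern keeps the trailing space in the match and would miss a URL sitting at the very end of the input; dropping the final space from the pattern avoids both.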

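Answer 1: (score: 1)

For starters, take a look at this earlier question on Stack Overflow: What is the best regular expression to check if a string is a valid URL?

It looks like you have misunderstood what * means in a regular expression.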

"http* "

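means htt followed by zero or more p, followed by a space.

* is not the wildcard (file glob) that it is in a DOS or UNIX shell.

In a regular expression, * means zero or more of the token that precedes it (in this case, p).

For your input, you could write: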

https?://(\S*)

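\S matches any non-whitespace character, and the ? makes the s optional, so you also catch https.

For arbitrary input, though, whitespace is not always the only thing that can follow a URL. The URL may be enclosed in a quoted string, for example in HTML or JavaScript. The following should allow URLs that are followed by a space or an unescaped quote.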

https?://([^ "']*)

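Using ^ at the start of [] makes the character class exclusive (anything except these characters), which is often the easiest way to write a pattern. The alternative is to write a fully inclusive pattern, which means you have to account for every legal input you expect to handle.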
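As a rough illustration of why the exclusive class helps, the sketch below applies both patterns to a URL inside an HTML attribute. The class name QuoteDemo, the helper First, and the example.com URL are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteDemo
{
    // Hypothetical helper: text of the first match of `pattern`, or "" if there is none.
    public static string First(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        string html = "<a href=\"http://example.com/page\">link</a>";

        // \S* runs on past the closing quote because no whitespace follows the URL.
        Console.WriteLine(First(html, @"https?://\S*"));

        // The exclusive class stops at the first space or quote character.
        // Inside a verbatim string, a double quote is written as "".
        Console.WriteLine(First(html, @"https?://[^ ""']*"));
    }
}
```

The first call returns the URL plus the trailing markup, while the second returns only http://example.com/page.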

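I don't remember the actual regex for standards-compliant URLs, and writing one is not trivial, but you can find some on Google or Stack Overflow. Just to give the general idea, I might write something like the following as an inclusive pattern: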

https?://([-+a-zA-Z0-9._&?]*)

As Lukos's comment below points out, remember C# string escaping. I usually use verbatim strings for regular expressions in C#.

var pattern = @"https?://\S*";
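For what it's worth, here is a small sketch of the escaping point: the regular and verbatim literals below produce the same pattern string, and the verbatim one is just easier to read. The class and constant names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // In a regular string literal the backslash itself must be escaped.
    public const string Regular = "https?://\\S*";

    // In a verbatim string literal the backslash is taken literally.
    public const string Verbatim = @"https?://\S*";

    public static void Main()
    {
        // Both literals hold the same characters: https?://\S*
        Console.WriteLine(Regular == Verbatim);

        Match m = Regex.Match("see https://example.com/a now", Verbatim);
        Console.WriteLine(m.Value);
    }
}
```

Writing "https?://\S*" in a regular string without doubling the backslash does not compile, because \S is not a recognized C# escape sequence.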

Answer 2: (score: 0)

Your source code is looking for a match of this pattern

"http* "

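which means: find the sequence htt, then zero or more p characters, followed by a literal space (' ') character. You could try matching "http:[^\s]*" instead, which matches the literal text http: followed by zero or more non-whitespace characters.

Answer 3: (score: 0)

Before you choose which regex to use, there is an important question. Do you want to find anything that looks like a URL (probably starting with http or https), or do you want to match only valid URLs? A regex for valid URLs is very complicated. A basic one is much easier, but you risk collecting matches that are not URLs at all, or that look like real URLs but are invalid.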
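One possible middle ground, sketched below under my own assumptions rather than taken from any answer here, is to use a loose regex to collect candidates and then let Uri.TryCreate reject the ones that do not parse as absolute http/https URIs. The class name UrlFilter and the method FindHttpUrls are hypothetical, and Uri.TryCreate is fairly lenient, so treat this as a sanity check and not as strict validation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFilter
{
    // Hypothetical helper: loose regex first, then a parse check on each candidate.
    public static List<string> FindHttpUrls(string input)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(input, @"https?://\S+"))
        {
            Uri parsed;
            // Keep the candidate only if it parses as an absolute http or https URI.
            if (Uri.TryCreate(m.Value, UriKind.Absolute, out parsed)
                && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps))
            {
                urls.Add(m.Value);
            }
        }
        return urls;
    }

    public static void Main()
    {
        string input = "TV Stand - $75 http://seattle.craigslist.org/est/fuo/4687613962.html Staples";
        foreach (string url in FindHttpUrls(input))
        {
            Console.WriteLine(url);
        }
    }
}
```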
