Extracting URL links with Regex

Asked: 2018-01-17 17:09:00

Tags: c# regex

How can I use Regex to extract all the links inside the url(...) tokens in the following content:

/* cyrillic-ext */
@font-face {
  font-family: 'Montserrat';
  font-style: normal;
  font-weight: 400;
  src: local('Montserrat Regular'), local('Montserrat-Regular'), url(https://fonts.gstatic.com/s/montserrat/v12/rBHvpRWBkgyW99dXT88n7yEAvth_LlrfE80CYdSH47w.woff2) format('woff2');
  unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F, U+FE2E-FE2F;
}
/* cyrillic */
@font-face {
  font-family: 'Montserrat';
  font-style: normal;
  font-weight: 400;
  src: local('Montserrat Regular'), local('Montserrat-Regular'), url(https://fonts.gstatic.com/s/montserrat/v12/NX1NravqaXESu9fFv7KuqiEAvth_LlrfE80CYdSH47w.woff2) format('woff2');
  unicode-range: U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
/* vietnamese */
@font-face {
  font-family: 'Montserrat';
  font-style: normal;
  font-weight: 400;
  src: local('Montserrat Regular'), local('Montserrat-Regular'), url(https://fonts.gstatic.com/s/montserrat/v12/SKK6Nusyv8QPNMtI4j9J2yEAvth_LlrfE80CYdSH47w.woff2) format('woff2');
  unicode-range: U+0102-0103, U+0110-0111, U+1EA0-1EF9, U+20AB;
}
/* latin-ext */
@font-face {
  font-family: 'Montserrat';
  font-style: normal;
  font-weight: 400;
  src: local('Montserrat Regular'), local('Montserrat-Regular'), url(https://fonts.gstatic.com/s/montserrat/v12/gFXtEMCp1m_YzxsBpKl68iEAvth_LlrfE80CYdSH47w.woff2) format('woff2');
  unicode-range: U+0100-024F, U+0259, U+1E00-1EFF, U+20A0-20AB, U+20AD-20CF, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
  font-family: 'Montserrat';
  font-style: normal;
  font-weight: 400;
  src: local('Montserrat Regular'), local('Montserrat-Regular'), url(https://fonts.gstatic.com/s/montserrat/v12/zhcz-_WihjSQC0oHJ9TCYPk_vArhqVIZ0nv9q090hN8.woff2) format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+2000-206F, U+2074, U+20AC, U+2122, U+2212, U+2215;
}

It should return a list of links:

...

Thanks.

3 answers:

Answer 0 (score: 2)

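// Requires: using System; using System.Text.RegularExpressions;
string content = "..."; // the CSS text from the question
// Lazily capture everything after "url(" up to the first ")", for http/https links only.
Regex regex = new Regex(@"url\((https?://.+?)\)");
foreach (Match match in regex.Matches(content))
{
    // Group 1 holds the URL without the surrounding url( ... ).
    Console.WriteLine(match.Groups[1].Value);
}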

Answer 1 (score: 2)

You can use this code:

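// Requires: using System.Linq; using System.Text.RegularExpressions;
var content = "..."; // your input here
var regex = new Regex("url\\((?<url>[^\\)]+)");
var urls = regex.Matches(content).Cast<Match>()
    .Select(m => m.Groups["url"].Value) // read the named group "url"
    .Distinct()                         // drop duplicate links
    .ToArray();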

Regex explanation:

url // match "url" literally
\( // match the opening parenthesis
(?<url> // start a named capture group called "url"
[^\)]+ // match every character up to the closing parenthesis
) // close the capture group
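
To try the named-group pattern end to end, here is a minimal self-contained sketch. The class name UrlExtractor, the method ExtractUrls, and the shortened two-line sample are assumptions made for illustration and are not part of the answer; the sample reuses one URL from the question twice to show what Distinct() does.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Hypothetical helper: returns the distinct contents of every url(...) token.
    public static string[] ExtractUrls(string css)
    {
        // Same pattern as above, written as a verbatim string.
        var regex = new Regex(@"url\((?<url>[^)]+)");
        return regex.Matches(css).Cast<Match>()
            .Select(m => m.Groups["url"].Value)
            .Distinct()
            .ToArray();
    }

    public static void Main()
    {
        // Shortened sample: the same src line twice, so only one link should come back.
        string line = "src: local('Montserrat Regular'), url(https://fonts.gstatic.com/s/montserrat/v12/zhcz-_WihjSQC0oHJ9TCYPk_vArhqVIZ0nv9q090hN8.woff2) format('woff2');";
        string css = line + "\n" + line;

        foreach (string url in ExtractUrls(css))
        {
            Console.WriteLine(url);
        }
    }
}
```

Note that local('Montserrat Regular') is not picked up, because the pattern only starts matching at the literal text "url(".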

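Answer 2 (score: 1)

Try url\((.*?)\). This captures everything from the opening parenthesis up to the first closing parenthesis. The .*? part is a lazy match: the pattern captures the smallest text that satisfies it, rather than as much text as possible.

The captured URL will be in the first capture group, for example: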

// input holds the CSS text from the question
var regex = new Regex(@"url\((.*?)\)");
var urls = (from match in regex.Matches(input).Cast<Match>()
            select match.Groups[1].Value
           ).Distinct().ToArray();
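
The difference between the lazy .*? and a greedy .* is easy to see on a single src line. The sketch below is illustrative only: the class LazyVsGreedy, the helper FirstCapture, and the example.com URL are made up for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedy
{
    // Hypothetical helper: first capture group of the first match, or null if nothing matches.
    public static string FirstCapture(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // One src line modeled on the question; the host and file name are invented.
        string line = "src: local('Demo'), url(https://example.com/font.woff2) format('woff2');";

        // Lazy: stops at the first ")" after "url(".
        Console.WriteLine(FirstCapture(@"url\((.*?)\)", line));
        // prints: https://example.com/font.woff2

        // Greedy: runs to the last ")" on the line, swallowing the format(...) part.
        Console.WriteLine(FirstCapture(@"url\((.*)\)", line));
        // prints: https://example.com/font.woff2) format('woff2'
    }
}
```

Because . does not match a newline by default, the greedy version stays within one line here, but it still captures more than the link itself.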