Match only a specific domain

Time: 2016-10-12 13:05:28

Tags: regex text regex-negation regex-greedy

I have the following string:

<a href="/web/20120412083942/http://test.com/contact">Contact Us</a>  |    <a href="/web/20120412083942/https://test.com/privacy-policy">Privacy Policy</a>  <br /><br />
<a href="/web/20120412083942/http://www.cassandracastanedaphoto.com/index2.php#/home/">Photography by Cassandra Castenada</a></span><!-- Start Shareaholic TopSharingBar Automatic --><!-- End Shareaholic TopSharingBar Automatic --><script src="/web/20120412083942js_/http://www.test.com/wp-content/plugins/tweetmeme/button.js" type="text/javascript"></script>
<!-- tracker added by Ultimate Google Analytics plugin v1.6.0: /web/20120412083942/http://www.oratransplant.nl/uga -->

I want to match:

/web/20120412083942/http://test.com

/web/20120412083942/https://test.com

/web/20120412083942js_/http://www.test.com

Basically, any URL of the form /web/[digits][optional string]/http://test.com.

This is my regex so far:

((http(s)?:\/\/)?web.archive.org)?\/web\/\d+.*?\/http(s)?:\/\/(www\.)?test\.com

The problem is that it also matches this whole section:

/web/20120412083942/http://www.cassandracastanedaphoto.com/index2.php#/home/">Photography by Cassandra Castenadahttp://test.com

How can I make it stop matching when the domain does not start with test.com?
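The over-match can be reproduced with a short Python sketch. This is an illustration only: the sample string is shortened from the one above, and the variable names are made up here. The lazy `.*?` still expands past the photographer's URL until it finds a later `/http://www.test.com`, so two separate links end up in one match.

```python
import re

# Shortened copy of the sample HTML from the question (an assumption for brevity).
html = (
    '<a href="/web/20120412083942/http://test.com/contact">Contact Us</a>\n'
    '<a href="/web/20120412083942/http://www.cassandracastanedaphoto.com/index2.php#/home/">'
    'Photography by Cassandra Castenada</a>'
    '<script src="/web/20120412083942js_/http://www.test.com/wp-content/plugins/tweetmeme/button.js">'
    '</script>\n'
)

# The regex from the question, unchanged.
original = re.compile(
    r"((http(s)?:\/\/)?web.archive.org)?\/web\/\d+.*?\/http(s)?:\/\/(www\.)?test\.com"
)

matches = [m.group(0) for m in original.finditer(html)]
for text in matches:
    print(text)
```

The second printed match starts at the cassandracastanedaphoto.com link and runs into the script tag, which is the unwanted behaviour described above.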

1 answer:

Answer 0 (score: 1)

I had success with this regex pattern:

Pattern: /web/[^/]+/http[s]{0,1}://(|www\.)test\.com/?[._a-zA-Z-0-9]+

Options: ^ and $ match at line breaks

Match the characters “/web/” literally «/web/»
Match any character that is NOT a “/” «[^/]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the characters “/http” literally «/http»
Match the character “s” «[s]{0,1}»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «{0,1}»
Match the characters “://” literally «://»
Match the regular expression below and capture its match into backreference number 1 «(|www\.)»
   Match either the regular expression below (attempting the next alternative only if this one fails) «»
      Empty alternative effectively makes the group optional (following alternatives will be tried if the regex backtracks into the group) «|»
   Or match regular expression number 2 below (the entire group fails if this one fails to match) «www\.»
      Match the characters “www” literally «www»
      Match the character “.” literally «\.»
Match the characters “test” literally «test»
Match the character “.” literally «\.»
Match the characters “com” literally «com»
Match the character “/” literally «/?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single character present in the list below «[._a-zA-Z-0-9]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   One of the characters “._” «._»
   A character in the range between “a” and “z” «a-z»
   A character in the range between “A” and “Z” «A-Z»
   The character “-” «-»
   A character in the range between “0” and “9” «0-9»
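As a rough check, the pattern can be run in Python against a shortened copy of the question's sample. This is a sketch under assumptions: the sample string, the variable names, and the second `prefix_only` pattern are not part of the answer. `re.MULTILINE` mirrors the "^ and $ match at line breaks" option, although the pattern contains no anchors, so it has no effect here.

```python
import re

# Shortened copy of the sample HTML from the question (an assumption for brevity).
html = (
    '<a href="/web/20120412083942/http://test.com/contact">Contact Us</a>\n'
    '<a href="/web/20120412083942/https://test.com/privacy-policy">Privacy Policy</a>\n'
    '<a href="/web/20120412083942/http://www.cassandracastanedaphoto.com/index2.php#/home/">'
    'Photography by Cassandra Castenada</a>'
    '<script src="/web/20120412083942js_/http://www.test.com/wp-content/plugins/tweetmeme/button.js">'
    '</script>\n'
)

# The pattern from the answer, unchanged.
answer = re.compile(
    r"/web/[^/]+/http[s]{0,1}://(|www\.)test\.com/?[._a-zA-Z-0-9]+",
    re.MULTILINE,
)
full = [m.group(0) for m in answer.finditer(html)]

# Hypothetical variant (not from the answer): stop right after the domain,
# which is the shape the question asked for.
prefix_only = re.compile(r"/web/[^/]+/https?://(?:www\.)?test\.com")
prefixes = [m.group(0) for m in prefix_only.finditer(html)]

for text in full + prefixes:
    print(text)
```

Because `[^/]+` cannot cross a slash, the segment after `/web/` can no longer swallow the photographer's URL. Note that the answer's pattern requires at least one character from the final character class after test.com, so each match also includes the first path segment.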