正则表达式,抓取链接

时间:2013-02-18 08:59:07

标签: php expression

我有html文档和链接,例如http://example.com/some?b=20130218082149&a=20130218092249&info=1152&hash=079caaaae2fcd602c6d8dx

我需要一个正则表达式才能找到它,如果你能向我解释这个表达,那真的很有用。

1 个答案:

答案 0 :(得分:0)

我刺伤了

<?php

$pattern = "#^https?://([a-z0-9-]+\.)*blah\.com(/.*)?$#";

$tests = array(
    'http://blah.com/so/this/is/good'
  , 'http://blah.com/so/this/is/good/index.html'
  , 'http://www.blah.com/so/this/is/good/mice.html#anchortag'
  , 'http://anysubdomain.blah.com/so/this/is/good/wow.php'
  , 'http://anysubdomain.blah.com/so/this/is/good/wow.php?search=doozy'
  , 'http://any.sub-domain.blah.com/so/this/is/good/wow.php?search=doozy' // I added this case
  , 'http://999.sub-domain.blah.com/so/this/is/good/wow.php?search=doozy' // I added this case
  , 'http://obviousexample.com'
  , 'http://bbc.co.uk/blah.com/whatever/you/get/the/idea'
  , 'http://blah.com.example'
  , 'not/even/a/blah.com/url'
);

foreach ( $tests as $test )
{
  if ( preg_match( $pattern, $test ) )
  {
    echo $test, " <strong>matched!</strong><br>";
  } else {
    echo $test, " <strong>did not match.</strong><br>";
  }
}

//  Here's another way
echo '<hr>';
foreach ( $tests as $test )
{
  if ( $filtered = filter_var( $test, FILTER_VALIDATE_URL ) )
  {
    $host = parse_url( $filtered, PHP_URL_HOST );
    if ( $host && preg_match( "/blah\.com$/", $host ) )
    {
      echo $filtered, " <strong>matched!</strong><br>";
    } else {
      echo $filtered, " <strong>did not match.</strong><br>";
    }
  } else {
    echo $test, " <strong>did not match.</strong><br>";
  }
}