Question

OK, I'm using PHP's file_get_contents to read some websites, each of which has exactly one link to Facebook... Once I have the whole page, I want to find the complete Facebook URL.

So somewhere in the page there will be:

<a href="http://facebook.com/username" >

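I want to get http://facebook.com/username, that is, everything from the first (") to the last ("). The username varies... it could be username.somethingelse, and there may be other attributes before or after the "href".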

In case I'm not being clear:

<a href="http://facebook.com/username" >  //I want http://facebook.com/username
<a href="http://www.facebook.com/username" >  //I want http://www.facebook.com/username
<a class="value" href="http://facebook.com/username. some" attr="value" >  //I want http://facebook.com/username. some

Any of the examples above may also use single quotes:

<a href='http://facebook.com/username' > //I want http://facebook.com/username

Thanks, everyone

Answer 1

Don't use regular expressions on HTML. It's a shotgun that will blow your leg off at some point. Use the DOM instead:

$dom = new DOMDocument;
$dom->loadHTML(...);
$xp = new DOMXPath($dom);

$a_tags = $xp->query("//a");
foreach($a_tags as $a) {
   echo $a->getAttribute('href');
}
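
The loop above prints every link on the page. Since the question only wants the Facebook one, the XPath query can be narrowed with a predicate. The following is a minimal sketch, not part of the original answer: the function name findFacebookLinks and the sample markup are made up for illustration. It assumes a plain substring test on "facebook.com/" is good enough, which is case-sensitive and would also match a non-Facebook URL that merely contains that text.

```php
<?php
// Hypothetical helper: returns the href of every <a> whose href contains "facebook.com/".
function findFacebookLinks(string $html): array
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $links = [];
    // The predicate keeps only anchors whose href attribute contains the substring.
    foreach ($xp->query("//a[contains(@href, 'facebook.com/')]") as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Sample markup modelled on the question (double and single quotes both work).
$html = '<a class="value" href="http://facebook.com/username. some" attr="value">x</a>'
      . "<a href='http://example.com/'>y</a>";
print_r(findFacebookLinks($html));
```

Because the DOM parser handles the quoting, attributes before or after href and single versus double quotes need no special treatment.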

Answer 2

I'd suggest using DOMDocument for this rather than a regular expression. Here's a quick code sample for your case:

$dom = new DOMDocument();
$dom->loadHTML($content);

// To hold all your links...
$links = array();

$hrefTags = $dom->getElementsByTagName("a");
foreach ($hrefTags as $hrefTag) {
    $links[] = $hrefTag->getAttribute("href");
}

print_r($links); // dump all links
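
The $links array above holds every link on the page, so one more step is needed to pick out the Facebook URL. One possible approach, sketched here as an assumption rather than as part of the original answer, is to compare each URL's host using PHP's parse_url. The helper name filterFacebookUrls and the sample URLs are hypothetical.

```php
<?php
// Hypothetical helper: keeps only URLs whose host is facebook.com or a subdomain of it.
function filterFacebookUrls(array $links): array
{
    $matches = [];
    foreach ($links as $link) {
        $host = parse_url($link, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative or malformed URL, no host to compare
        }
        $host = strtolower($host);
        // 13 is the length of ".facebook.com", so this matches www.facebook.com and the like.
        if ($host === 'facebook.com' || substr($host, -13) === '.facebook.com') {
            $matches[] = $link;
        }
    }
    return $matches;
}

// Sample input standing in for the $links array built above.
$sample = [
    'http://www.facebook.com/username',
    'http://example.com/facebook.com',
    '/relative/path',
    'http://notfacebook.com/x',
];
print_r(filterFacebookUrls($sample));
```

Comparing hosts instead of searching for a substring avoids matching unrelated sites that only mention facebook.com in a path. If each page really contains a single Facebook link, as the question states, the first element of the result is the URL wanted.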
