Extracting an image src from a string with preg_match_all

Date: 2012-09-16 21:38:25

Tags: php regex preg-match-all

I have a string of data stored in $content. A sample of the data looks like this:

This is some sample data which is going to contain an image in the format <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">.  It will also contain lots of other text and maybe another image or two.

I am trying to grab <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg"> and save it as another string, for example $extracted_image.

So far, I have this...

if( preg_match_all( '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/', $content, $extracted_image ) ) {
$new_content .= 'NEW CONTENT IS '.$extracted_image.'';

All it returns is...

NEW CONTENT IS Array

I realise my attempt may be completely wrong, but can someone tell me where I am going wrong?

3 answers:

Answer 0: (score: 1)

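Your first problem is that preg_match_all (http://php.net/manual/en/function.preg-match-all.php) puts an array into $matches, its third argument, so you need to output a single item from that array. Try $extracted_image[0] to start. Note that with the default flags $extracted_image[0] is itself an array holding every full match, so the first complete tag is $extracted_image[0][0] and the first captured src value is $extracted_image[1][0].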
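To make the indexing concrete, here is a minimal sketch that reuses the pattern from the question on a shortened version of the sample text. The variable names $first_tag and $first_src are introduced only for illustration, and the layout described in the comments assumes the default PREG_PATTERN_ORDER flag.

```php
<?php
// Shortened sample input based on the question.
$content = 'Some text with an image <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">. More text.';

// The pattern from the question, unchanged.
$pattern = '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/';

$new_content = '';
$first_tag = '';
$first_src = '';

if (preg_match_all($pattern, $content, $extracted_image)) {
    // With the default PREG_PATTERN_ORDER flag:
    //   $extracted_image[0] is an array of full matches (whole <img> tags)
    //   $extracted_image[1] is an array of capture group 1 (the src values)
    $first_tag = $extracted_image[0][0];
    $first_src = $extracted_image[1][0];
    $new_content .= 'NEW CONTENT IS ' . $first_tag;
}

echo $new_content, PHP_EOL;
echo $first_src, PHP_EOL;
```

Concatenating $extracted_image directly converts the whole array to the string "Array", which is why the original code printed "NEW CONTENT IS Array".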

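Answer 1: (score: 1)

If you only want one result, you need to use a different function:

preg_match() stops after the first match and stores only that match in its $matches argument. preg_match_all() stores every match, so its $matches argument is an array of arrays.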
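As a rough sketch of that suggestion, the following uses preg_match() with the question's pattern. The second image URL is made up for illustration, to show that only the first tag is captured.

```php
<?php
// Sample input with two images; the second URL is a hypothetical example.
$content = 'Text <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg"> more text '
         . '<img src="http://www.example.com/second.jpg"> end.';

// The pattern from the question, unchanged.
$pattern = '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/';

$extracted_image = '';
$extracted_src = '';

if (preg_match($pattern, $content, $match)) {
    // $match is a flat array: index 0 is the full match,
    // index 1 is the first capture group (the src value).
    $extracted_image = $match[0];
    $extracted_src = $match[1];
}

echo $extracted_image, PHP_EOL;
echo $extracted_src, PHP_EOL;
```

Because $match is a flat array here, $match[0] is already a string and can be concatenated directly.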

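Answer 2: (score: 0)

Parsing valid HTML with a regular expression is ill-advised. There may be unexpected attributes before the src attribute, non-img tags can trick the regex into false-positive matches, and attribute values can be wrapped in either single or double quotes. For these reasons you should use a DOM parser. It is clean, reliable, and easy to read.

Code: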

$string = <<<HTML
This is some sample data which is going to contain an image
in the format <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">.
It will also contain lots of other text and maybe another image or two
like this: <img alt='another image' src='http://www.example.com/randomfolder/randomimagename.jpg'>
HTML;

$srcs = [];
$dom=new DOMDocument;
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}

var_export($srcs);

Output:

array (
  0 => 'http://www.randomdomain.com/randomfolder/randomimagename.jpg',
  1 => 'http://www.example.com/randomfolder/randomimagename.jpg',
)
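The question asked for the whole <img> tag rather than only the src value. One possible way to get that from the DOM approach, assuming the DOM extension is available, is to pass each node to DOMDocument::saveHTML(). The $tags variable is introduced here for illustration. The serialized tag is regenerated by the parser, so it may not be byte-identical to the input (quoting, for example, can be normalized).

```php
<?php
// Shortened sample input based on the question.
$string = 'This is some sample data with an image '
        . '<img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">. More text.';

$tags = [];
$dom = new DOMDocument;
$dom->loadHTML($string);

foreach ($dom->getElementsByTagName('img') as $img) {
    // saveHTML() with a node argument serializes just that element.
    $tags[] = $dom->saveHTML($img);
}

var_export($tags);
```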