XPath not working as expected [php]

Time: 2017-04-29 10:56:51

Tags: php xpath

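I regularly use XPath with PHP to parse pages, but this time I can't understand how the code below behaves on this particular page. I hope you can help.

The code I use to parse this page: http://www.jeuxvideo.com/recherche.php?m=9&t=10&q=Call+of+duty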

<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;

$ch = curl_init();     
curl_setopt($ch, CURLOPT_URL, $Query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);

/*
$search = array("<article", "</article>");
$replace = array("<div", "</div>");
$response = str_replace($search, $replace, $response);
*/

$dom = new DOMDocument();
@$dom->loadHTML($response);

$xpath = new DOMXPath($dom);

$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/a');

//$elements = $xpath->query('//div[@class="recherche-aphabetique-item"]/a');

count($elements);

var_dump($elements);
?>

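Fiddle to test it: http://phpfiddle.org/main/code/r9n6-d0j0

I just want to get all the "a" nodes inside the "article" nodes that have the class "recherche-aphabetique-item".

But it returns nothing :/.

As you can see in the commented-out code, I tried replacing the HTML5 article element with div, but I get the same behavior.

Thanks for your help.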

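1 Answer:

Answer 0: (score: 1)

I see a lot of DOMDocument::loadHTML(): Unexpected end tag errors; you should use libxml's internal error handling functions to deal with them. Also, when I look at the DOM of the remote site, I don't see any a tags that match the XPath query, only span tags.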

<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;

$ch = curl_init();     
curl_setopt($ch, CURLOPT_URL, $Query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);

/* try to suppress errors using libxml */
libxml_use_internal_errors( true );

$dom = new DOMDocument();

/* additional flags for DOMDocument */
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;

@$dom->loadHTML($response);

libxml_clear_errors();

$xpath = new DOMXPath($dom);

$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');

/* $elements->length holds the number of matching nodes */
var_dump( $elements );
?>

Output

object(DOMNodeList)#97 (1) { ["length"]=> int(94) } 
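
The code above clears the libxml errors without looking at them. If you want to see what the parser actually complained about, a minimal sketch along these lines may help. The HTML string is a made-up, deliberately malformed stand-in for the remote page, so it runs without a network request; the exact messages depend on the libxml2 version installed.

```php
<?php
// Hypothetical, deliberately malformed HTML standing in for the remote page
// (note the stray </div> that has no matching opening tag).
$html = '<html><body><article class="recherche-aphabetique-item">'
      . '<span>Call of Duty</span></div></article></body></html>';

// Collect parser errors internally instead of emitting PHP warnings,
// remembering the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Inspect what the parser reported before discarding the errors.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable even though the markup was not clean.
$xpath = new DOMXPath($dom);
$spans = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');
echo $spans->length, " span(s) found\n";
```

With internal error handling switched on, the @ in front of loadHTML() should no longer be needed to silence the parser warnings.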

You could try simplifying it further:

$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;

libxml_use_internal_errors( true );
$dom = new DOMDocument();
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
@$dom->loadHTMLFile($Query);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');
/* print the number of matches, then the text of each matching node */
echo $elements->length, '<br />';
foreach ( $elements as $node ) {
    echo $node->nodeValue, '<br />';
}
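
A related point is why the original `/a` query can come back empty: a single slash selects direct children only. The sketch below is an illustration under assumptions, not the real page. The markup is invented and nests each link inside a span, and the thread does not confirm that the live page is structured this way. It compares the child step `/a` with the descendant step `//a`, and also shows a class test that still matches when the attribute lists more than one class.

```php
<?php
// Hypothetical markup: the links sit inside a <span>, not directly under <article>.
$html = '<html><body>'
      . '<article class="recherche-aphabetique-item first">'
      . '<span><a href="/jeux/a">Game A</a></span></article>'
      . '<article class="recherche-aphabetique-item">'
      . '<span><a href="/jeux/b">Game B</a></span></article>'
      . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// "/a" selects only direct children of <article>, so nested links are missed.
$direct = $xpath->query('//article[@class="recherche-aphabetique-item"]/a');

// "//a" selects descendants at any depth. The contains() test treats the
// class as one whitespace-separated token, so class="... first" still matches.
$classTest = 'contains(concat(" ", normalize-space(@class), " "), " recherche-aphabetique-item ")';
$nested = $xpath->query('//article[' . $classTest . ']//a');

echo $direct->length, "\n";
echo $nested->length, "\n";
foreach ($nested as $a) {
    echo $a->getAttribute('href'), ' => ', $a->nodeValue, "\n";
}
```

The exact-match form `@class="..."` only works when the attribute contains that one class and nothing else, which is why the token-based test is often the safer choice on scraped pages.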