The following XPath query returns no results:
$url="https://example.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$html = curl_exec($ch);
curl_close($ch);
/* Use internal libxml errors -- turn on in production, off for debugging */
libxml_use_internal_errors(true);
/* Create a new DomDocument object */
$dom = new DomDocument;
/* Load the HTML */
@$dom->loadHTMLFile($html);
/* Create a new XPath object */
$xpath = new DomXPath($dom);
/* Query the alt attribute of every <img> whose class is "info_flag" */
$nodes = $xpath->query('//img[@class="info_flag"]/@alt');
/* Traverse the DOMNodeList object to output each DomNode's nodeValue */
foreach ($nodes as $node) {
echo $node."\n";
}
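When I run print_r on the result, it outputs an empty array. I set a user agent because the remote site otherwise blocks the request with a 403.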
Answer 0 (score: 0)
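You need to use DomDocument::loadHTML instead of loadHTMLFile. loadHTMLFile expects a file path or URL, but $html already holds the markup returned by curl_exec. Also, because a DOM node cannot be converted to a string, print $node->nodeValue instead of $node.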
/* Use internal libxml errors -- turn on in production, off for debugging */
libxml_use_internal_errors(true);
/* Create a new DomDocument object */
$dom = new DomDocument;
/* Load the HTML */
$dom->loadHTML($html);
/* Create a new XPath object */
$xpath = new DomXPath($dom);
/* Query the alt attribute of every <img> whose class is "info_flag" */
$nodes = $xpath->query('//img[@class="info_flag"]/@alt');
/* Traverse the DOMNodeList object to output each DomNode's nodeValue */
foreach ($nodes as $node) {
echo $node->nodeValue."\n";
}
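To check the fix without touching the network, here is a minimal sketch that runs the same query against an inline string. The sample markup, the file names in it, and the $alts variable are made up for illustration; the real page will differ, so treat this only as a way to confirm that loadHTML plus nodeValue behaves as described.

```php
<?php
// Hypothetical sample markup standing in for the page fetched with cURL.
$html = <<<'HTML'
<!DOCTYPE html>
<html><body>
<table>
  <tr><td><img class="info_flag" src="us.png" alt="United States"></td></tr>
  <tr><td><img class="info_flag" src="de.png" alt="Germany"></td></tr>
  <tr><td><img class="other" src="x.png" alt="Ignored"></td></tr>
</table>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();

// loadHTML() parses a string of markup; loadHTMLFile() would treat
// the same argument as a file path or URL and fail to open it.
if (!$dom->loadHTML($html)) {
    fwrite(STDERR, "Could not parse the HTML\n");
    exit(1);
}
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Each match is a DOMAttr; read its text through nodeValue.
$alts = [];
foreach ($xpath->query('//img[@class="info_flag"]/@alt') as $attr) {
    $alts[] = $attr->nodeValue;
}

echo implode("\n", $alts), "\n";
// Prints:
// United States
// Germany
```

If this prints the expected values but the real page still gives nothing, it is worth checking that curl_exec did not return false and that the class attribute on the live page is exactly "info_flag" (the @class="..." test is an exact string match, so an extra class name on the element would prevent a match).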