DOMXPath / DOMDocument - How to get HTML DOM elements instead of plain text

Time: 2015-08-08 12:59:13

Tags: html dom domdocument domxpath

Here is my code:

$url = "https://www.leaseweb.com/dedicated-servers/single-processor";

libxml_use_internal_errors(true); 
$doc = new DOMDocument();

$doc->loadHTMLFile($url);

$xpath = new DOMXpath($doc);

$n = $xpath->query('//td[@data-column-name="Model"]');
$r = $xpath->query('//td[@data-column-name="RAM"]');
$l = $xpath->query('//td[@data-column-name="Location"]');
$item = 0;
$i = 0;
foreach ($n as $entry) {
    $Name = $entry->nodeValue;
    $RAM  = $r->item($item)->nodeValue;
    $Location  = $l->item($item)->nodeValue;
    $i++;
    ?>
     <tr> <td><?PHP echo $i;?></td> <td><?PHP echo $Name;?></td> <td> <?PHP echo $RAM;?> </td> <td class="hidden-xs"><?PHP echo $Location;?> </td> <td><span class="label label-success">Configure</span></td> </tr>
    <?PHP
    $item++;
}

This code only returns the results as text. The selected td element with data-column-name="Location" holds <span id="inside_element">Holded text</span>, but instead of the span I only receive the plain text: Holded text

How can I get the HTML elements inside a specific DOM element, rather than only their text?

Thanks in advance!

1 answer:

Answer 0: (score: 1)

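Whenever you need the raw HTML fragment of a specific node, you can call DOMNode::C14N(). This method canonicalizes the node to a string, so the markup inside the node is kept instead of being reduced to its text, which is what nodeValue gives you. Let's look at an example: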

<?php 
$html = '<html>
<head>  
</head>
<body>
    <div class="container">
        <div>
            <span>text span</span>
        </div>
    </div>
</body>
</html>';

// loadHTML() is an instance method; calling it statically fails on PHP 8
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="container"]/div');


print $nodes->item(0)->C14N();

Since I want the HTML content under div.container > div, the output will be:

<div>
    <span>text span</span>
</div>
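
To make the difference from nodeValue concrete, here is a minimal sketch that reads the same node three ways. The sample markup is made up for illustration (it is not the real page from the question), and the variable names are my own.

```php
<?php
// Hypothetical sample markup, not taken from the page in the question.
$html = '<table><tr><td data-column-name="Location">'
      . '<span id="inside_element">Holded text</span><br>'
      . '</td></tr></table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$td    = $xpath->query('//td[@data-column-name="Location"]')->item(0);

$asText = $td->nodeValue;      // text content only, the markup is gone
$asC14n = $td->C14N();         // Canonical XML: empty elements become start/end pairs such as <br></br>
$asHtml = $dom->saveHTML($td); // HTML serialization of the same node

echo $asText, PHP_EOL;
echo $asC14n, PHP_EOL;
echo $asHtml, PHP_EOL;
```

Both C14N() and saveHTML() keep the span, but C14N() writes canonical XML, so the two strings can differ for void elements such as br. If the result is going back into an HTML page, saveHTML() is probably the safer choice.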

Alternative method

There is another way to achieve the same result: serialize just that node with DOMDocument::saveHTML(), like this:

$node = $nodes->item(0);

print $node->ownerDocument->saveHTML($node); // similar result to $node->C14N()
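
Both calls above return the node together with its own tags. If you only want what is inside the node, one option is to serialize each child in turn. The helper below is a sketch of that idea; innerHtml is a hypothetical name of my own, not a function provided by the DOM extension.

```php
<?php
/**
 * Hypothetical helper (not part of the DOM extension): returns the markup
 * of a node's children, without the node's own opening and closing tags.
 */
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML('<div class="container"><div><span>text span</span></div></div>');

$xpath = new DOMXPath($dom);
$inner = innerHtml($xpath->query('//div[@class="container"]/div')->item(0));

echo $inner, PHP_EOL;
```

Here $inner holds the span markup without the surrounding div.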

So, for your specific case, it would look like this:

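<?php
$url = "https://www.leaseweb.com/dedicated-servers/single-processor";

libxml_use_internal_errors(true); // collect HTML parse warnings instead of hiding them with @
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$l = $xpath->query('//td[@data-column-name="Location"]/div');

// item(0) is null when nothing matches, so check before serializing
if ($l->length > 0) {
    var_dump($l->item(0)->C14N());
    // Or: var_dump($doc->saveHTML($l->item(0)));
}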
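
Putting this back into the loop from the question, a rough sketch could look like the following. It runs against a small inline table that I made up as a stand-in for the downloaded page, so the real markup may well differ. It also queries each cell relative to its row instead of indexing three separate node lists, which is my own suggestion to keep the columns aligned when a row lacks a cell.

```php
<?php
// Hypothetical stand-in for the downloaded page; the real markup may differ.
$html = '<table>
  <tr>
    <td data-column-name="Model">Server A</td>
    <td data-column-name="RAM">16 GB</td>
    <td data-column-name="Location"><span id="inside_element">Holded text</span></td>
  </tr>
</table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//tr[td[@data-column-name="Model"]]') as $tr) {
    // Queries relative to the row ($tr) keep the three columns aligned.
    $model    = $xpath->query('td[@data-column-name="Model"]', $tr)->item(0);
    $ram      = $xpath->query('td[@data-column-name="RAM"]', $tr)->item(0);
    $location = $xpath->query('td[@data-column-name="Location"]/*', $tr)->item(0);

    $rows[] = [
        'model'    => $model ? trim($model->nodeValue) : '',
        'ram'      => $ram ? trim($ram->nodeValue) : '',
        // Keep the markup of the first child element, or an empty string if there is none.
        'location' => $location ? $doc->saveHTML($location) : '',
    ];
}

print_r($rows);
```

The model and RAM columns are still read as plain text, while the location column keeps its span, so it can be echoed into the output table as HTML.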