SimpleXMLElement drops CDATA when using XPath

Asked: 2013-07-08 02:16:34

Tags: php xml json xpath simplexml

I need to recursively convert an XML node into a JSON string. I have most of it working:

$sku = "AC2061414";
$dom = new SimpleXMLElement(file_get_contents( "/usr/share//all_products.xml" )); 
$query = '//sku[text() = "'.$sku.'"]';
$entries = $dom->xpath($query);

foreach ($entries as $entry) {

    $parent_div = $entry->xpath( 'parent::*' );
    $nodearray=array();

    foreach($parent_div as $node) {
        if ($node->nodeType == XML_CDATA_SECTION_NODE) {
            $nodearray[$node->getName()]=$node->textContent;
        }else{
            $nodearray[$node->getName()]=$node;
        }
    }
    $ajax = json_encode( $nodearray );
    print($ajax);
}

Running it against

<?xml version="1.0" encoding="UTF-8"?>
<products>
   <product active="1" on_sale="0" discountable="1">
    <sku>AC2061414</sku>
    <name><![CDATA[ALOE CADABRA ORGANIC LUBE PINA COLADA 2.5OZ]]></name>
    <description><![CDATA[ text text ]]></description>
    <keywords/>
    <price>7.45</price>
    <stock_quantity>30</stock_quantity>
    <reorder_quantity>0</reorder_quantity>
    <height>5.25</height>
    <length>2.25</length>
    <diameter>0</diameter>
    <weight>0.27</weight>
    <color></color>
    <material>aloe vera, vitamin E</material>
    <barcode>826804006358</barcode>
    <release_date>2012-07-26</release_date>
    <images>
      <image>/AC2061414/AC2061414A.jpg</image>
    </images>
    <categories>
      <category code="528" video="0" parent="0">Lubricants</category>
      <category code="531" video="0" parent="528">Flavored</category>
      <category code="28" video="0" parent="25">Oral Products</category>
      <category code="532" video="0" parent="528">Natural</category>
    </categories>
    <manufacturer code="AC" video="0">Aloe Cadabra Lubes</manufacturer>
    <type code="LU" video="0">Lubes</type>
  </product>
</products>

I end up with
{"product":{"@attributes":{"active":"1","on_sale":"0","discountable":"1"},"sku":"AC2061414","name":{},"description":{},"keywords":{},"price":"7.45","stock_quantity":"30","reorder_quantity":"0","height":"5.25","length":"2.25","diameter":"0","weight":"0.27","color":{},"material":"aloe vera, vitamin E","barcode":"826804006358","release_date":"2012-07-26","images":{"image":"\/AC2061414\/AC2061414A.jpg"},"categories":{"category":["Lubricants","Flavored","Oral Products","Natural"]},"manufacturer":"Aloe Cadabra Lubes","type":"Lubes"}}

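Apart from the missing CDATA node values, this looks fine. I did try to account for them, but it does not work. What is the trick here?

2 answers:

Answer 0 (score: 1)

You can try adding the LIBXML_NOCDATA option to the constructor.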

$dom = new SimpleXMLElement(file_get_contents( "/usr/share//all_products.xml" ), LIBXML_NOCDATA);
...

More details here.
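
To see the effect of the option in isolation, here is a minimal sketch that parses a small inline document twice. The XML string and the variable names are made up for illustration; only SimpleXMLElement, LIBXML_NOCDATA and json_encode come from the answer above.

```php
<?php
// Hypothetical two-field sample, shortened from the product feed in the question.
$source = '<product><sku>AC2061414</sku><name><![CDATA[ALOE CADABRA]]></name></product>';

// Default parsing: the CDATA section is kept as its own node type.
$plain = new SimpleXMLElement($source);

// LIBXML_NOCDATA: CDATA sections are merged into ordinary text nodes while parsing.
$merged = new SimpleXMLElement($source, LIBXML_NOCDATA);

$plainJson  = json_encode($plain);
$mergedJson = json_encode($merged);

echo $plainJson, "\n";   // the "name" value is not a usable string here
echo $mergedJson, "\n";  // the "name" value carries the CDATA text
```

The second line of output should contain the product name as a plain string, which is what the question was missing.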

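Answer 1 (score: 1)

The problem you are running into comes from json_encode, which processes the SimpleXMLElement you give it through its magic interface. See, for example, how @attributes gets serialized. It also skips all child CDATA nodes, because those are dropped when an element's value is read in that magic mode (compare the print_r or var_dump output of a SimpleXMLElement).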

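Because CDATA nodes can be normalized into the surrounding text or simply into ordinary text nodes, SimpleXML offers the LIBXML_NOCDATA option (pass it when instantiating with new or when calling the simplexml_load_* functions). It converts those CDATA nodes into text nodes and merges them with any surrounding text nodes ("Merge CDATA as text nodes").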

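This makes print_r and json_encode return those node values as strings, just as they do for @attributes, because the text is now the node value. "PHP, SimpleXML, decoding entities in CDATA" explains this in more detail.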
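
If you cannot change how the document is loaded, a string cast is another way to reach the text, since casting a SimpleXMLElement to string returns its text content, CDATA included. The sketch below is an alternative to the option, not the approach from this answer; the sample XML and the $fields name are assumptions for illustration.

```php
<?php
// Hypothetical sample, parsed WITHOUT LIBXML_NOCDATA.
$xml = new SimpleXMLElement(
    '<product><sku>AC2061414</sku><name><![CDATA[ALOE CADABRA]]></name></product>'
);

$fields = [];
foreach ($xml->children() as $child) {
    // The cast reads the element's own text, including CDATA sections.
    $fields[$child->getName()] = (string) $child;
}

$json = json_encode($fields);
echo $json, "\n";
```

Note that this flattens the element: nested children such as <images> or <categories>, repeated element names and attributes would need extra handling, which is why the LIBXML_NOCDATA route is usually less work.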

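Next, there is another misunderstanding, and clearing it up will save you a lot of work. Your code already contains an XPath that selects an element by its text value, but what you are really interested in is that element's parent. Once you have the parent, SimpleXML already iterates over all of its children for you. The same holds for the magic properties SimpleXML exposes to json_encode. Compare how much this lets you cut down the code: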

$xml = simplexml_load_file("/usr/share/all_products.xml", NULL, LIBXML_NOCDATA); 

// NOTE: Prevent XPath Injection by not allowing " (or ') for 
//       SKU value (validate it against a whitelist of allowed
//       characters for example)
$sku   = "AC2061414";
$query = sprintf('(//sku[text() = "%s"])[1]/..', $sku); 

$products = $xml->xpath($query);

if ($products) {
    echo json_encode(["product" => $products[0]]);
}

See the Demo

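This should give you the same output without writing nearly as much code. Note the LIBXML_NOCDATA option passed when the SimpleXMLElement is created, and the modified XPath query that directly selects the parent (<product>) node of the (first) matching sku element. json_encode then takes care of all the children, because it traverses the magic properties SimpleXML provides.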
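
The note in the code about XPath injection can be sketched roughly as follows. The helper name buildSkuQuery and the allowed character set (ASCII letters and digits only) are assumptions; adjust the pattern to whatever your real SKUs look like.

```php
<?php
// Hypothetical helper: validates the SKU before it is placed into the XPath string.
function buildSkuQuery(string $sku): string
{
    // Assumed allow-list: letters and digits only, so no quotes can reach the query.
    if (!preg_match('/^[A-Za-z0-9]+$/', $sku)) {
        throw new InvalidArgumentException('Unexpected characters in SKU');
    }

    // Same query shape as in the answer: parent of the first matching <sku>.
    return sprintf('(//sku[text() = "%s"])[1]/..', $sku);
}

echo buildSkuQuery('AC2061414'), "\n";
```

Rejecting the input outright is simpler than trying to escape it, because XPath 1.0 string literals have no escape sequence for quotes.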
