Using XPath to get all nodes until the first node's type is encountered again

Time: 2014-08-08 03:14:03

Tags: php xpath

I have some third-party XML that looks like this:

<body>
  <text>Unimportant Introduction</text>
  <text class="heading">Important Section 1</text>
  <text>Important text</text>
  <table>(Table data)</table>
  <text>Other important text</text>
  <text class="heading">Important Section 2</text>
  <text class="heading"></text>
  <text>Important text</text>
  <text>Other important text</text>
  <text class="heading">Important Section 3</text>
  <text>Important text</text>
  <table>(Table data)</table>
</body>

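What I want is every node starting from a non-empty <text class="heading">, stopping just before the next non-empty <text class="heading">. It is important that the last <text class="heading"> captures the remaining nodes in <body>, so something like this (it doesn't have to be exact):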

array(
  0 => DOMNodeList {
    <text class="heading">Important Section 1</text>
    <text>Important text</text>
    <table>(Table data)</table>
    <text>Other important text</text>
  },
  1 => DOMNodeList {
    <text class="heading">Important Section 2</text>
    <text class="heading"></text>
    <text>Important text</text>
    <text>Other important text</text>
  },
  2 => DOMNodeList {
    <text class="heading">Important Section 3</text>
    <text>Important text</text>
    <table>(Table data)</table>
  }
)

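If I can't do this in a single XPath expression (separating and grouping the children), a loop is fine too.

I can already find the <text class="heading"> nodes with //body/text[@class='heading' and string-length(text()) > 0], but I don't know how to add all of their sibling nodes.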
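For reference, here is a minimal sketch of that heading query run from PHP. The sample XML is a shortened, hypothetical stand-in for the real document, and the variable names are only illustrative.

```php
<?php
// Shortened, hypothetical sample that mirrors the structure above.
$xml = '<body>'
     . '<text>Intro</text>'
     . '<text class="heading">Section 1</text>'
     . '<text class="heading"></text>'
     . '<text class="heading">Section 2</text>'
     . '</body>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select only the headings that contain some text.
$headings = $xpath->query("//body/text[@class='heading' and string-length(text()) > 0]");

$titles = [];
foreach ($headings as $h) {
    $titles[] = $h->textContent;
}
print_r($titles);
```

The empty <text class="heading"></text> is skipped because it has no text node, so string-length(text()) evaluates to 0 for it.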

Edit:

I just realized that what I really want is more like this:

array(
  0 => DOMElement {
    <body>
      <text class="heading">Important Section 1</text>
      <text>Important text</text>
      <table>(Table data)</table>
      <text>Other important text</text>
    </body>
  },
  1 => DOMElement {
    <body>
      <text class="heading">Important Section 2</text>
      <text class="heading"></text>
      <text>Important text</text>
      <text>Other important text</text>
    </body>
  },
  2 => DOMElement {
    <body>
      <text class="heading">Important Section 3</text>
      <text>Important text</text>
      <table>(Table data)</table>
    </body>
  }
)

Having all the required nodes inside a <body> node would be really useful!

1 answer:

Answer 0: (score: 0)

The following code does what I want, using a loop:

<?php
$xml = <<<EOT
<body>
  <text>Unimportant Introduction</text>
  <text class="heading">Important Section 1</text>
  <text>Important text 1</text>
  <table>(Table data) 1</table>
  <text>Other important text 1</text>
  <text class="heading">Important Section 2</text>
  <text class="heading"></text>
  <text>Important text 2</text>
  <text>Other important text 2</text>
  <text class="heading">Important Section 3</text>
  <text>Important text 3</text>
  <table>(Table data) 3</table>
</body>
EOT;

$dom = new DOMDocument();
$dom->loadXML($xml);

$finder = new DOMXPath($dom);
$heading = "text[@class='heading' and string-length(text()) > 0]";
$nodes = $finder->query("//body/{$heading}");
$num_sections = $nodes->length;

for ($num = 1; $num <= $num_sections; ++$num) {
  // Find all nodes that match the nth heading or any nodes after the nth heading
  // (nth heading plus all following nodes)
  $node_set1 = "(//body/{$heading}[{$num}] | //body/{$heading}[{$num}]/following-sibling::*)";
  // Find the next heading after the nth heading (if it exists) or any nodes after that
  // (n+1-th heading plus all following nodes)
  $node_set2 = "//body/{$heading}[{$num}]/following-sibling::{$heading}[1] | //body/{$heading}[{$num}]/following-sibling::{$heading}[1]/following-sibling::*";

  // Find all nodes that are in the first set but not in the second set
  $nodes = $finder->query("{$node_set1}[count(.| {$node_set2})!=count({$node_set2})]");

  print("Section $num:<br/>\n");
  foreach ($nodes as $node) {
    $sx = simplexml_import_dom($node);
    var_dump($sx->asXML());
  }
}
?>

Output (using Xdebug):

Section 1:
string '<text class="heading">Important Section 1</text>' (length=48)
string '<text>Important text 1</text>' (length=29)
string '<table>(Table data) 1</table>' (length=29)
string '<text>Other important text 1</text>' (length=35)
Section 2:
string '<text class="heading">Important Section 2</text>' (length=48)
string '<text class="heading"/>' (length=23)
string '<text>Important text 2</text>' (length=29)
string '<text>Other important text 2</text>' (length=35)
Section 3:
string '<text class="heading">Important Section 3</text>' (length=48)
string '<text>Important text 3</text>' (length=29)
string '<table>(Table data) 3</table>' (length=29)
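
The final query in the loop relies on a common XPath 1.0 idiom for set difference: a node belongs to set A but not to set B if adding it to B changes the size of B. The sketch below shows the idiom in isolation on a tiny, made-up document (the element names r, a, b, c, d are purely illustrative).

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<r><a/><b/><c/><d/></r>');
$xpath = new DOMXPath($dom);

// Set A: every child of <r>. Set B: <c> and everything after it.
$setA = "/r/*";
$setB = "/r/c | /r/c/following-sibling::*";

// Keep the nodes of A for which the union with B is larger than B,
// i.e. the nodes that are not already members of B.
$diff = $xpath->query("{$setA}[count(. | {$setB}) != count({$setB})]");

$names = [];
foreach ($diff as $node) {
    $names[] = $node->nodeName;
}
echo implode(',', $names), "\n"; // prints "a,b"
```

For the last section the second set is empty, so the count comparison is true for every node and everything up to the end of <body> is kept.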

I don't know whether this is the simplest solution, but it does the job!
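
For the variant asked for in the edit to the question (each section wrapped in its own <body> element), a single pass over the children may be simpler than one XPath query per section. This is only a sketch of an alternative, not the approach used above: the sample XML is shortened, the names $sections and $current are hypothetical, and "non-empty" is approximated by checking textContent instead of string-length(text()).

```php
<?php
$xml = <<<EOT
<body>
  <text>Unimportant Introduction</text>
  <text class="heading">Important Section 1</text>
  <text>Important text 1</text>
  <table>(Table data) 1</table>
  <text class="heading">Important Section 2</text>
  <text class="heading"></text>
  <text>Important text 2</text>
  <text class="heading">Important Section 3</text>
  <table>(Table data) 3</table>
</body>
EOT;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$sections = [];  // one new <body> DOMElement per section
$current = null; // stays null until the first non-empty heading is seen

foreach ($xpath->query('/body/*') as $node) {
    $isHeading = $node->nodeName === 'text'
        && $node->getAttribute('class') === 'heading'
        && $node->textContent !== '';

    if ($isHeading) {
        // A non-empty heading opens a new section wrapper.
        $current = $dom->createElement('body');
        $sections[] = $current;
    }
    if ($current !== null) {
        // Append a deep copy so the source tree is left untouched.
        $current->appendChild($node->cloneNode(true));
    }
}

foreach ($sections as $i => $section) {
    echo 'Section ', $i + 1, ': ', $dom->saveXML($section), "\n";
}
```

Nodes before the first non-empty heading (the introduction here) are dropped, and an empty heading stays inside the section it appears in, which matches the grouping shown in the question.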