How do I remove a DOM element's tag but keep its content?

Date: 2016-09-04 23:37:21

Tags: php html dom xpath

I have PHP code that removes every node that has at least one attribute. As you can see in the fiddle, this is the current output of my code:

<div>
    <p>These line shall stay</p>

    <p>But keep this</p>

</div>

Whereas this is the desired result:

<div>
    <p>These line shall stay</p>
    Remove this one
    <p>But keep this</p>
    and this
</div>

How can I do that?

3 answers:

Answer 0: (score: 4)

Before removing each element, pull out its child nodes and insert them right after it.

Example:

$data = <<<DATA
<div>
    <p>These line shall stay</p>
    <p class="myclass">Remove this one</p>
    <p>But keep this</p>
    <div style="color: red">and this</div>
    <div style="color: red">and <p>also</p> this</div>
    <div style="color: red">and this <div style="color: red">too</div></div>
</div>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

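// Select every element that has at least one attribute.
foreach ($xpath->query("//*[@*]") as $node) {
    $parent = $node->parentNode;
    // Move the children out, last one first, so they end up right after
    // $node in their original order.
    while ($node->hasChildNodes()) {
        $parent->insertBefore($node->lastChild, $node->nextSibling);
    }
    // The element is now empty and can be removed.
    $parent->removeChild($node);
}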

echo $dom->saveHTML();

Output:

<div>
    <p>These line shall stay</p>
    Remove this one
    <p>But keep this</p>
    and this
    and <p>also</p> this
    and this too
</div>

https://3v4l.org/9qHRM

(I added a few nested elements to demonstrate that this approach is safe.)
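The unwrapping step can also be packaged as a small reusable function. The following is only a minimal sketch: the name unwrapNode is hypothetical, the sample markup is invented, and it swaps the insertBefore loop above for a DOMDocumentFragment that collects the children first.

```php
<?php
// Hypothetical helper (not part of the answer): replace $node with its own children.
function unwrapNode(DOMNode $node): void
{
    $parent = $node->parentNode;
    if ($parent === null) {
        return; // detached node, nothing to unwrap
    }
    if ($node->hasChildNodes()) {
        // Collect the children in a fragment, preserving their order.
        $fragment = $node->ownerDocument->createDocumentFragment();
        while ($node->firstChild !== null) {
            $fragment->appendChild($node->firstChild);
        }
        // Inserting a fragment inserts its children, not the fragment itself.
        $parent->insertBefore($fragment, $node);
    }
    $parent->removeChild($node);
}

// Sample markup invented for this sketch.
$dom = new DOMDocument();
$dom->loadHTML(
    '<div><p class="a">x <b id="i">y</b> z</p><p>w</p></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
$xpath = new DOMXPath($dom);

// The XPath result is a static list, so nodes can be unwrapped while iterating.
foreach ($xpath->query('//*[@*]') as $node) {
    unwrapNode($node);
}

echo $dom->saveHTML();
```

The fragment keeps the children in order without relying on nextSibling, and the hasChildNodes() guard avoids inserting an empty fragment.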

A couple of asides:

  • If you load with the additional LIBXML_HTML_NODEFDTD flag, you don't need $dom->removeChild($dom->doctype)
  • Your XPath expression can be simplified to //*[@*]
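Both asides can be checked with a short script. This is a sketch under assumptions: the sample markup is invented, and //*[count(@*)>0] is only an assumed example of a longer expression, since the question's original XPath is not shown here.

```php
<?php
// Sample markup invented for this sketch.
$html = '<div><p class="a">one</p><p>two</p><span id="x" title="t">three</span></div>';

// Aside 1: with LIBXML_HTML_NODEFDTD no default doctype is added on load.
$withFlag = new DOMDocument();
$withFlag->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Without the flag, the doctype has to be removed by hand afterwards.
$manual = new DOMDocument();
$manual->loadHTML($html, LIBXML_HTML_NOIMPLIED);
if ($manual->doctype !== null) {
    $manual->removeChild($manual->doctype);
}

// Either way, the document ends up without a doctype node.
var_dump($withFlag->doctype === null, $manual->doctype === null);

// Aside 2: both predicates select elements that carry at least one attribute.
$xpath = new DOMXPath($withFlag);
$shortCount = $xpath->query('//*[@*]')->length;
$longCount = $xpath->query('//*[count(@*)>0]')->length; // assumed longer form
echo $shortCount, ' ', $longCount, PHP_EOL;
```

In this sample the two counts should agree, because a predicate holding a node-set is true exactly when that node-set is non-empty.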

Answer 1: (score: 1)

You can use replaceChild() together with the node's text content:

foreach ($lines_to_be_removed as $line) {
  $line->parentNode->replaceChild($dom->createTextNode($line->textContent),$line);
}

// <div>
//   <p>These line shall stay</p>
//   Remove this one
//   <p>But keep this</p>
//   and this
// </div>

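However, this can cause problems with the XPath selector and the recursive // notation: nested elements are matched too, and textContent flattens any markup inside a replaced node into plain text.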
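One way the problem shows up is with nested markup. The sketch below uses invented sample data and the //*[@*] expression from the other answer (an assumption about what the selector looks like) to show an attribute-free <p> being lost when its parent is replaced by its text content.

```php
<?php
// Sample markup invented for this sketch: the inner <p> has no attributes
// and should ideally survive.
$dom = new DOMDocument();
$dom->loadHTML(
    '<div><div class="foo">and <p>also</p> this</div></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//*[@*]') as $line) {
    // textContent concatenates all descendant text, dropping the tags.
    $line->parentNode->replaceChild($dom->createTextNode($line->textContent), $line);
}

$result = trim($dom->saveHTML());
echo $result, PHP_EOL; // <div>and also this</div>
```

The <p> around "also" is gone even though it never matched the selector, which is why moving the child nodes is safer than copying text.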

Instead, use a more manual approach that copies the target node's child content into its parent.

$data = '
<div>
  <div>1A</div>
  <div class="foo">1B
    <div>2C</div>
    <div class="foo">2D</div>
    <div>2E</div>
    <div class="foo">2F
      <div>3G</div>
      <div class="foo">3H</div>
      <div>3I</div>
      <div class="foo">3J</div>
    </div>
  </div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);

SomeFunctionName( $dom->documentElement );

$html = $dom->saveHTML();

function SomeFunctionName( $parent )
{
  $nodesToDelete = array();
  if( $parent->hasChildNodes() )
  {
    foreach( $parent->childNodes as $node )
    {
      SomeFunctionName( $node );
      if( $node->hasAttributes() and count( $node->attributes ) > 0 )
      {
        foreach( $node->childNodes as $childNode )
        {
          $node->parentNode->insertBefore( clone $childNode, $node );
        }
        $nodesToDelete[] = $node;
      }
    }
  }
  foreach( $nodesToDelete as $delete)
  {
    $delete->parentNode->removeChild( $delete );
  }
}

// <div>
//   <div>1A</div>
//   1B
//     <div>2C</div>
//     2D
//     <div>2E</div>
//     2F
//       <div>3G</div>
//       3H
//       <div>3I</div>
//       3J
// </div>

If you would rather nest the children inside a new "div" container, swap in this code for the foreach loop:

    foreach( $parent->childNodes as $node )
    {
      SomeFunctionName( $node );
      if( $node->hasAttributes() and count( $node->attributes ) > 0 )
      {
        $newNode = $node->ownerDocument->createElement('div');
        foreach( $node->childNodes as $childNode )
        {
          $newNode->appendChild( clone $childNode );
        }
        $node->parentNode->insertBefore( $newNode, $node );
        $nodesToDelete[] = $node;
      }
    }

// <div>
//   <div>1A</div>
//   <div>1B
//     <div>2C</div>
//     <div>2D</div>
//     <div>2E</div>
//     <div>2F
//       <div>3G</div>
//       <div>3H</div>
//       <div>3I</div>
//       <div>3J</div>
//     </div>
//   </div>
// </div>

Answer 2: (score: 1)

This removes every tag that has a class or style attribute, so it is not bulletproof:

<?php

$data = <<<DATA
<div>
    <p>These line shall stay</p>
    <p class="myclass">Remove this one</p>
    <p>But keep this</p>
    <div style="color: red">and this</div>
</div>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);

$xpath = new DOMXPath($dom);

$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");

foreach ($lines_to_be_removed as $line) {
    $line->parentNode->removeChild($line);
}

// just to check
echo $dom->saveHTML();
?>

Note this line:

$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
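To see what that expression does and does not match, here is a small sketch with invented sample markup. The shorter //*[@class or @style] form is my own assumption of an equivalent spelling, not something taken from the answer.

```php
<?php
// Sample markup invented for this sketch; note the last <p> only has an id.
$dom = new DOMDocument();
$dom->loadHTML(
    '<div><p>keep</p><p class="c">one</p><div style="color: red">two</div><p id="x">three</p></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
$xpath = new DOMXPath($dom);

// The expression from the answer.
$counted = $xpath->query('//*[count(@class)>0 or count(@style)>0]');
// Assumed shorter spelling: a non-empty node-set is already true in a predicate.
$short = $xpath->query('//*[@class or @style]');

echo $counted->length, ' ', $short->length, PHP_EOL;

// List the matched tag names; the <p id="x"> is not among them.
foreach ($counted as $node) {
    echo $node->nodeName, PHP_EOL;
}
```

The element carrying only an id attribute is left alone, which is the sense in which this selector is not bulletproof compared with //*[@*].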