Remove HTML tags by content

Time: 2014-02-04 11:32:37

Tags: php html dom

I have this table in the output of a program (a string converted with DomDocument in PHP):

<table>
    <tr>
        <td width="50">Â </td>
        <td>My content</td>
        <td width="50">Â </td>
    </tr>
</table>

I need to remove the two <td width="50">Â </td> tags (I don't know why the program adds them, but there they are -.-) so that the result looks like this:

<table>
    <tr>
        <td>My content</td>
    </tr>
</table>

What is the best way to do this in PHP?

Edit: The program is JasperReport Server. I call the report rendering function from a web application:

// This is the call to the server library that generates the report
$reportGen = $reportServer->runReport($myReport);

$domDoc = new \DomDocument();
$domDoc->loadHTML($reportGen);
return $domDoc->saveHTML($domDoc->getElementsByTagName('table')->item(0));

This returns the table shown above, which I need to fix...

2 answers:

Answer 0: (score: 1)

Try this:

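<?php
    // $reportGen is the HTML string returned by the report server (see the question).
    $domDoc = new DOMDocument();
    $domDoc->loadHTML($reportGen);
    $xpath = new DOMXPath($domDoc);
    // Select every <td>; the XPath result is a snapshot, so cells can be removed while looping.
    $tags = $xpath->query('//td');
    foreach ($tags as $tag) {
        $value = $tag->nodeValue;
        // Match text that starts with "Â" (U+00C2) followed by a space or a non-breaking space.
        if (preg_match('/^\x{00C2}[\x{00A0} ]/u', $value)) {
            $tag->parentNode->removeChild($tag);
        }
    }
?>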
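
The loop above can be tried end to end with a minimal sketch like the following. The helper name removeFillerCells is hypothetical, and the sample input writes the filler cells as entities (&Acirc;&nbsp;) on the assumption that this matches what ends up in the DOM; adjust the pattern if your cells contain something else. Note that the pattern also matches completely empty cells.

```php
<?php
// Hypothetical helper, not from the original answer: drops every <td> whose
// text consists only of "Â" (U+00C2), non-breaking spaces, or whitespace.
function removeFillerCells(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//td') as $cell) {
        if (preg_match('/^[\x{00C2}\x{00A0}\s]*$/u', $cell->nodeValue)) {
            $cell->parentNode->removeChild($cell);
        }
    }
}

// Sample input (an assumption for illustration). The filler cells are written
// as entities so the example does not depend on character set detection.
$html = '<table><tr>'
      . '<td width="50">&Acirc;&nbsp;</td>'
      . '<td>My content</td>'
      . '<td width="50">&Acirc;&nbsp;</td>'
      . '</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
removeFillerCells($doc);

// As in the question, serialize only the first table.
$result = $doc->saveHTML($doc->getElementsByTagName('table')->item(0));
echo $result;
```

Matching on the cell text rather than on the width attribute keeps any cell that happens to be 50 wide but carries real content.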

Answer 1: (score: 0)

Use a regular expression and a replace:

// Sample input modeled on the table in the question
$var = '<table>
    <tr>
        <td width="50">Ã</td>
        <td>My interesting content</td>
        <td width="50">Ã</td>
    </tr>
</table>';

// Empty every <td width="50"> cell, then remove the now-empty cells.
$final = preg_replace('#(<td width="50".*?>).*?(</td>)#', '$1$2', $var);
$final = str_replace('<td width="50"></td>', '', $final);

echo $final;
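
The two replacement steps can also be folded into one pattern. This is only a sketch: the helper name stripWidth50Cells is hypothetical, and it assumes the unwanted cells are exactly those with width="50" that contain plain text and no nested tags. A real cell with that width and plain text would be removed as well.

```php
<?php
// Hypothetical helper: removes every <td width="50">...</td> cell in one pass.
function stripWidth50Cells(string $html): string
{
    // \s* also consumes the indentation in front of the cell, and [^<]*
    // keeps the match inside a single cell that holds text only.
    return preg_replace('#\s*<td width="50"[^>]*>[^<]*</td>#', '', $html);
}

// Sample input (an assumption for illustration).
$var = '<table>
    <tr>
        <td width="50">&nbsp;</td>
        <td>My content</td>
        <td width="50">&nbsp;</td>
    </tr>
</table>';

$final = stripWidth50Cells($var);
echo $final;
```

Because the leading whitespace is removed together with the cell, no blank lines are left behind in the output.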