HTML Purifier is stripping tables. Does anyone know why?

Time: 2018-12-11 08:58:53

Tags: htmlpurifier

I just downloaded HTML Purifier to clean up input from a WYSIWYG editor, but it appears to be removing tables.

If I enter this text:

<font face="Times New Roman" size="3">

</font><p style="margin: 0in 0in 0pt; line-height: 150%; mso-outline-level: 3;"><span style='color: black; line-height: 150%; font-family: "Arial","sans-serif"; font-size: 12pt; mso-ascii-theme-font: minor-bidi; mso-hansi-theme-font: minor-bidi; mso-bidi-font-family: Arial; mso-bidi-theme-font: minor-bidi;'>Recruitment methods</span></p><font face="Times New Roman" size="3">

</font><table style="border: currentColor; border-image: none; border-collapse: collapse; mso-border-alt: solid windowtext .5pt; mso-yfti-tbllook: 1184; mso-padding-alt: 0in 5.4pt 0in 5.4pt;" border="1" cellspacing="0" cellpadding="0"><font face="Times New Roman" size="3">
</font><tbody><tr style="mso-yfti-irow: 0; mso-yfti-firstrow: yes;"><font face="Times New Roman" size="3">
 </font><td width="37" style="padding: 0in 5.4pt; border: 1pt solid windowtext; border-image: none; width: 27.95pt; background-color: transparent; mso-border-alt: solid windowtext .5pt;"><font face="Times New Roman" size="3">
 </font><p align="center" style="margin: 0in 0in 0pt; text-align: center; line-height: normal;"><span style='font-family: "Arial","sans-serif"; mso-ascii-theme-font: minor-bidi; mso-hansi-theme-font: minor-bidi; mso-bidi-font-family: Arial; mso-bidi-theme-font: minor-bidi;'><font size="3">No.</font></span></p><font face="Times New Roman" size="3">
 </font></td><font face="Times New Roman" size="3">
 </font><td width="180" style="border-width: 1pt 1pt 1pt 0px; border-style: solid solid solid none; border-color: windowtext windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; border-image: none; width: 134.95pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt;">&nbsp;</td><font face="Times New Roman" size="3">
 </font><td width="210" style="border-width: 1pt 1pt 1pt 0px; border-style: solid solid solid none; border-color: windowtext windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; border-image: none; width: 157.5pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt;">&nbsp;</td><font face="Times New Roman" size="3">
 </font><td width="211" style="border-width: 1pt 1pt 1pt 0px; border-style: solid solid solid none; border-color: windowtext windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; border-image: none; width: 2.2in; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt;">&nbsp;</td><font face="Times New Roman" size="3">
</font></tr><font face="Times New Roman" size="3">
</font><tr style="mso-yfti-irow: 1;"><font face="Times New Roman" size="3">
 </font><td width="37" style="border-width: 0px 1pt 1pt; border-style: none solid solid; border-color: rgb(0, 0, 0) windowtext windowtext; padding: 0in 5.4pt; border-image: none; width: 27.95pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt;"><font face="Times New Roman" size="3">
 </font><p align="center" style="margin: 0in 0in 0pt; text-align: center; line-height: normal;"><span style='font-family: "Arial","sans-serif"; mso-ascii-theme-font: minor-bidi; mso-hansi-theme-font: minor-bidi; mso-bidi-font-family: Arial; mso-bidi-theme-font: minor-bidi;'><font size="3">1</font></span></p><font face="Times New Roman" size="3">
 </font></td><font face="Times New Roman" size="3">
 </font><td width="180" style="border-width: 0px 1pt 1pt 0px; border-style: none solid solid none; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; width: 134.95pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt;">&nbsp;</td><font face="Times New Roman" size="3">
 </font><td width="210" style="border-width: 0px 1pt 1pt 0px; border-style: none solid solid none; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; width: 157.5pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt;">&nbsp;</td><font face="Times New Roman" size="3">
 </font><td width="211" style="border-width: 0px 1pt 1pt 0px; border-style: none solid solid none; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; width: 2.2in; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt;">&nbsp;</td><font face="Times New Roman" size="3">
</font></tr><font face="Times New Roman" size="3">
</font><font face="Times New Roman" size="3">
</font></tbody></table><font face="Times New Roman" size="3">

</font><p align="center" style="margin: 0in 0in 10pt; text-align: center;"><span style='line-height: 115%; font-family: "Arial","sans-serif"; font-size: 12pt; mso-ascii-theme-font: minor-bidi; mso-hansi-theme-font: minor-bidi; mso-bidi-font-family: Arial; mso-bidi-theme-font: minor-bidi;'>&nbsp;</span></p><font face="Times New Roman" size="3">

</font><p style="margin: 0in 0in 10pt;"><font face="Times New Roman" size="3">

</font><br>

I get the following output:

    <font face="Times New Roman" size="3">

</font><p style="margin:0in 0in 0pt;line-height:150%;"><span style="color:#000000;line-height:150%;font-family:Arial, 'sans-serif';font-size:12pt;">Recruitment methods</span></p><font face="Times New Roman" size="3">

</font><font face="Times New Roman" size="3">
 </font><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><p align="center" style="margin:0in 0in 0pt;text-align:center;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3">No.</font></span></p><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><p align="center" style="margin:0in 0in 0pt;text-align:center;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3">Method</font></span></p><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><p align="center" style="margin:0in 0in 0pt;text-align:center;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3">Strengths</font></span></p><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><p align="center" style="margin:0in 0in 0pt;text-align:center;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3">Weaknesses</font></span></p><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
 </font><font face="Times New Roman" size="3">
 </font><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><p align="center" style="margin:0in 0in 0pt;text-align:center;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3">1</font></span></p><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><p style="margin:0in 0in 0pt;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3">Internal recruitment</font></span></p><font face="Times New Roman" size="3">
  </font><p style="margin:0in 0in 0pt;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3">Promotion</font></span></p><font face="Times New Roman" size="3">
  </font><p style="margin:0in 0in 0pt;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3">Lateral transfer</font></span></p><font face="Times New Roman" size="3">
  </font><p style="margin:0in 0in 0pt;line-height:normal;"><span style="font-family:Arial, 'sans-serif';"><font size="3"> </font></span></p><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3">
  </font><font face="Times New Roman" size="3"> etc...

My setup is as follows:

require_once 'purify/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,span[style|class],a[href|title],abbr[title],acronym[title],b,strong,blockquote[cite],code,em,i,iframe[src|width|height],img[alt|title|class|src|height|width],h1,h2,h3,h3,ol,ul,li,table[class|style],tr,td,hr');
$purifier = new HTMLPurifier($config);

I only added the HTML.Allowed line to try to allow tables explicitly, but it made no difference. Does anyone know why it is stripping the tables?

Thanks

1 answer:

Answer 0: (score: 0)

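This is a little strange. Initially I thought that one of the <font> tags (an inline element) was wrapping a block-level element and so forcing it to be stripped, with the errors cascading from there, but after running the code through a basic (dumb) HTML formatter, it looks like they are all quite self-contained.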

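Turning on error collection shows what is going on, though. The problem seems to be that, even though the <font> elements are self-contained, HTML Purifier closes the <table> as soon as it encounters the first <font> inside it, but it does not remove the <font> (as one might expect):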

  

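Notice Line 7, column 8: <table> tag started on line 6 auto-closed by <font>

Error Line 10, column 8: style attribute on <tr> removed

Notice Line 11, column 12: <tr> tag started on line 10 auto-closed by <font>

Notice Line 11, column 12: <tbody> tag started on line 9 auto-closed by <font>

Warning Line 31, column 8: Unnecessary </tr> tag removed

Error Line 34, column 8: style attribute on <tr> removed

Notice Line 35, column 12: <tr> tag started on line 34 auto-closed by <font>

Warning Line 55, column 8: Unnecessary </tr> tag removed

Warning Line 59, column 8: Unnecessary </tbody> tag removed

Warning Line 60, column 4: Unnecessary </table> tag removed

Notice End of document: <p> tag started on line 66 closed by end of document

Warning End of document: Contents of node reorganized to enforce its content model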

This is the output of the demo when you tick CollectErrors and submit the following HTML:

<font face="Times New Roman" size="3">
</font>
<p style="margin: 0in 0in 0pt; line-height: 150%; mso-outline-level: 3;"><span style='color: black; line-height: 150%; font-family: "Arial","sans-serif"; font-size: 12pt; mso-ascii-theme-font: minor-bidi; mso-hansi-theme-font: minor-bidi; mso-bidi-font-family: Arial; mso-bidi-theme-font: minor-bidi;'>Recruitment methods</span></p>
<font face="Times New Roman" size="3">
</font>
<table style="border: currentColor; border-image: none; border-collapse: collapse; mso-border-alt: solid windowtext .5pt; mso-yfti-tbllook: 1184; mso-padding-alt: 0in 5.4pt 0in 5.4pt;" border="1" cellspacing="0" cellpadding="0">
    <font face="Times New Roman" size="3">
    </font>
    <tbody>
    <tr style="mso-yfti-irow: 0; mso-yfti-firstrow: yes;">
        <font face="Times New Roman" size="3">
        </font>
        <td width="37" style="padding: 0in 5.4pt; border: 1pt solid windowtext; border-image: none; width: 27.95pt; background-color: transparent; mso-border-alt: solid windowtext .5pt;">
            <font face="Times New Roman" size="3">
            </font>
            <p align="center" style="margin: 0in 0in 0pt; text-align: center; line-height: normal;"><span style='font-family: "Arial","sans-serif"; mso-ascii-theme-font: minor-bidi; mso-hansi-theme-font: minor-bidi; mso-bidi-font-family: Arial; mso-bidi-theme-font: minor-bidi;'><font size="3">No.</font></span></p>
            <font face="Times New Roman" size="3">
            </font>
        </td>
        <font face="Times New Roman" size="3">
        </font>
        <td width="180" style="border-width: 1pt 1pt 1pt 0px; border-style: solid solid solid none; border-color: windowtext windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; border-image: none; width: 134.95pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt;">&nbsp;</td>
        <font face="Times New Roman" size="3">
        </font>
        <td width="210" style="border-width: 1pt 1pt 1pt 0px; border-style: solid solid solid none; border-color: windowtext windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; border-image: none; width: 157.5pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt;">&nbsp;</td>
        <font face="Times New Roman" size="3">
        </font>
        <td width="211" style="border-width: 1pt 1pt 1pt 0px; border-style: solid solid solid none; border-color: windowtext windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; border-image: none; width: 2.2in; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt;">&nbsp;</td>
        <font face="Times New Roman" size="3">
        </font>
    </tr>
    <font face="Times New Roman" size="3">
    </font>
    <tr style="mso-yfti-irow: 1;">
        <font face="Times New Roman" size="3">
        </font>
        <td width="37" style="border-width: 0px 1pt 1pt; border-style: none solid solid; border-color: rgb(0, 0, 0) windowtext windowtext; padding: 0in 5.4pt; border-image: none; width: 27.95pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt;">
            <font face="Times New Roman" size="3">
            </font>
            <p align="center" style="margin: 0in 0in 0pt; text-align: center; line-height: normal;"><span style='font-family: "Arial","sans-serif"; mso-ascii-theme-font: minor-bidi; mso-hansi-theme-font: minor-bidi; mso-bidi-font-family: Arial; mso-bidi-theme-font: minor-bidi;'><font size="3">1</font></span></p>
            <font face="Times New Roman" size="3">
            </font>
        </td>
        <font face="Times New Roman" size="3">
        </font>
        <td width="180" style="border-width: 0px 1pt 1pt 0px; border-style: none solid solid none; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; width: 134.95pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt;">&nbsp;</td>
        <font face="Times New Roman" size="3">
        </font>
        <td width="210" style="border-width: 0px 1pt 1pt 0px; border-style: none solid solid none; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; width: 157.5pt; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt;">&nbsp;</td>
        <font face="Times New Roman" size="3">
        </font>
        <td width="211" style="border-width: 0px 1pt 1pt 0px; border-style: none solid solid none; border-color: rgb(0, 0, 0) windowtext windowtext rgb(0, 0, 0); padding: 0in 5.4pt; width: 2.2in; background-color: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt;">&nbsp;</td>
        <font face="Times New Roman" size="3">
        </font>
    </tr>
    <font face="Times New Roman" size="3">
    </font><font face="Times New Roman" size="3">
    </font>
    </tbody>
</table>
<font face="Times New Roman" size="3">
</font>
<p align="center" style="margin: 0in 0in 10pt; text-align: center;"><span style='line-height: 115%; font-family: "Arial","sans-serif"; font-size: 12pt; mso-ascii-theme-font: minor-bidi; mso-hansi-theme-font: minor-bidi; mso-bidi-font-family: Arial; mso-bidi-theme-font: minor-bidi;'>&nbsp;</span></p>
<font face="Times New Roman" size="3">
</font>
<p style="margin: 0in 0in 10pt;"><font face="Times New Roman" size="3">
</font><br>
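
The same kind of report can probably be produced locally instead of through the demo. The following is a minimal sketch, not a definitive setup: it assumes the include path from the question, uses the `Core.CollectErrors` directive, and reads the collector from the purifier's context. The `$dirty` sample is a hypothetical, shortened stand-in for the editor output, and the report format may differ between HTML Purifier versions.

```php
<?php
// Path taken from the question; adjust it to your installation.
require_once 'purify/library/HTMLPurifier.auto.php';

// Hypothetical sample: a <font> wrapper sitting directly inside <table>,
// which is the pattern the error report above complains about.
$dirty = '<table><font face="Times New Roman" size="3"> </font>'
       . '<tbody><tr><td>No.</td></tr></tbody></table>';

$config = HTMLPurifier_Config::createDefault();
// Ask HTML Purifier to record what it changes and why.
$config->set('Core.CollectErrors', true);

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($dirty);

// After purify() has run, the collector is available from the context.
$errors = $purifier->context->get('ErrorCollector');
echo $errors->getHTMLFormatted($config); // HTML-formatted list of notices, warnings and errors
```

Comparing the report for the full editor output with the report for a hand-cleaned copy (with the <font> wrappers between table elements removed) should show whether the "auto-closed by <font>" notices are what causes the table to disappear.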

Another thread on the HTML Purifier forum may make this easier to understand. There, the symptom is described as follows:

  

When I try to purify this code:

<table>
  <tr>
    <td>
      <li>fffff</li>
    </td>
  </tr>
</table>
     

I get:

<table>
  <tr>
    <td>
    </td>
  </tr>
</table>

fffff

To which (I, heh) replied:

  

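My guess is that HTML Purifier detects that an <li> cannot be opened at that position, and instead of stripping the <li> first, it auto-closes the other tags that are open at that point, which (initially) results in: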

    <table>
      <tr>
        <td>
        </td>
      </tr>
    </table>
    <li>fffff</li>
        </td>
      </tr>
    </table>
    
         

Then it removes the extraneous closing tags...

    <table>
      <tr>
        <td>
        </td>
      </tr>
    </table>
    <li>fffff</li>
    
         

and then it strips the <li>, which gives what you observed:

    <table>
      <tr>
        <td>
        </td>
      </tr>
    </table>
    fffff
    
You could try switching the Lexer to DirectLex to see whether that changes the behaviour, but I doubt it; you may be stuck with this behaviour. Give it a whirl, though.
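
For reference, switching the lexer is a single configuration directive. This is a sketch only: it assumes the include path from the question and the `Core.LexerImpl` directive, and it reuses the forum thread's <li> snippet as input. It makes no claim about whether the output changes.

```php
<?php
// Path taken from the question; adjust it to your installation.
require_once 'purify/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Select HTML Purifier's own DirectLex lexer instead of the default choice.
$config->set('Core.LexerImpl', 'DirectLex');

$purifier = new HTMLPurifier($config);

// Snippet from the forum thread quoted above.
echo $purifier->purify('<table><tr><td><li>fffff</li></td></tr></table>');
```

If the output is the same as before, the auto-closing most likely happens after lexing, which would match the doubt expressed above.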