HTML Agility Pack is changing a closing </p> tag to <p> on invalid markup

Asked: 2017-06-26 18:30:27

Tags: c# html html-agility-pack xmlconvert

Input:

<head><title>Title</title></head>
<font face="Verdana" size="2">
<p>

<b>Bold sentence.</b>
<br><br>Sentence after two  breaks.<br><br>Sentence after another two  breaks. <b><i>bold and italicized sentence.</i></b> sentence. <br><br>final sentence after two more breaks.

</font></p>

<form><center><div style='padding-left: 16px; padding-right: 16px;'><a class='button' href='javascript:void(0);' onclick='javascript:window.close()'><img src='/GBUIAssets/Web20/img/frame/buttonshade.png' alt='buttonShade' /><span class='roundLeft'><span class='roundRight'>Fermer</span></span></a></div></center></form></font>

I remove the head, font, and form elements. The output I get is:

<p>

<b>Bold sentence.</b>
<br><br>Sentence after two  breaks.<br><br>Sentence after another two  breaks. <b><i>bold and italicized sentence.</i></b> sentence. <br><br>final sentence after two more breaks.

<p>

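This is a problem because I then try to convert the result to XML, which throws an error. Why is it "fixing" a part of my code that was already valid? Any idea what could be causing this? I can provide more code if needed, but I first want to make sure there isn't an obvious step I'm missing.

Edit: For full context, I am stripping out the body content of the HTML. The catch is that this HTML is HIDEOUS. Really pathologically formatted. I load it into XML so that malformed HTML documents throw specific errors, then dump those errors into an error report to find every file that failed to be stripped.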

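2 Answers:

Answer 0: (score: 0)

The markup is invalid: <font> is opened before <p> but closed before </p>. Try putting the font tag inside the P tag and you should be fine.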
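To see what the parser itself flags in markup like this, HtmlAgilityPack records the problems it notices in `HtmlDocument.ParseErrors`. The following is only a minimal sketch: the class name and the shortened input string are made up for illustration, and it was not verified which errors, if any, are reported for the misnested tags in the question.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class ParseErrorReport
{
    static void Main()
    {
        // Shortened stand-in for the question's input (an assumption):
        // <font> opens before <p> but is closed before </p>.
        string html = "<font face=\"Verdana\" size=\"2\"><p><b>Bold sentence.</b><br><br>text</font></p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ParseErrors holds whatever the parser noticed while loading; it may be empty.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine($"{error.Code} at line {error.Line}, column {error.LinePosition}: {error.Reason}");
        }
    }
}
```

Lines like these could go straight into the error report mentioned in the question, without relying on the XML conversion to fail first.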

Answer 1: (score: 0)

Update your markup to:

<head>
  <title>Title</title>
</head>
<font face="Verdana" size="2">
<p>

<b>Bold sentence.</b>
<br/><br/>Sentence after two  breaks.<br/><br/>Sentence after another two  breaks. <b><i>bold and italicized sentence.</i></b> sentence. <br/><br/>final sentence after two more breaks.

</p>

<form>
<center>
<div style='padding-left: 16px; padding-right: 16px;'>
<a class='button' href='javascript:void(0);' onclick='javascript:window.close()'>
<img src='/GBUIAssets/Web20/img/frame/buttonshade.png' alt='buttonShade' />
<span class='roundLeft'><span class='roundRight'>Fermer</span></span>
</a>
</div>
</center>
</form>
</font>
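For the later XML conversion, what matters in the markup above is that every tag is closed, including each <br/>. As a rough sketch of that point, the check below assumes the cleaned fragment is handed to `XDocument.Parse`; the question does not say which XML API is actually used, and the class name, helper name, and shortened fragments are illustrative only.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class WellFormedCheck
{
    // Returns true when the fragment parses as XML; otherwise false plus the parser's message.
    static bool IsWellFormed(string fragment, out string error)
    {
        try
        {
            XDocument.Parse(fragment);
            error = "";
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    static void Main()
    {
        // Shape of the output reported in the question: unclosed <br>, trailing <p> instead of </p>.
        string asReported = "<p><b>Bold sentence.</b><br><br>Sentence after two breaks.<p>";
        // Same content with self-closed <br/> and a real closing </p>.
        string closed = "<p><b>Bold sentence.</b><br/><br/>Sentence after two breaks.</p>";

        foreach (string fragment in new[] { asReported, closed })
        {
            bool ok = IsWellFormed(fragment, out string error);
            Console.WriteLine(ok ? "well-formed" : "not well-formed: " + error);
        }
    }
}
```

The first fragment is rejected and the second is accepted. Note that restoring </p> alone would not be enough, because an unclosed <br> is also not well-formed XML.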

If possible, I would suggest moving the <font> declaration into an external stylesheet, for example:

body { font-family: Verdana; }