Remedies for XML parsing errors

Time: 2013-04-10 03:25:42

Tags: xml parsing

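While parsing an XML file, I get a parsing error such as [Fatal Error] :293:24: Invalid byte 2 of 2-byte UTF-8 sequence. My sample XML contains some characters such as xc3, which is a single character (I mean that pressing the Delete key once removes xc3 in one go). I tried to paste the character here, but this editor shows a different character.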

<?xml version="1.0" encoding="utf-8"?>
<issue-info>
<issue-meta>
<date>January 24, 2013</date>
<from>Chris Burton, John Wiley &amp; Sons, Ltd.</from>
<journal>Greenhouse Gases: Science and Technology</journal>
<typesetter>Anju Upadhaya</typesetter>
<volume>3</volume>
<issue>1</issue>
<printer>Markono,</printer>
<cover-date>February 2013</cover-date>
<online-issn>2152-3878</online-issn>
<print-issn>2152-3878</print-issn>
<total-pages>FM &ndash; 4; TEXT &ndash; 95; EM &ndash; 1: TOTAL = 100</total-pages>
<spl-instruction></spl-instruction>
</issue-meta>
<issue-item>
<seq>1</seq>
<ed-ref>OFC</ed-ref>
<aid></aid>
<author></author>
<description>Update from GHG 2_1 cover</description>
<start-page>1</start-page>
<end-page>1</end-page>
<artty></artty>
<category>OFC (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>2</seq>
<ed-ref>IFC</ed-ref>
<aid></aid>
<author>49379Ůpdf</author>
<description>New GHG colour ADVERT</description>
<start-page>2</start-page>
<end-page>2</end-page>
<artty></artty>
<category>IFC (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>3</seq>
<ed-ref>FM1</ed-ref>
<aid></aid>
<author></author>
<description>Table of Contents</description>
<start-page>1</start-page>
<end-page>1</end-page>
<artty></artty>
<category>TOC (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>4</seq>
<ed-ref>FM2</ed-ref>
<aid></aid>
<author></author>
<description>Editorial Board</description>
<start-page>2</start-page>
<end-page>2</end-page>
<artty></artty>
<category>Editorial Board (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>5</seq>
<ed-ref>FM3</ed-ref>
<aid></aid>
<author></author>
<description>Aims and Scope</description>
<start-page>3</start-page>
<end-page>3</end-page>
<artty></artty>
<category>Aims and Scope (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>6</seq>
<ed-ref>FM4</ed-ref>
<aid></aid>
<author></author>
<description>Information Page</description>
<start-page>4</start-page>
<end-page>4</end-page>
<artty></artty>
<category>Information Page (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>7</seq>
<ed-ref></ed-ref>
<aid>GHG1333</aid>
<author>PROD ED</author>
<description></description>
<start-page>1</start-page>
<end-page>2</end-page>
<artty>ED</artty>
<category>Editorial (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>8</seq>
<ed-ref></ed-ref>
<aid>GHG1334</aid>
<author>PROD ED</author>
<description></description>
<start-page>3</start-page>
<end-page>4</end-page>
<artty>XX</artty>
<category>60 Second Interview (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>9</seq>
<ed-ref></ed-ref>
<aid>GHG1335</aid>
<author></author>
<description></description>
<start-page>5</start-page>
<end-page>7</end-page>
<artty>XX</artty>
<category>Feature (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>10</seq>
<ed-ref>GHG-12-0029.R2</ed-ref>
<aid>GHG1313</aid>
<author>PAN, CLODIC, TOUBASSY</author>
<description></description>
<start-page>8</start-page>
<end-page>20</end-page>
<artty>XX</artty>
<category>In the Field (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online>03 Jan 2013</pub-online>
</issue-item>
<issue-item>
<seq>11</seq>
<ed-ref>GHG-12-0023.R1</ed-ref>
<aid>GHG1298</aid>
<author>Peterson, O'Byrne, Endres, Peterson</author>
<description></description>
<start-page>21</start-page>
<end-page>29</end-page>
<artty>XX</artty>
<category>Spotlight (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online>14 Sep 2012</pub-online>
</issue-item>
<issue-item>
<seq>12</seq>
<ed-ref>GHG-12-0033.R2</ed-ref>
<aid>GHG1321</aid>
<author>Begag, Krutka, Dong, Mihalcik, Rhine, Gould, Baldic, Nahass</author>
<description></description>
<start-page>30</start-page>
<end-page>39</end-page>
<artty>XX</artty>
<category>Spotlight (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>13</seq>
<ed-ref>GHG-12-0036.R1</ed-ref>
<aid>GHG1331</aid>
<author>Cunningham, Lauchnor, Eldring, Esposito, Mitchell, Gerlach, Phillips, Ebigbo, Spangler</author>
<description></description>
<start-page>40</start-page>
<end-page>49</end-page>
<artty>XX</artty>
<category>Spotlight (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>14</seq>
<ed-ref>GHG-12-0034.R1</ed-ref>
<aid>GHG1328</aid>
<author>Elliot, Buscheck, Celia</author>
<description></description>
<start-page>50</start-page>
<end-page>65</end-page>
<artty>XX</artty>
<category>Modeling and Analysis (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>15</seq>
<ed-ref>GHG-12-0021.R1</ed-ref>
<aid>GHG1318</aid>
<author>Mazzoldi, Picard, Sriram, Oldenburg</author>
<description></description>
<start-page>66</start-page>
<end-page>83</end-page>
<artty>XX</artty>
<category>Modeling and Analysis (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online>03 Jan 2013</pub-online>
</issue-item>
<issue-item>
<seq>16</seq>
<ed-ref>GHG-12-0031.R1</ed-ref>
<aid>GHG1308</aid>
<author>Eccles, Pratson</author>
<description></description>
<start-page>84</start-page>
<end-page>95</end-page>
<artty>XX</artty>
<category>Modeling and Analysis (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online>26 Oct 2012</pub-online>
</issue-item>
<issue-item>
<seq>17</seq>
<ed-ref>EM1</ed-ref>
<aid></aid>
<author>Join the SCI</author>
<description>NEW COLOUR ADVERT</description>
<start-page></start-page>
<end-page></end-page>
<artty></artty>
<category>Society Ad (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>18</seq>
<ed-ref>IBC</ed-ref>
<aid></aid>
<author>ONLINE OPEN</author>
<description>COLOUR ADVERT</description>
<start-page>1</start-page>
<end-page>1</end-page>
<artty></artty>
<category>IBC (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
<issue-item>
<seq>19</seq>
<ed-ref>OBC</ed-ref>
<aid></aid>
<author>CCUS</author>
<description>NEW COLOUR ADVERT</description>
<start-page>2</start-page>
<end-page>2</end-page>
<artty></artty>
<category>OBC (GHG)</category>
<toc-category></toc-category>
<reprint></reprint>
<color>N</color>
<color-charge>0</color-charge>
<pub-online></pub-online>
</issue-item>
</issue-info>

My Java parsing code is shown below.

        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        docBuilderFactory.setValidating(false);
        docBuilderFactory.setCoalescing(false);
        docBuilderFactory.setXIncludeAware(false);
        docBuilderFactory.setNamespaceAware(false);
        docBuilderFactory.setIgnoringComments(true);
        docBuilderFactory.setExpandEntityReferences(false);

        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(rtfXmlIS);
        doc.getDocumentElement().normalize();

How can I get rid of errors like these? ([Fatal Error] :16:45: Invalid byte 1 of 1-byte UTF-8 sequence. [Fatal Error] :14:24: The entity "ndash" was referenced, but not declared.)

1 answer:

Answer 0 (score: 3)

These are two different errors.

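The first occurs because the input is not actually UTF-8, even though its XML declaration says encoding="utf-8". You need to decode the input correctly before passing it to the parser.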
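A minimal sketch of decoding before parsing, assuming the file's real encoding is ISO-8859-1 (that is only a guess for illustration; the actual encoding has to be determined from wherever the file came from). The class name DecodeBeforeParse and its parse helper are hypothetical. When the parser is given a character stream through an InputSource, it reads those characters directly and does not apply the encoding named in the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DecodeBeforeParse {
    // Decodes the bytes with the charset the caller believes is correct,
    // then hands the resulting characters (not the raw bytes) to the parser.
    static Document parse(InputStream in, Charset actualCharset) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Reader reader = new InputStreamReader(in, actualCharset);
        return builder.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // The declaration claims utf-8, but the bytes are ISO-8859-1:
        // "é" becomes the single byte 0xE9, which is not valid UTF-8 here.
        byte[] bytes = "<?xml version=\"1.0\" encoding=\"utf-8\"?><author>Caf\u00e9</author>"
                .getBytes(StandardCharsets.ISO_8859_1);

        // ISO_8859_1 is an assumption for this demo, not a fact about the asker's file.
        Document doc = parse(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

In the question's code, this would mean wrapping rtfXmlIS in an InputStreamReader with the right charset instead of passing the raw stream to docBuilder.parse. If the wrong charset is chosen, parsing may still succeed but the text will contain the wrong characters, so it is worth checking a known non-ASCII value after parsing.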

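The second is probably because the input is XHTML rather than plain XML. XML itself predefines only five entities (&amp;, &lt;, &gt;, &apos; and &quot;). If you want to use an XML parser on this input and resolve entities such as &ndash;, you need to provide a DTD that defines it, along with any other entities that appear in the input.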
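One way to provide such a definition, sketched here under the assumption that the input has no DOCTYPE of its own and that ndash is the only undeclared entity, is to insert an internal DTD subset right after the XML declaration before parsing. The class name DeclareNdash and its parse helper are hypothetical, and the root element name issue-info is taken from the sample in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DeclareNdash {
    // Internal DTD subset that declares ndash as U+2013 (en dash).
    // Further <!ENTITY ...> declarations would be added here for other entities.
    static final String DOCTYPE =
            "<!DOCTYPE issue-info [<!ENTITY ndash \"&#8211;\">]>";

    // Inserts the DOCTYPE after the XML declaration (if present), then parses.
    // Assumes the input does not already contain a DOCTYPE.
    static Document parse(String xml) throws Exception {
        int insertAt = 0;
        if (xml.startsWith("<?xml")) {
            insertAt = xml.indexOf("?>") + 2;
        }
        String withDtd = xml.substring(0, insertAt) + DOCTYPE + xml.substring(insertAt);

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(withDtd)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<issue-info><total-pages>FM &ndash; 4</total-pages></issue-info>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This sketch uses the factory defaults, under which entity references are expanded. The question's code calls setExpandEntityReferences(false), which may leave entity reference nodes in the DOM instead of plain text. If the producer of the file can be changed, writing the numeric reference &#8211; in place of &ndash; avoids the need for a DTD altogether.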