Premature end of file exception when parsing XML files

Date: 2013-03-02 06:23:23

Tags: java xml-parsing domparser

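I am reading a large number of XML files from a server, parsing them, and extracting a few tags from each file to store in a database. While reading these files, the DOM parser sometimes throws this exception: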

Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
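One common way to get this message from the JDK's built-in DOM parser is a document that ends while an element is still open, for example one that was cut off partway through a download or read. The minimal sketch below (the class name, method name, and sample XML are illustrative, not from the question) parses a complete document and a copy truncated before `</root>`. For the truncated copy, the reported line and column point at or near the end of the input, which is one way to tell truncation apart from a genuinely mismatched tag. Note that the default error handler also prints a "[Fatal Error]" line to stderr before the exception is thrown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class TruncatedXmlDemo {
    // Parses the given XML text. Returns null on success, or the
    // SAXParseException describing where parsing stopped.
    static SAXParseException tryParse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        String complete = "<root><item>a</item><item>b</item></root>";
        // The same document cut off before the closing </root> tag,
        // as might happen if a download or read stops early.
        String truncated = complete.substring(0, complete.indexOf("</root>"));

        System.out.println("complete:  " + (tryParse(complete) == null ? "ok" : "failed"));
        SAXParseException e = tryParse(truncated);
        System.out.println("truncated: " + e.getMessage()
                + " (line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ")");
    }
}
```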

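I would like to handle this exception, keep parsing the "bad" files, and still get data out of them. The server returns about 2000 XML files, and roughly 100 of them trigger this exception. The strange part is that when I inspect those files manually, all of their tags are perfectly balanced.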
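A DOM parser does not hand back a partial tree once it hits a fatal error, so one possible way to still extract data from a truncated file is to switch to SAX, which reports each element to a handler as it is read. The sketch below keeps every matching element that was fully read before the error and drops the unfinished one. It assumes the wanted values are the text of a single tag that is not nested inside itself; the class name, method name, and sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PartialSaxExtractor {
    // Collects the text of every element named tagName that the parser
    // finishes before any fatal error (assumes tagName is not nested in itself).
    static List<String> extract(String xml, final String tagName) throws Exception {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tagName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tagName) && text != null) {
                    values.add(text.toString());
                    text = null;
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Malformed (for example, truncated) input: keep whatever complete
            // elements were reported before the error.
            System.err.println("Stopped early: " + e.getMessage());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String truncated = "<root><name>first</name><name>second</name><name>thi";
        System.out.println(extract(truncated, "name")); // [first, second]
    }
}
```

Whether partial data from a truncated file is acceptable to store is a separate decision; logging which files stopped early makes it possible to re-fetch them later.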

Here is my code:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// A DocumentBuilder can be reused for sequential parses on one thread,
// so create it once instead of once per file.
DocumentBuilder builder = factory.newDocumentBuilder();

for (int k = 0; k < listdata.length; k++) {
    String xmldata = listdata[k].getcategorydata();
    System.out.println("XML File:" + xmldata);

    try {
        // xmldata is treated here as a path to a file on disk. If it holds
        // the XML text itself, use the commented-out InputSource line instead.
        File is = new File(xmldata);
        //InputSource is = new InputSource(new StringReader(xmldata));

        Document doc = builder.parse(is);
        Element docEle = doc.getDocumentElement();
        System.out.println("Root element of the document: "
                + docEle.getNodeName());

        // getElementsByTagName never returns null; it returns an empty list
        NodeList links = docEle.getElementsByTagName("Some tag Name");
        System.out.println("Total actionLink: " + links.getLength());

        for (int l = 0; l < links.getLength(); l++) {
            Node node = links.item(l);

            if (node.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println("=====================");

                Element e = (Element) node;
                NodeList nodeList = e.getElementsByTagName("Tag Name..");
                // Skip elements that lack the child tag instead of
                // throwing a NullPointerException on item(0)
                if (nodeList.getLength() > 0) {
                    String pathname = nodeList.item(0).getTextContent();
                    System.out.println("Name: " + pathname);
                }
            }
        }
    } catch (SAXParseException e) {
        // Report which file failed and where, then continue with the next one
        System.out.println("Error in " + e.getSystemId() + " at line "
                + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage());
        e.printStackTrace();
    }
}
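As a variation on the loop above, here is a hedged sketch of a batch parser that reads each document from a String through InputSource and StringReader (as in the commented-out line), collects the text of one tag name, and records which documents failed, with line and column, instead of stopping or only printing a stack trace. The class name, the method signature, and the use of list indices as document IDs are assumptions for illustration; the custom ErrorHandler only silences the "[Fatal Error]" lines the default handler prints to stderr.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class BatchXmlParser {
    // Parses each XML string, collects the text of every tagName element, and
    // records failed documents (by index) in failures instead of aborting.
    static List<String> parseAll(List<String> documents, String tagName,
                                 Map<Integer, String> failures) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Rethrow fatal errors without printing them to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {}
            @Override public void error(SAXParseException e) {}
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        List<String> values = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            try {
                Document doc = builder.parse(new InputSource(new StringReader(documents.get(i))));
                NodeList nodes = doc.getElementsByTagName(tagName);
                for (int j = 0; j < nodes.getLength(); j++) {
                    values.add(nodes.item(j).getTextContent());
                }
            } catch (SAXParseException e) {
                failures.put(i, "line " + e.getLineNumber() + ", column "
                        + e.getColumnNumber() + ": " + e.getMessage());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList(
                "<root><name>a</name></root>",
                "<root><name>b</name>",          // truncated: missing </root>
                "<root><name>c</name></root>");
        Map<Integer, String> failures = new LinkedHashMap<>();
        List<String> names = parseAll(docs, "name", failures);
        System.out.println("names: " + names);              // names: [a, c]
        System.out.println("failed: " + failures.keySet()); // failed: [1]
    }
}
```

Keeping the failures in a map makes it straightforward to re-request just those documents from the server and compare them with what was parsed the first time.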

0 answers