Strange XML indentation

Asked: 2013-05-20 02:47:20

Tags: java xml dom transformer

I am writing an XML file, and the tags come out with inconsistent indentation:

<BusinessEvents>

<MailEvent>
          <to>Wellington</to>
          <weight>10.0</weight>
          <priority>air priority</priority>
          <volume>10.0</volume>
          <from>Christchurch</from>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <PPW>8.0</PPW>
          <PPV>2.5</PPV>
     </MailEvent>
<DiscontinueEvent>
          <to>Wellington</to>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <from>Sydney</from>
     </DiscontinueEvent>
<RoutePriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <duration>15.0</duration>
          <maxweight>40.0</maxweight>
          <maxvolume>20.0</maxvolume>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <frequency>3.0</frequency>
          <from>Wellington</from>
          <volumecost>2.0</volumecost>
     </RoutePriceUpdateEvent>
<CustomerPriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <priority>air priority</priority>
          <from>Sydney</from>
          <volumecost>2.0</volumecost>
     </CustomerPriceUpdateEvent>
</BusinessEvents>

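As you can see, the first child node is not indented at all, yet its own children are indented twice, and the closing tag is indented only once?

I suspect it may have something to do with not adding the root to the document via doc.appendChild(root), but when I do that I get an error:

"An attempt was made to insert a node where it is not permitted."

Here is my parser: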

DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder icBuilder;
        try {
            icBuilder = icFactory.newDocumentBuilder();
            String businessEventsFile = System.getProperty("user.dir") + "/testdata/businessevents/businessevents.xml";
            Document doc = icBuilder.parse (businessEventsFile);

            Element root = doc.getDocumentElement();

            Element element;

            if(event instanceof CustomerPriceUpdateEvent){
                element = doc.createElement("CustomerPriceUpdateEvent");
            }
            else if(event instanceof DiscontinueEvent){
                element = doc.createElement("DiscontinueEvent");
            }
            else if(event instanceof MailEvent){
                element = doc.createElement("MailEvent");
            }
            else if(event instanceof RoutePriceUpdateEvent){
                element = doc.createElement("RoutePriceUpdateEvent");
            }
            else{
                throw new Exception("business event isnt valid");
            }

            for(Map.Entry<String, String> field : event.getFields().entrySet()){
                Element newElement = doc.createElement(field.getKey());
                newElement.appendChild(doc.createTextNode(field.getValue()));
                element.appendChild(newElement);
            }

            root.appendChild(element);


            // output DOM XML to console
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
//            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "5");
            DOMSource source = new DOMSource(doc);
            StreamResult console = new StreamResult(businessEventsFile);
            transformer.transform(source, console);

Any insight would be appreciated.

1 Answer:

Answer 0: (score: 7)

I had the same problem some time ago. I found that the cause was that the parsed document contains the whitespace between tags as text nodes.

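For example, after parsing the document you will probably have a whitespace-only text node under the <BusinessEvents> node, before the <MailEvent> node. The Transformer preserves whitespace text nodes (which I think is the correct behavior), so the existing whitespace ends up mixed with the indentation it adds for new nodes.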
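To make the whitespace text nodes visible, here is a minimal sketch that parses a small string and lists the direct children of the root element. The class name WhitespaceNodes and the helper childKinds are made up for this illustration and are not part of the question's code; it assumes the default, non-validating DocumentBuilderFactory configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class, used only to illustrate the point above.
class WhitespaceNodes {

    // Lists the root element's direct children, comma separated:
    // "text" for a text node, the node name for anything else.
    static String childKinds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder kinds = new StringBuilder();
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (kinds.length() > 0) {
                kinds.append(',');
            }
            kinds.append(child.getNodeType() == Node.TEXT_NODE
                    ? "text" : child.getNodeName());
        }
        return kinds.toString();
    }

    public static void main(String[] args) throws Exception {
        // Line breaks between the tags become text nodes.
        System.out.println(childKinds("<BusinessEvents>\n<MailEvent/>\n</BusinessEvents>"));
        // prints: text,MailEvent,text

        // Without whitespace between the tags there are no text nodes.
        System.out.println(childKinds("<BusinessEvents><MailEvent/></BusinessEvents>"));
        // prints: MailEvent
    }
}
```

The element appended with root.appendChild(element) has no such text nodes around it, which is why it is laid out differently from the elements that were read from the file.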

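So if there is no whitespace at all between the tags in the XML text, the Transformer indents the tags correctly. You can try this with your code by manually removing all whitespace (including line breaks) from the input and then running the formatting. The output will then probably be closer to what you expect.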
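That experiment can be sketched as follows, assuming the JDK's built-in Transformer, which honors the Xalan-specific indent-amount property used in the question. CompactInputDemo and its format method are names invented for this sketch. The exact output for the input that already contains whitespace differs between JDK versions, so no output is shown for it.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: formats an XML string with the same
// Transformer settings as in the question.
class CompactInputDemo {

    static String format(String xml, int indent) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // The declaration is omitted only to keep the demo output short.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount",
                Integer.toString(indent));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // No whitespace between tags: the Transformer controls the layout.
        System.out.println(format("<a><b>x</b><c>y</c></a>", 4));

        // Whitespace between tags is kept as text nodes and interferes
        // with the indentation the Transformer adds.
        System.out.println(format("<a>\n<b>x</b>\n<c>y</c>\n</a>", 4));
    }
}
```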

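One way to solve this is to remove the superfluous whitespace from the document after parsing it. Simply removing all whitespace-only text nodes would make the formatting look better, but that is a problem if some of those text nodes are actually needed.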
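For comparison, the "simply remove everything" variant can be sketched with an XPath query. This is not the approach used further down; it is a commonly used shortcut, shown here under made-up names (StripAllWhitespace, stripWhitespaceOnlyText, childCount), and it demonstrates the drawback just mentioned: an element whose only content is a space loses that content.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of the naive cleanup: drops every whitespace-only
// text node, including ones that are an element's only child.
class StripAllWhitespace {

    static void stripWhitespaceOnlyText(Document doc) throws Exception {
        // Select all text nodes that are empty after whitespace normalization.
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc,
                        XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
    }

    // Number of child nodes of the first element named tagName,
    // optionally after stripping whitespace-only text nodes.
    static int childCount(String xml, String tagName, boolean strip)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        if (strip) {
            stripWhitespaceOnlyText(doc);
        }
        return doc.getElementsByTagName(tagName).item(0)
                .getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>\n<b> </b>\n</a>";
        System.out.println(childCount(xml, "a", false)); // prints: 3
        System.out.println(childCount(xml, "a", true));  // prints: 1
        // The single space inside <b> is gone as well:
        System.out.println(childCount(xml, "b", true));  // prints: 0
    }
}
```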

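So what I do to clean the document before formatting is to remove all text nodes that contain only whitespace, except for text nodes that are the only child of their parent (that is, have no siblings).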

The method cleanEmptyTextNodes(Node parentNode) below recursively removes those whitespace-only text nodes from the subtree.

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class FormatXml {

    public static void main(String[] args) throws ParserConfigurationException,
            FileNotFoundException, SAXException, IOException,
            TransformerException {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
                .newInstance();
        DocumentBuilder documentBuilder = docBuilderFactory
                .newDocumentBuilder();
        Document node = documentBuilder.parse(new FileInputStream("data.xml"));
        System.out.println(format(node, 4));
    }

    public static String format(Node node, int indent)
            throws TransformerException {
        cleanEmptyTextNodes(node);
        StreamResult result = new StreamResult(new StringWriter());
        getTransformer(indent).transform(new DOMSource(node), result);
        return result.getWriter().toString();
    }

    private static Transformer getTransformer(int indent) {
        Transformer transformer;
        try {
            transformer = TransformerFactory.newInstance().newTransformer();
        } catch (Exception e) {
            throw new RuntimeException("Failed to create the Transformer", e);
        }
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount",
                Integer.toString(indent));
        return transformer;
    }

    /**
     * Removes text nodes that contain only whitespace. Besides containing
     * only whitespace, a text node is removed only if its parent node has at
     * least one child of any of the following types:
     * - ELEMENT child
     * - CDATA child
     * - COMMENT child
     *
     * The purpose of this is to make the format() method (which uses a
     * Transformer for formatting) more consistent regarding indenting and line
     * breaks.
     */
    private static void cleanEmptyTextNodes(Node parentNode) {
        boolean removeEmptyTextNodes = false;
        Node childNode = parentNode.getFirstChild();
        while (childNode != null) {
            removeEmptyTextNodes |= checkNodeTypes(childNode);
            childNode = childNode.getNextSibling();
        }

        if (removeEmptyTextNodes) {
            removeEmptyTextNodes(parentNode);
        }
    }

    private static void removeEmptyTextNodes(Node parentNode) {
        Node childNode = parentNode.getFirstChild();
        while (childNode != null) {
            // grab the "nextSibling" before the child node is removed
            Node nextChild = childNode.getNextSibling();

            short nodeType = childNode.getNodeType();
            if (nodeType == Node.TEXT_NODE) {
                boolean containsOnlyWhitespace = childNode.getNodeValue()
                        .trim().isEmpty();
                if (containsOnlyWhitespace) {
                    parentNode.removeChild(childNode);
                }
            }
            childNode = nextChild;
        }
    }

    private static boolean checkNodeTypes(Node childNode) {
        short nodeType = childNode.getNodeType();

        if (nodeType == Node.ELEMENT_NODE) {
            cleanEmptyTextNodes(childNode); // recurse into subtree
        }

        // these child types trigger removal of whitespace-only siblings
        return nodeType == Node.ELEMENT_NODE
                || nodeType == Node.CDATA_SECTION_NODE
                || nodeType == Node.COMMENT_NODE;
    }

}

The resulting formatted output for your input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<BusinessEvents>
    <MailEvent>
        <to>Wellington</to>
        <weight>10.0</weight>
        <priority>air priority</priority>
        <volume>10.0</volume>
        <from>Christchurch</from>
        <day>Mon May 20 14:30:08 NZST 2013</day>
        <PPW>8.0</PPW>
        <PPV>2.5</PPV>
    </MailEvent>
    <DiscontinueEvent>
        <to>Wellington</to>
        <priority>air priority</priority>
        <company>Kiwi Co</company>
        <from>Sydney</from>
    </DiscontinueEvent>
    <RoutePriceUpdateEvent>
        <weightcost>3.0</weightcost>
        <to>Wellington</to>
        <duration>15.0</duration>
        <maxweight>40.0</maxweight>
        <maxvolume>20.0</maxvolume>
        <priority>air priority</priority>
        <company>Kiwi Co</company>
        <day>Mon May 20 14:30:08 NZST 2013</day>
        <frequency>3.0</frequency>
        <from>Wellington</from>
        <volumecost>2.0</volumecost>
    </RoutePriceUpdateEvent>
    <CustomerPriceUpdateEvent>
        <weightcost>3.0</weightcost>
        <to>Wellington</to>
        <priority>air priority</priority>
        <from>Sydney</from>
        <volumecost>2.0</volumecost>
    </CustomerPriceUpdateEvent>
</BusinessEvents>