Preserve \t and \n in an XML attribute with the Java parser

Date: 2018-06-04 16:39:24

Tags: java xml xml-parsing

I have an XML file that I parse into a Document, and I want to read an XML attribute whose value contains embedded tabs and newlines.

I've googled and found that the XML spec says attribute text is "normalized": each literal whitespace character (tab, line feed, carriage return) is replaced with a space.

I guess I have to replace the tabs and line breaks with appropriate escaped characters before I parse the XML.

In all of my googling I have not found a straightforward way to get from the File to a Document in which the attribute text is returned with tabs and line breaks preserved.

The XML file is generated by a third-party application, so the problem cannot be fixed at the source.

I want to use the JDK parser.

My initial attempts at reading the File into a String and parsing the String fail with a parse error on the first byte.

Any suggestions for a straightforward approach?

An example element is on Pastebin: Element example [1]

[1]: https://pastebin.com/pc9uGbSD

I parse the XML like this:

public ReadPlexExport(Path xmlPath, ExportType exType) throws Exception {
    this.xmlPath = xmlPath;
    this.type = exType;
    this.doc = DBF.newDocumentBuilder().parse(this.xmlPath.toFile());
}

1 answer:

Answer 0 (score: 0)

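The quick and dirty solution to my immediate problem was to read the XML file line by line as a text file, replace the \t characters on each line with the escaped tab value, write the line to a new file, and then append an escaped newline.

The new XML file can then be parsed. The original XML will always be in a form that allows this hack, because \t and line breaks only ever occur inside attributes.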
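
The state-machine variant below is a minimal sketch of the same idea, not the exact code used above. Instead of working line by line, it escapes tab and line-break characters only while inside a quoted attribute value, so text outside attributes is left alone. It relies on the fact that character references such as &#9; and &#10; survive attribute-value normalization. The class name AttrWhitespaceEscaper and its methods are hypothetical. The sketch assumes the file is UTF-8 and contains no comments, CDATA sections, or DOCTYPE declarations with quote characters in them. Stripping a leading byte-order mark is also an assumption; a BOM is one common cause of a parse error at the very first character when parsing from a String.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original post.
final class AttrWhitespaceEscaper {

    // Replace literal tabs and line breaks inside quoted attribute values
    // with character references, which the parser does not normalize away.
    static String escape(String xml) {
        StringBuilder out = new StringBuilder(xml.length() + 64);
        boolean inTag = false; // true between '<' and '>'
        char quote = 0;        // active quote character, or 0 outside a value
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;
                    out.append(c);
                } else if (c == '\t') {
                    out.append("&#9;");
                } else if (c == '\n') {
                    out.append("&#10;");
                } else if (c == '\r') {
                    // Mirror XML end-of-line handling: CRLF and lone CR
                    // both become a single line feed.
                    boolean crlf = i + 1 < xml.length() && xml.charAt(i + 1) == '\n';
                    if (!crlf) {
                        out.append("&#10;");
                    }
                } else {
                    out.append(c);
                }
            } else {
                if (inTag && (c == '"' || c == '\'')) {
                    quote = c;
                } else if (c == '<') {
                    inTag = true;
                } else if (c == '>') {
                    inTag = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    // Escape the text and parse it with the JDK parser.
    static Document parseString(String xml) throws Exception {
        if (xml.startsWith("\uFEFF")) {
            xml = xml.substring(1); // assumption: drop a leading BOM if present
        }
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        return dbf.newDocumentBuilder()
                  .parse(new InputSource(new StringReader(escape(xml))));
    }

    // Assumption: the third-party export is UTF-8 encoded.
    static Document parse(Path xmlPath) throws Exception {
        byte[] bytes = Files.readAllBytes(xmlPath);
        return parseString(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

In the constructor from the question, the assignment would become this.doc = AttrWhitespaceEscaper.parse(this.xmlPath); after that, getAttribute(...) should return the value with its tabs and line feeds intact.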