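getCharacterOffset() returns incorrect values

Asked: 2012-09-29 21:53:33

Tags: java stax

I am using StAX to parse an XML file and want to know where each tag starts and ends. To do that I tried getLocation().getCharacterOffset(), but it returns incorrect values for every tag.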

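import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

XMLInputFactory factory = XMLInputFactory.newInstance();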
XMLEventReader reader = factory.createXMLEventReader(
        new StringReader("<root>txt1<tag>txt2</tag></root>"));

XMLEvent e;
e = reader.nextEvent(); // START_DOCUMENT
System.out.println(e);
System.out.println(e.getLocation());
e = reader.nextEvent(); // START_ELEMENT "root"
System.out.println(e);
System.out.println(e.getLocation());
e = reader.nextEvent(); // CHARACTERS "txt1"
System.out.println(e);
System.out.println(e.getLocation());
e = reader.nextEvent(); // START_ELEMENT "tag"
System.out.println(e);
System.out.println(e.getLocation());

The code above prints:

<?xml version="null" encoding='null' standalone='no'?>
Line number = 1
Column number = 1
System Id = null
Public Id = null
Location Uri= null
CharacterOffset = 0

<root>
Line number = 1
Column number = 7
System Id = null
Public Id = null
Location Uri= null
CharacterOffset = 6

txt1
Line number = 1
Column number = 12
System Id = null
Public Id = null
Location Uri= null
CharacterOffset = 11

<tag>
Line number = 1
Column number = 16
System Id = null
Public Id = null
Location Uri= null
CharacterOffset = 15

The CharacterOffset of 6 after <root> is correct, but after txt1 it is 11, while I would expect 10. Which offset exactly does it return?

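1 answer:

Answer 0 (score: 2)

This may be a bug/feature of the Sun/Oracle StAX implementation. With Woodstox you get 0, 0, 6, 10, which looks correct. Download Woodstox from http://wiki.fasterxml.com/WoodstoxHome and add the JARs (woodstox-core + stax2-api) to the classpath. XMLInputFactory will then pick up the Woodstox implementation automatically.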
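To check which implementation is actually in use before and after adding the JARs, something like the following sketch may help. The class name OffsetProbe and its two helper methods are made up for illustration; only the standard javax.xml.stream API is called. It prints the class that XMLInputFactory.newInstance() resolved to and the character offsets reported for the first four events of the XML from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of StAX or Woodstox.
class OffsetProbe {

    // Class name of the factory that XMLInputFactory.newInstance() resolved to.
    static String implementationName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    // Collects getCharacterOffset() for at most maxEvents events of the given XML.
    static List<Integer> offsets(String xml, int maxEvents) {
        List<Integer> result = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext() && result.size() < maxEvents) {
                XMLEvent event = reader.nextEvent();
                result.add(event.getLocation().getCharacterOffset());
            }
            reader.close();
        } catch (XMLStreamException ex) {
            // Wrapped so that callers do not need to handle a checked exception.
            throw new IllegalStateException("Could not parse: " + xml, ex);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("StAX implementation: " + implementationName());
        System.out.println("Offsets: " + offsets("<root>txt1<tag>txt2</tag></root>", 4));
    }
}
```

If the printed class name does not change after the Woodstox JARs are added, the classpath is probably not set up as intended, and the offsets will still come from the default implementation.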
