Strange characters() behavior with SAX + Java

Time: 2013-10-08 00:38:45

Tags: java xml parsing character sax

In my XML, I have a multi-line element:

<tag id="sometag" ...>
    | first line
    |     second line
    |         third line
    |     fourth line
<tag ...>
....
<tag id="someothertag" ...>
    | ANOTHER FIRST LINE
    |     ANOTHER SECOND LINE
    |         ANOTHER THIRD LINE
    |     ANOTHER FORTH LINE
<tag ...>

Then in Java I have the necessary startElement, endElement, and characters methods, but I'm seeing some strange behavior from characters:

public void characters(char[] ch, int start, int length){
    Log.d(TAG, "characters( \"" + (new String(ch)).replaceAll("[\r\n]", "\\n") + "\", " + start + ", " + length + " )");
}

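Apart from that, I do nothing with the characters. I basically create two parser instances. With the first instance, I search for sometag. Once I have found what I'm looking for, I throw an exception to stop parsing and return the element.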

D/MyProgram( 1565): STARTING document parsing...
D/MyProgram( 1565): characters( "n   ", 0, 1 )
D/MyProgram( 1565): characters( "        | first line", 0, 20 )
D/MyProgram( 1565): characters( "n       | first line", 0, 1 )
D/MyProgram( 1565): characters( "        |   second line", 0, 23 )
D/MyProgram( 1565): characters( "n       |   second line", 0, 1 )
D/MyProgram( 1565): characters( "        |       third line", 0, 26 )
D/MyProgram( 1565): characters( "n       |       third line", 0, 1 )
D/MyProgram( 1565): characters( "        |   fourth lineline", 0, 22 )
D/MyProgram( 1565): characters( "n       |   fourth lineline", 0, 1 )
D/MyProgram( 1565): characters( "        |   fourth lineline", 0, 4 )
D/MyProgram( 1565): Successfully found "sometag"!

...and then another, brand-new instance with which I search for someothertag. I do exactly the same as before.

D/MyProgram( 1565): STARTING document parsing...
D/MyProgram( 1565): characters( "n", 0, 1 )
D/MyProgram( 1565): characters( "    ", 0, 4 )
D/MyProgram( 1565): characters( "n   ", 0, 1 )
D/MyProgram( 1565): characters( "        | first line", 0, 20 )
D/MyProgram( 1565): characters( "n       | first line", 0, 1 )
D/MyProgram( 1565): characters( "        |   second line", 0, 23 )
D/MyProgram( 1565): characters( "n       |   second line", 0, 1 )
D/MyProgram( 1565): characters( "        |       third line", 0, 26 )
D/MyProgram( 1565): characters( "n       |       third line", 0, 1 )
D/MyProgram( 1565): characters( "        |   fourth lineline", 0, 22 )
D/MyProgram( 1565): characters( "n       |   fourth lineline", 0, 1 )
D/MyProgram( 1565): characters( "        |   fourth lineline", 0, 4 )
D/MyProgram( 1565): Successfully found "someothertag"!

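I understand that XML parsing is stream-based (it parses in chunks rather than the whole string at once), but this is very strange behavior. Here are some confusing things I noticed: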

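  • On each call to characters(), the parser neither picks up where it left off nor clears out characters it has already finished with: I still see the previous call's characters next to the first char of the array ('n', which replaced the line break).
  • ch contains extra characters that were not there originally: "line" is appended to "fourth line".
  • When I create a brand-new parser instance, the same characters are "re-read". The second run should look like this: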

...this...

D/MyProgram( 1565): characters( "n", 0, 1 )
D/MyProgram( 1565): characters( "    ", 0, 4 )
D/MyProgram( 1565): characters( "n   ", 0, 1 )
D/MyProgram( 1565): characters( "        | ANOTHER FIRST LINE", 0, 20 )
D/MyProgram( 1565): characters( "n       |     ANOTHER SECOND LINE", 0, 1 )

...and so on.

Any idea what I'm doing wrong? Thanks in advance.

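1 answer:

Answer 0: (score: 3)

As Margulies said, you are not using start and length with the character array that is passed in. The parser may reuse the ch buffer between calls, so only the range from start to start + length belongs to the current call; everything outside it is leftover data from earlier calls: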

public void characters(char[] ch, int start, int length) {
    // use only the indicated segment.
    String str = new String(ch, start, length);
    Log.d(TAG, "characters( \"" + str.replaceAll("[\r\n]", "\\n") + "\", " + start + ", " + length + " )");
}
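
To put the fix in context, here is a minimal sketch of a complete handler built around it. It assumes the elements are named tag and carry an id attribute, as in the question. The class TagTextFinder, the FoundException marker, and the find helper are hypothetical names made up for this illustration, not part of the asker's code. Because the parser may deliver one element's text in several characters() calls, the sketch appends each valid segment to a StringBuilder and only reads the result in endElement, where it throws to stop parsing early, the same way the question describes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of <tag id="..."> and stops parsing.
final class TagTextFinder extends DefaultHandler {

    /** Hypothetical marker exception, used only to abort parsing early. */
    static final class FoundException extends SAXException {
        FoundException() { super("target element found"); }
    }

    private final String wantedId;
    private final StringBuilder text = new StringBuilder();
    private int depth = 0; // greater than 0 while inside the wanted element

    TagTextFinder(String wantedId) { this.wantedId = wantedId; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            depth++; // nested element inside the wanted one
        } else if ("tag".equals(qName) && wantedId.equals(atts.getValue("id"))) {
            depth = 1;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            // Append only the valid segment; the rest of ch is not ours to read.
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth > 0 && --depth == 0) {
            throw new FoundException(); // text is complete, stop here
        }
    }

    /** Returns the text of the element with the given id, or null if absent. */
    static String find(String xml, String id) throws Exception {
        TagTextFinder handler = new TagTextFinder(id);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (FoundException expected) {
            return handler.text.toString();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<tag id=\"sometag\">\n    | first line\n    |     second line\n</tag>"
                + "<tag id=\"someothertag\">\n    | ANOTHER FIRST LINE\n</tag>"
                + "</root>";
        System.out.println(find(xml, "someothertag"));
    }
}
```

Creating a new handler for every search, as find does here, keeps text from one search out of the next.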