Question

I'm trying to decode the character · in Java using the GB2312 charset.

GB2312 contains this character at position code a1a4 (check here).
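
To see what Java's GB2312 decoder maps position code a1a4 to, you can skip the String and decode the raw bytes directly. This is a minimal sketch. The class name Gb2312Probe and its helpers are hypothetical. It prints hex code points so that the console charset cannot turn the result into '?'.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class Gb2312Probe {
    // Hypothetical helper: decodes raw bytes as GB2312.
    // Charset.decode replaces malformed or unmappable input instead of throwing.
    static String decodeGb2312(byte[] bytes) {
        return Charset.forName("GB2312").decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Formats every code point as U+XXXX so the console charset cannot hide it.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The three position codes from the question, written as explicit bytes.
        byte[] bytes = {
            (byte) 0xA1, (byte) 0xA4,
            (byte) 0xA5, (byte) 0xF6,
            (byte) 0xA8, (byte) 0xC5
        };
        System.out.println(codePoints(decodeGb2312(bytes)));
    }
}
```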

Code:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class Gb2312Decode {
    public static void main(String[] args) throws Exception {
        String str = "a1a4:· a5f6:ヶ a8c5:ㄅ";
        // getBytes() with no argument encodes with the platform default charset, not GB2312.
        ByteBuffer bf = readToByteBuffer(new ByteArrayInputStream(str.getBytes()));
        System.out.println(Charset.forName("GB2312").decode(bf).toString());
    }

    private static final int bufferSize = 0x20000;

    // Reads the whole stream into memory and wraps the bytes in a ByteBuffer.
    static ByteBuffer readToByteBuffer(InputStream inStream) throws IOException {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream outStream = new ByteArrayOutputStream(bufferSize);
        int read;
        while ((read = inStream.read(buffer)) != -1) {
            outStream.write(buffer, 0, read);
        }
        return ByteBuffer.wrap(outStream.toByteArray());
    }
}

The code above prints:

a1a4:? a5f6:ヶ a8c5:ㄅ

I don't understand why a1a4 cannot be decoded.
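
In the question's code, str.getBytes() with no argument uses the platform default charset. That default is UTF-8 from JDK 18 on, per JEP 400, and locale-dependent before that. As a result, the bytes handed to the GB2312 decoder depend on the machine. The following is a sketch that dumps the default-charset bytes next to explicitly GB2312-encoded bytes so they can be compared. BytesDump and hex are hypothetical names.

```java
import java.nio.charset.Charset;

public class BytesDump {
    // Hypothetical helper: renders bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String str = "a1a4:\u00B7"; // U+00B7 MIDDLE DOT, written as an escape
        System.out.println("default (" + Charset.defaultCharset() + "): " + hex(str.getBytes()));
        // getBytes(Charset) substitutes the charset's replacement bytes for
        // unmappable characters instead of throwing.
        System.out.println("GB2312: " + hex(str.getBytes(Charset.forName("GB2312"))));
    }
}
```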

Answer 1

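In my browser, the character at index 5 of your string (the one after "a1a4:") is encoded as 0xB7, which is MIDDLE DOT, not KATAKANA MIDDLE DOT. However, according to the same database you mentioned, that code point is not available in the GB2312 character set. Likewise, you can see that neither MIDDLE DOT nor the encoding 0xB7 is listed as part of GB2312.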
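
To check which of the two dots Java's own GB2312 charset can represent, you can ask an encoder directly. This is a minimal sketch, and DotCheck and inGb2312 are hypothetical names. The printed result depends on the JDK's GB2312 mapping table, so it is not reproduced here.

```java
import java.nio.charset.Charset;

public class DotCheck {
    // Hypothetical helper: true if Java's GB2312 encoder can map this char.
    static boolean inGb2312(char c) {
        return Charset.forName("GB2312").newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char[] dots = {'\u00B7', '\u30FB'}; // MIDDLE DOT, KATAKANA MIDDLE DOT
        for (char c : dots) {
            System.out.printf("U+%04X in GB2312: %b%n", (int) c, inGb2312(c));
        }
    }
}
```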

I think the problem here is the character in the input string, not the CharsetDecoder supplied by the JRE.
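
Charset.decode silently substitutes U+FFFD for bytes it cannot decode. In addition, System.out may print '?' for characters the console charset cannot show. So a '?' on screen does not by itself tell you which step failed. The following sketch uses the standard CharsetDecoder API with CodingErrorAction.REPORT so that the decoder throws instead. StrictDecode and decodeStrict are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StrictDecode {
    // Hypothetical helper: decodes bytes, throwing instead of substituting U+FFFD.
    // newDecoder() already defaults to REPORT; it is set here explicitly for clarity.
    static String decodeStrict(byte[] bytes, String charsetName) throws CharacterCodingException {
        CharsetDecoder dec = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xA1, (byte) 0xA4};
        try {
            String s = decodeStrict(bytes, "GB2312");
            // Print the code point rather than the glyph, to sidestep console encoding.
            System.out.printf("decoded to U+%04X%n", s.codePointAt(0));
        } catch (CharacterCodingException e) {
            System.out.println("decoder rejected the input: " + e);
        }
    }
}
```

Suppose a strict decode succeeds on the bytes your program actually produced. In that case, the '?' on screen came from printing rather than from decoding.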
