Question

byte[] byteArray = Charset.forName("UTF-8").encode("hello world").array();
System.out.println(byteArray.length);

Why does the code above print 12? Shouldn't it print 11 instead?

Answer 1

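The length of the array is the capacity of the ByteBuffer, which is derived from the number of characters you are encoding but is not equal to it. Let's look at how memory is allocated for the ByteBuffer...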

If you dig into the encode() method, you will find that CharsetEncoder#encode(CharBuffer) looks like this:

public final ByteBuffer encode(CharBuffer in)
    throws CharacterCodingException
{
    int n = (int)(in.remaining() * averageBytesPerChar());
    ByteBuffer out = ByteBuffer.allocate(n);
    ...

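According to my debugger, the averageBytesPerChar of a UTF_8$Encoder is 1.1, and the input String has 11 characters. 11 * 1.1 = 12.1, and the code casts the total to an int when it does the calculation, so the resulting size of the ByteBuffer is 12.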
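The arithmetic can be checked with a minimal sketch. The class name CapacityEstimate and the helper estimatedCapacity are made up for illustration; only averageBytesPerChar() is a real CharsetEncoder method. The value 1.1 is what the debugger reported for this particular encoder, and other JDKs or charsets may report something else.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CapacityEstimate {
    // Hypothetical helper mirroring the first line of CharsetEncoder#encode(CharBuffer):
    // the initial buffer size is charCount * averageBytesPerChar, truncated to an int.
    static int estimatedCapacity(int charCount, float averageBytesPerChar) {
        return (int) (charCount * averageBytesPerChar);
    }

    public static void main(String[] args) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        float avg = encoder.averageBytesPerChar(); // implementation-specific value
        String s = "hello world";
        System.out.println("averageBytesPerChar = " + avg);
        System.out.println("estimated capacity  = " + estimatedCapacity(s.length(), avg));
    }
}
```

With an average of 1.1 and 11 characters, the helper returns 12, which matches the array length in the question. The estimate is only a starting size, so it says nothing about how many bytes were actually written.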

Answer 2

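Because it returns a ByteBuffer. What you are measuring is the capacity of the buffer (and not even quite that, because of possible slicing), not how many bytes are in use. It's a bit like how malloc(10) is free to return 32 bytes of memory.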

System.out.println(Charset.forName("UTF-8").encode("hello world").limit());

That prints 11 (as expected).
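If the goal is a byte[] that holds exactly the encoded bytes, one option is to copy only the region between the buffer's position and limit. This is a sketch rather than the only way to do it; the class ExactBytes and the helper toExactBytes are hypothetical names. String#getBytes(Charset) is shown alongside as the simpler route when no ByteBuffer is needed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ExactBytes {
    // Copies only the bytes between position and limit, ignoring spare capacity.
    static byte[] toExactBytes(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate() leaves the caller's position untouched
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode("hello world");
        System.out.println(toExactBytes(encoded).length);                           // 11
        System.out.println("hello world".getBytes(StandardCharsets.UTF_8).length);  // 11
    }
}
```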

Answer 3

import java.nio.charset.*;
public class ByteArrayTest {
    public static void main(String[] args) {
        String theString = "hello world";
        System.out.println(theString.length());
        byte[] byteArray = Charset.forName("UTF-8").encode(theString).array();
        System.out.println(byteArray.length);
        for (int i = 0; i < byteArray.length; i++) {
            System.out.println("Byte " + i + " = " + byteArray[i]);
        }
    }
}

Result:

C:\JavaTools>java ByteArrayTest
11
12
Byte 0 = 104
Byte 1 = 101
Byte 2 = 108
Byte 3 = 108
Byte 4 = 111
Byte 5 = 32
Byte 6 = 119
Byte 7 = 111
Byte 8 = 114
Byte 9 = 108
Byte 10 = 100
Byte 11 = 0

The array looks null-terminated, just like any good C string.

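(But apparently the real reason is the flaky array method, which hands back the whole backing array, unused capacity included. It probably should not be used in "production" code, except with great care.)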
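To make the caution about array() concrete, here is a small sketch using a sliced buffer. The class BackingArrayDemo and the helper readableBytes are hypothetical; hasArray(), arrayOffset(), position(), limit(), and slice() are standard ByteBuffer methods. The point is that array() returns the entire backing array, so the offset and limit have to be applied by hand.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BackingArrayDemo {
    // Returns the readable bytes of an array-backed buffer,
    // honoring arrayOffset, position, and limit.
    static byte[] readableBytes(ByteBuffer buf) {
        if (!buf.hasArray()) {
            // Direct and read-only buffers do not expose a backing array.
            throw new IllegalArgumentException("buffer has no accessible backing array");
        }
        int start = buf.arrayOffset() + buf.position();
        return Arrays.copyOfRange(buf.array(), start, start + buf.remaining());
    }

    public static void main(String[] args) {
        ByteBuffer whole = ByteBuffer.wrap(new byte[] {10, 20, 30, 40, 50, 60});
        whole.position(2);
        whole.limit(5);
        ByteBuffer slice = whole.slice(); // view of the bytes 30, 40, 50

        System.out.println(slice.array().length);                   // 6: the whole backing array
        System.out.println(slice.arrayOffset());                    // 2
        System.out.println(Arrays.toString(readableBytes(slice)));  // [30, 40, 50]
    }
}
```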
