Comparing get/put operations on direct and non-direct ByteBuffers

Time: 2012-06-24 01:00:29

Tags: java memory nio bytebuffer

Is get/put on a non-direct ByteBuffer faster than get/put on a direct ByteBuffer?


2 answers:

Answer 0: (score: 23)


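Is get/put on a non-direct ByteBuffer faster than get/put on a direct ByteBuffer?

If you are comparing a heap buffer with a direct buffer that does not use native byte order (most systems are little-endian, while the default for a direct ByteBuffer is big-endian), the performance is very similar.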
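To make the byte-order point concrete, here is a minimal sketch that prints the default order of a heap and a direct buffer and shows how the same int is laid out under each order. The class name `ByteOrderDemo` and the helper `encode` are hypothetical names chosen for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of the original answer.
class ByteOrderDemo {
    /** Returns the 4 bytes that putInt(value) writes into a direct buffer using the given order. */
    static byte[] encode(int value, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4).order(order);
        bb.putInt(value);
        bb.flip();
        byte[] out = new byte[4];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Every newly allocated ByteBuffer starts out big-endian, heap or direct.
        System.out.println("heap default order:   " + ByteBuffer.allocate(16).order());
        System.out.println("direct default order: " + ByteBuffer.allocateDirect(16).order());
        // The platform's own order; little-endian on x86.
        System.out.println("native order:         " + ByteOrder.nativeOrder());

        // Switching to native order is an explicit, per-buffer choice.
        ByteBuffer direct = ByteBuffer.allocateDirect(16).order(ByteOrder.nativeOrder());
        System.out.println("after order(native):  " + direct.order());

        System.out.println(java.util.Arrays.toString(encode(0x01020304, ByteOrder.BIG_ENDIAN)));
        System.out.println(java.util.Arrays.toString(encode(0x01020304, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

The benchmarks in this answer call `order(ByteOrder.nativeOrder())` on both buffers, so they measure the native-order case rather than the default one.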


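In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM dependent, and AFAIK the Android VM treats them as intrinsics in recent versions.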


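In fact, if you are into micro-tuning, you may find that most of the time in a ByteBuffer getXxxx or putXxxx call is spent on bounds checking rather than on the actual memory access. For this reason I still use Unsafe directly when I have to squeeze out maximum performance (note: Oracle discourages this).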
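As an illustration of the bounds checking mentioned above, the following sketch shows that ByteBuffer rejects out-of-range accesses with an exception, which means every access has to be validated first. It demonstrates only that the checks exist, not what they cost. `BoundsCheckDemo` and its helper methods are hypothetical names.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of the original answer.
class BoundsCheckDemo {
    /** Returns true if an absolute getInt at the given index is rejected by the bounds check. */
    static boolean absoluteGetRejected(ByteBuffer bb, int index) {
        try {
            bb.getInt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Returns true if a relative getInt fails because fewer than 4 bytes remain. */
    static boolean relativeGetRejected(ByteBuffer bb) {
        try {
            bb.getInt();
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(8);
        System.out.println(absoluteGetRejected(bb, 4)); // false: bytes 4..7 are inside the buffer
        System.out.println(absoluteGetRejected(bb, 5)); // true: bytes 5..8 would pass the limit
        bb.position(6);
        System.out.println(relativeGetRejected(bb));    // true: only 2 bytes remain
    }
}
```

Raw Unsafe memory access performs no such validation, so an out-of-range address can corrupt memory or crash the JVM instead of throwing.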



I'd hate to see anything better than this. ;) It sounds complicated.



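import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectBufferTest { // class name is arbitrary
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        // Repeat the test so that later runs measure JIT-compiled code.
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        // Reset both buffers so that every run copies the whole capacity.
        bb1.clear();
        bb2.clear();
        long start = System.nanoTime();
        // Copy bb1 into bb2 one int at a time.
        while (bb2.remaining() > 0)
            bb2.putInt(bb1.getInt());
        long time = System.nanoTime() - start;
        // One getInt plus one putInt for every 4 bytes.
        int operations = bb1.capacity() / 4 * 2;
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }
}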


Each putInt/getInt took an average of 83.9 ns
Each putInt/getInt took an average of 1.4 ns
Each putInt/getInt took an average of 34.7 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns

I'm pretty sure a JNI call takes longer than 1.2 ns.


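import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import sun.misc.Unsafe;
import sun.nio.ch.DirectBuffer;

// Uses JDK-internal classes; newer JDKs may need
// --add-exports java.base/sun.nio.ch=ALL-UNNAMED to compile and run this.
public class UnsafeBufferTest { // class name is arbitrary
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        Unsafe unsafe = getTheUnsafe();
        long start = System.nanoTime();
        // Raw native addresses of the two direct buffers.
        long addr1 = ((DirectBuffer) bb1).address();
        long addr2 = ((DirectBuffer) bb2).address();
        // Copy int by int with no bounds checks at all.
        for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
            unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
        long time = System.nanoTime() - start;
        int operations = bb1.capacity() / 4 * 2;
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }

    public static Unsafe getTheUnsafe() {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            // The field is private, so it must be made accessible before reading it.
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }
}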


Each putInt/getInt took an average of 40.4 ns
Each putInt/getInt took an average of 44.4 ns
Each putInt/getInt took an average of 0.4 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns

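So you can see that the native call is much faster than you would expect from a JNI call. The main reason for the remaining delay is probably the speed of the L2 cache. ;)

All of this was run on a 3.3 GHz i3.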
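The benchmarks above only exercise direct buffers. To compare them with heap buffers, as the question asks, the same loop can be pointed at both kinds. This is a minimal sketch with a hypothetical class `HeapVsDirectSketch` and helper `copyInts`. Like the code above it is a naive timing loop, so the numbers are only indicative, and a harness such as JMH would give more trustworthy results.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not part of the original answer.
class HeapVsDirectSketch {
    /**
     * Copies src into dst one int at a time and returns the average
     * nanoseconds per putInt/getInt. Assumes src is at least as large as dst.
     */
    static double copyInts(ByteBuffer src, ByteBuffer dst) {
        src.clear();
        dst.clear();
        long start = System.nanoTime();
        while (dst.remaining() >= 4)
            dst.putInt(src.getInt());
        long time = System.nanoTime() - start;
        int operations = dst.capacity() / 4 * 2;
        return (double) time / operations;
    }

    public static void main(String[] args) {
        int size = 256 * 1024;
        ByteOrder order = ByteOrder.nativeOrder();
        ByteBuffer heap1 = ByteBuffer.allocate(size).order(order);
        ByteBuffer heap2 = ByteBuffer.allocate(size).order(order);
        ByteBuffer direct1 = ByteBuffer.allocateDirect(size).order(order);
        ByteBuffer direct2 = ByteBuffer.allocateDirect(size).order(order);
        // The first iterations include interpreter and JIT warm-up.
        for (int i = 0; i < 10; i++) {
            System.out.printf("heap: %.1f ns, direct: %.1f ns%n",
                    copyInts(heap1, heap2), copyInts(direct1, direct2));
        }
    }
}
```

Dropping the `order(...)` calls turns this into the default big-endian comparison instead.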

Answer 1: (score: 2)



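  1. If you aren't playing with the data in Java land, e.g. you are just copying one channel to another, direct buffers are faster, because the data doesn't have to cross the JNI boundary at all.

  2. Conversely, if you are playing with the data in Java land, a non-direct buffer will be faster. How significant the difference is depends on how much data has to cross the JNI boundary and on how much is transferred each time. For example, getting or putting one byte at a time from or to a direct buffer could be very expensive, whereas getting or putting 16384 bytes at a time would greatly reduce the JNI boundary cost.

  3. To answer your second paragraph, I would use a local byte[] array rather than a thread-local one, but if I were playing with the data in Java land I wouldn't use a direct byte buffer at all. As the Javadoc says, direct byte buffers should only be used where they deliver a measurable performance benefit.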
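The first two points can be sketched as follows. `DirectBufferUsageSketch`, `copy`, and `sumInChunks` are hypothetical names, and the in-memory streams in `main` exist only to keep the example self-contained; the benefit described in point 1 would apply to channels backed by native I/O, such as a FileChannel or SocketChannel.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;

// Hypothetical sketch, not part of the original answer.
class DirectBufferUsageSketch {
    /** Point 1: channel-to-channel copy; Java code never looks at the bytes. */
    static long copy(ReadableByteChannel in, WritableByteChannel out) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(16384);
        long total = 0;
        while (in.read(buf) != -1) {
            buf.flip();
            while (buf.hasRemaining())
                total += out.write(buf);
            buf.clear();
        }
        return total;
    }

    /** Point 2: pull data out in 16384-byte chunks into a local byte[] and process that. */
    static long sumInChunks(ByteBuffer buf) {
        byte[] chunk = new byte[16384];
        long sum = 0;
        while (buf.hasRemaining()) {
            int n = Math.min(chunk.length, buf.remaining());
            buf.get(chunk, 0, n); // one bulk transfer per chunk instead of n single-byte gets
            for (int i = 0; i < n; i++)
                sum += chunk[i];
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        Arrays.fill(data, (byte) 1);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(Channels.newChannel(new ByteArrayInputStream(data)),
                Channels.newChannel(sink));
        System.out.println("copied " + copied + " bytes");

        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        System.out.println("sum = " + sumInChunks(direct));
    }
}
```

The chunk size of 16384 is taken from the example in point 2; whether it is a good value for a given workload would need to be measured.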