What is the easiest way to run CUDA code in Java?

Asked: 2012-10-11 15:58:48

Tags: java cuda jcuda

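I have some CUDA code written in C that seems to work fine (it is plain old C, not C++). I am running a Hadoop cluster and want to integrate my code with it, so ideally I would run it from Java (long story short: the system is too complex).

Currently, the C program parses a log file, takes a few thousand lines, processes each line in parallel on the GPU, saves specific errors/transactions into a linked list, and then writes them to disk.

What is the best approach? Is JCUDA a perfect mapping of C CUDA, or is it totally different? Or does it make sense to call the C code from Java and share the results (would the linked list be accessible)?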

1 answer:

Answer 0 (score: 1)

IMO? JavaCPP. For example, here is a Java port of the example displayed on the main page of Thrust's Web site:

import com.googlecode.javacpp.*;
import com.googlecode.javacpp.annotation.*;

@Platform(include={"<thrust/host_vector.h>", "<thrust/device_vector.h>", "<thrust/generate.h>", "<thrust/sort.h>",
                   "<thrust/copy.h>", "<thrust/reduce.h>", "<thrust/functional.h>", "<algorithm>", "<cstdlib>"})
@Namespace("thrust")
public class ThrustTest {
    static { Loader.load(); }

    public static class IntGenerator extends FunctionPointer {
        static { Loader.load(); }
        protected IntGenerator() { allocate(); }
        private native void allocate();
        public native int call();
    }

    @Name("plus<int>")
    public static class IntPlus extends Pointer {
        static { Loader.load(); }
        public IntPlus() { allocate(); }
        private native void allocate();
        public native @Name("operator()") int call(int x, int y);
    }

    @Name("host_vector<int>")
    public static class IntHostVector extends Pointer {
        static { Loader.load(); }
        public IntHostVector() { allocate(0); }
        public IntHostVector(long n) { allocate(n); }
        public IntHostVector(IntDeviceVector v) { allocate(v); }
        private native void allocate(long n);
        private native void allocate(@ByRef IntDeviceVector v);

        public IntPointer begin() { return data(); }
        public IntPointer end() { return data().position((int)size()); }

        public native IntPointer data();
        public native long size();
        public native void resize(long n);
    }

    @Name("device_ptr<int>")
    public static class IntDevicePointer extends Pointer {
        static { Loader.load(); }
        public IntDevicePointer() { allocate(null); }
        public IntDevicePointer(IntPointer ptr) { allocate(ptr); }
        private native void allocate(IntPointer ptr);

        public native IntPointer get();
    }

    @Name("device_vector<int>")
    public static class IntDeviceVector extends Pointer {
        static { Loader.load(); }
        public IntDeviceVector() { allocate(0); }
        public IntDeviceVector(long n) { allocate(n); }
        public IntDeviceVector(IntHostVector v) { allocate(v); }
        private native void allocate(long n);
        private native void allocate(@ByRef IntHostVector v);

        public IntDevicePointer begin() { return data(); }
        public IntDevicePointer end() { return new IntDevicePointer(data().get().position((int)size())); }

        public native @ByVal IntDevicePointer data();
        public native long size();
        public native void resize(long n);
    }

    public static native @MemberGetter @Namespace IntGenerator rand();
    public static native void copy(@ByVal IntDevicePointer first, @ByVal IntDevicePointer last, IntPointer result);
    public static native void generate(IntPointer first, IntPointer last, IntGenerator gen);
    public static native void sort(@ByVal IntDevicePointer first, @ByVal IntDevicePointer last);
    public static native int reduce(@ByVal IntDevicePointer first, @ByVal IntDevicePointer last, int init, @ByVal IntPlus binary_op);

    public static void main(String[] args) {
        // generate 32M random numbers serially
        IntHostVector h_vec = new IntHostVector(32 << 20);
        generate(h_vec.begin(), h_vec.end(), rand());

        // transfer data to the device
        IntDeviceVector d_vec = new IntDeviceVector(h_vec);

        // sort data on the device (846M keys per second on GeForce GTX 480)
        sort(d_vec.begin(), d_vec.end());

        // transfer data back to host
        copy(d_vec.begin(), d_vec.end(), h_vec.begin());

        // compute sum on device
        int x = reduce(d_vec.begin(), d_vec.end(), 0, new IntPlus());
    }
}

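Your code is in C, so it should be easier to map.

We can compile and run the example on Linux x86_64 with these commands, or on other supported platforms by changing the -properties option accordingly: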

$ javac -cp javacpp.jar ThrustTest.java
$ java -jar javacpp.jar ThrustTest -properties linux-x86_64-cuda
$ java -cp javacpp.jar ThrustTest
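
On the question of whether the linked list would be accessible from Java: a pointer-based list is awkward to share across the native boundary, so one option is to exchange flat arrays instead. The following is only a rough sketch of the Java side under that assumption. It packs the log lines into one byte array plus an offsets array (a layout that is straightforward to copy to the GPU) and receives the indices of the flagged lines as an int array. The names LineBatch and findErrors are hypothetical, and findErrors is a pure-Java stand-in for the native call, which in practice would be declared as a native method and mapped with JavaCPP as in the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: packs log lines into one byte array plus offsets. */
final class LineBatch {
    final byte[] data;    // all lines concatenated as UTF-8, no separators
    final int[] offsets;  // offsets[i] = start of line i; offsets[n] = data.length

    LineBatch(List<String> lines) {
        int n = lines.size();
        offsets = new int[n + 1];
        byte[][] encoded = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            encoded[i] = lines.get(i).getBytes(StandardCharsets.UTF_8);
            offsets[i] = total;
            total += encoded[i].length;
        }
        offsets[n] = total;
        data = new byte[total];
        for (int i = 0; i < n; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
    }

    /** Number of lines in the batch. */
    int size() {
        return offsets.length - 1;
    }

    /** Decodes line i back into a String. */
    String line(int i) {
        return new String(data, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
    }

    /**
     * Stand-in for the native call (assumption: the real C/CUDA code would take
     * data and offsets and return the indices of the lines it flags).
     * Here a line is flagged if it contains the text "ERROR".
     */
    static int[] findErrors(LineBatch batch) {
        int[] hits = new int[batch.size()];
        int count = 0;
        for (int i = 0; i < batch.size(); i++) {
            if (batch.line(i).contains("ERROR")) {
                hits[count++] = i;
            }
        }
        return Arrays.copyOf(hits, count);
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("INFO start");
        lines.add("ERROR disk full");
        lines.add("INFO done");
        LineBatch batch = new LineBatch(lines);
        // Print each flagged line together with its index in the batch.
        for (int idx : findErrors(batch)) {
            System.out.println(idx + ": " + batch.line(idx));
        }
    }
}
```

Returning indices rather than list nodes keeps all result memory on the Java side, so nothing allocated in C has to be freed or traversed from Java.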