Halide AOT for OpenCL works fine as a static library, but not as a shared object

Asked: 2017-03-18 10:24:03

Tags: shared-libraries opencl gpu static-libraries halide

I tried compiling the following code both to a static library and to an object file:

Halide::Func f("f");
Halide::Var x("x");

f(x) = x;
f.gpu_tile(x, 4);
f.bound(x, 0, 16);

Halide::Target target = Halide::get_target_from_environment();
target.set_feature(Halide::Target::OpenCL);
target.set_feature(Halide::Target::Debug);
// f.compile_to_static_library("mylib", {}, "f", target);
// f.compile_to_file("mylib", {}, "f", target);

With static linking everything works and the output is correct:

Halide::Buffer<int> output(16);
f(output.raw_buffer());
output.copy_to_host();
std::cout << output(10) << std::endl;

But when I link the object file into a shared object,

gcc -shared -pthread mylib.o -o mylib.so

and open it from code (Ubuntu 16.04),

void* handle = dlopen("mylib.so", RTLD_NOW);
int (*func)(halide_buffer_t*);
*(void**)(&func) = dlsym(handle, "f");
func(output.raw_buffer());

I get a CL_INVALID_MEM_OBJECT error. Here is the debug log:

CL: halide_opencl_init_kernels (user_context: 0x0, state_ptr: 0x7f1266b5a4e0, program: 0x7f1266957480, size: 1577
    load_libopencl (user_context: 0x0)
    Loaded OpenCL runtime library: libOpenCL.so
    create_opencl_context (user_context: 0x0)
    Got platform 'Intel(R) OpenCL', about to create context (t=6249430)
    Multiple CL devices detected. Selecting the one with the most cores.
      Device 0 has 20 cores
      Device 1 has 4 cores
    Selected device 0
      device name: Intel(R) HD Graphics
      device vendor: Intel(R) Corporation
      device profile: FULL_PROFILE
      global mem size: 1630 MB
      max mem alloc size: 815 MB
      local mem size: 65536
      max compute units: 20
      max workgroup size: 256
      max work item dimensions: 3
      max work item sizes: 256x256x256x0
    clCreateContext -> 0x1899af0
    clCreateCommandQueue 0x1a26a80
    clCreateProgramWithSource -> 0x1a26ab0
    clBuildProgram 0x1a26ab0 -D MAX_CONSTANT_BUFFER_SIZE=854799155 -D MAX_CONSTANT_ARGS=8
    Time: 1.015832e+02 ms
CL: halide_opencl_run (user_context: 0x0, entry: kernel_f_s0_x___deprecated_block_id_x___block_id_x, blocks: 4x1x1, threads: 4x1x1, shmem: 0
    clCreateKernel kernel_f_s0_x___deprecated_block_id_x___block_id_x ->     Time: 1.361700e-02 ms
    clSetKernelArg 0 4 [0x2e00010000000000 ...] 0
    clSetKernelArg 1 8 [0x2149040 ...] 1
Mapped dev handle is: 0x2149040
Error: CL: clSetKernelArg failed: CL_INVALID_MEM_OBJECT
Aborted (core dumped)

Thanks a lot for any help! I am at commit c7375fa. I am happy to provide additional information if needed.

1 answer:

Answer 0: (score: 0)

Solution: in this case the runtime is duplicated. Load the shared object with the RTLD_DEEPBIND flag.

void* handle = dlopen("mylib.so", RTLD_NOW | RTLD_DEEPBIND);
  

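RTLD_DEEPBIND (since glibc 2.3.4)   Place the lookup scope of the symbols in this library ahead of the global scope. This means that a self-contained library will use its own symbols in preference to global symbols with the same name contained in libraries that have already been loaded. This flag is not specified in POSIX.1-2001.   https://linux.die.net/man/3/dlopen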
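
The loading step can be made a little more defensive. The following is only a sketch, assuming glibc on Linux: `load_pipeline` and `resolves_differently` are hypothetical helper names, not part of Halide, and `halide_buffer_t` is forward-declared here only so that the snippet compiles without the Halide headers. The first helper opens the library with RTLD_DEEPBIND and checks both `dlopen` and `dlsym` for errors. The second compares the address a symbol has inside the library with the address it has in the global scope, which is one way to check whether two copies of a symbol are visible to the dynamic linker.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_DEEPBIND and RTLD_DEFAULT are glibc extensions
#endif
#include <dlfcn.h>
#include <cstdio>

struct halide_buffer_t;  // opaque here; the full definition lives in HalideRuntime.h
using pipeline_fn = int (*)(halide_buffer_t*);

// Hypothetical helper: open `path` with RTLD_DEEPBIND and look up `symbol`.
// Returns nullptr (after printing the dlerror() message) if either step fails.
pipeline_fn load_pipeline(const char* path, const char* symbol) {
    void* handle = dlopen(path, RTLD_NOW | RTLD_DEEPBIND);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return nullptr;
    }
    dlerror();  // clear any stale error before calling dlsym
    void* sym = dlsym(handle, symbol);
    if (!sym) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is null");
        dlclose(handle);
        return nullptr;
    }
    // The handle is deliberately left open for the lifetime of the process.
    return reinterpret_cast<pipeline_fn>(sym);
}

// Hypothetical diagnostic: true if `name` resolves to one address through the
// library handle and to a different address through the global scope.
bool resolves_differently(void* handle, const char* name) {
    void* local = dlsym(handle, name);
    void* global = dlsym(RTLD_DEFAULT, name);
    return local && global && local != global;
}
```

With these helpers, the call from the question would become something like `load_pipeline("./mylib.so", "f")` followed by `func(output.raw_buffer())` when the result is not null. The `./` prefix is an assumption: it makes `dlopen` look in the current directory instead of the library search path. If `resolves_differently(handle, "halide_opencl_run")` returns true for a handle obtained from `dlopen` (the symbol name is taken from the debug log in the question), that would be consistent with the runtime duplication described above. On older glibc versions the program has to be linked with `-ldl`.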
