Question

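I am trying to make a CUDA project follow an object-oriented design as closely as possible. The solution I have found so far is to use a struct to encapsulate the data and, for every method that needs GPU processing, to implement three functions:

The method that will be called by the object.
A __global__ function that calls a __device__ method of the struct.
The __device__ method inside the struct.

Here is an example. Suppose I need to implement a method that initializes a buffer in a struct. It would look something like this: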

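struct Foo;
// Forward declaration so that Foo::init can launch the kernel defined below.
__global__ void initFooKernel(Foo foo);

struct Foo
{
   float *buffer_;
   short2 buffer_resolution_;
   short2 block_size_;
   __device__ void initBuffer()
   {
      int x = blockIdx.x * blockDim.x + threadIdx.x;
      int y = blockIdx.y * blockDim.y + threadIdx.y;
      int plain_index = (y * buffer_resolution_.x) + x;
      // Guard both axes: the grid is rounded up and may overshoot the buffer.
      if (x < buffer_resolution_.x && y < buffer_resolution_.y)
         buffer_[plain_index] = 0;
   }
   void init(const short2 &buffer_resolution, const short2 &block_size)
   {
       buffer_resolution_ = buffer_resolution;
       block_size_ = block_size;
       //EDIT1 - Added the cudaMalloc
       cudaMalloc((void **)&buffer_, sizeof(float) * buffer_resolution.x * buffer_resolution.y);
       dim3 threadsPerBlock(block_size.x, block_size.y);
       // Round up so that resolutions not divisible by the block size are fully covered.
       dim3 blocksPerGrid((buffer_resolution.x + threadsPerBlock.x - 1) / threadsPerBlock.x,
                          (buffer_resolution.y + threadsPerBlock.y - 1) / threadsPerBlock.y);
       // Pass the struct by value: `this` is a host address that device code cannot dereference.
       initFooKernel<<<blocksPerGrid, threadsPerBlock>>>(*this);
   }
};

__global__ void initFooKernel(Foo foo)
{
   foo.initBuffer();
}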

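I need to do it this way because it looks like I can't declare a __global__ function inside a struct. I learned this approach by looking at some open-source projects, but implementing three functions for every encapsulated GPU method looks very cumbersome. So my question is: is this the best/only approach? Is it even a valid approach?

EDIT1: I forgot to allocate the buffer with cudaMalloc before calling initFooKernel. Fixed it.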

Answer 1

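Is the goal to have classes that use CUDA while looking like normal classes from the outside?

If so, to expand on what O'Conbhui said, you can create C-style calls for the CUDA functionality and then create a class that wraps those calls.

So, in a .cu file, you would put the definitions for texture references, kernels, C-style functions that call the kernels, and C-style functions that allocate and free GPU memory. In your example, this would include a function that calls a kernel that initializes GPU memory.

Then, in a corresponding .cpp file, you include a header with declarations for the functions in the .cu file and define your class. In the constructor, you call the .cu functions that allocate CUDA memory and set up other CUDA resources, such as textures, including your own memory initialization function. In the destructor, you call the functions that free the CUDA resources. In your member functions, you call the functions that launch the kernels.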
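
A minimal sketch of that layout might look like the following. It is collapsed into a single listing for brevity, with comments marking where the header, the .cu file, and the .cpp file would be split. All names here (fooAlloc, fooFree, fooFill, fooDownload, fillKernel, and the members of Foo) are hypothetical and chosen only for illustration. Textures and most error handling are left out.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// ---- foo_gpu.h (hypothetical): plain declarations a .cpp file can include ----
float *fooAlloc(int width, int height);
void fooFree(float *buffer);
void fooFill(float *buffer, int width, int height, float value);
void fooDownload(const float *buffer, float *host, int width, int height);

// ---- foo_gpu.cu (hypothetical): kernels and the C-style functions around them ----
__global__ void fillKernel(float *buffer, int width, int height, float value)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        buffer[y * width + x] = value;
}

float *fooAlloc(int width, int height)
{
    float *buffer = nullptr;
    cudaError_t err = cudaMalloc((void **)&buffer, sizeof(float) * width * height);
    return err == cudaSuccess ? buffer : nullptr;
}

void fooFree(float *buffer)
{
    cudaFree(buffer);
}

void fooFill(float *buffer, int width, int height, float value)
{
    dim3 threads(16, 16);
    // Round up so the grid covers sizes that are not multiples of 16.
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);
    fillKernel<<<blocks, threads>>>(buffer, width, height, value);
    cudaDeviceSynchronize();
}

void fooDownload(const float *buffer, float *host, int width, int height)
{
    cudaMemcpy(host, buffer, sizeof(float) * width * height, cudaMemcpyDeviceToHost);
}

// ---- foo.cpp (hypothetical): an ordinary class with no CUDA syntax in it ----
class Foo
{
public:
    Foo(int width, int height)
        : width_(width), height_(height), buffer_(fooAlloc(width, height)) {}
    ~Foo() { fooFree(buffer_); }

    // The class owns a raw device pointer, so copying is disabled.
    Foo(const Foo &) = delete;
    Foo &operator=(const Foo &) = delete;

    void fill(float value) { fooFill(buffer_, width_, height_, value); }

    std::vector<float> download() const
    {
        std::vector<float> host(static_cast<std::size_t>(width_) * height_);
        fooDownload(buffer_, host.data(), width_, height_);
        return host;
    }

private:
    int width_;
    int height_;
    float *buffer_;
};
```

With this split, calling code only ever sees Foo: it constructs a Foo, calls fill, and lets the destructor release the GPU memory. Only the .cu file needs to be compiled by nvcc. You still write a kernel plus a launching function for each GPU operation, but the class itself stays free of CUDA qualifiers.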

What is the best way to encapsulate CUDA kernels?
