OpenCL benchmark - advice on which parameters to vary

Time: 2016-05-09 03:01:39

Tags: opencl benchmarking reduction

I want to benchmark the runtime of a two-stage reduction using OpenCL (from this AMD link) on a Radeon HD 7970 Tahiti XT.


  int global_index = get_global_id(0);
  float accumulator = 0;
  // Loop sequentially over chunks of input vector:
  // each work item strides through the input by the total number of work items
  while (global_index < length) {
    float element = buffer[global_index];
    accumulator += element;
    global_index += get_global_size(0);
  }




According to this link, AMD recommends a work-group size that is a multiple of 64 (32 for NVIDIA).

In addition, the last comment at this other link suggests setting the work-group size as: WorkGroup size = (Number of total threads) / (Compute Units). My GPU has 32 compute units.

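So I would like to know which parameters are worth varying in order to compare runtimes for the second version (the first reduction loop). For example, I could take different values of the ratio (N size of input array) / (total NworkItems) while keeping WorkGroup size fixed (see the expression above),

or, conversely, should I vary WorkGroup size and keep the ratio (N size of input array) / (total NworkItems) fixed?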
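
The two options above can be viewed as slices of one small grid of configurations. The following is a minimal host-side sketch in C, not a recommendation of specific values. The `SweepPoint` type and `build_sweep` function are hypothetical names introduced here. The work-group sizes 64, 128 and 256 are assumed (multiples of 64, with 256 taken as the device's maximum work-group size), and the global sizes are assumed to be 256 * compute units * {1, 2, 4, 8} so that every work-group size divides them.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark configuration (hypothetical type, for illustration only). */
typedef struct {
    size_t local_size;           /* work-group size */
    size_t global_size;          /* total number of work items */
    size_t items_per_work_item;  /* ceil(n / global_size) */
} SweepPoint;

/* Fill `out` with up to `cap` configurations for an input of n elements.
 * Outer loop: global size (this fixes the ratio n / global_size).
 * Inner loop: work-group size (assumed candidates: 64, 128, 256).
 * Returns the number of points written. */
size_t build_sweep(size_t n, size_t compute_units, SweepPoint *out, size_t cap)
{
    static const size_t local_sizes[] = {64, 128, 256};
    const size_t num_local = sizeof local_sizes / sizeof local_sizes[0];
    size_t count = 0;

    assert(compute_units > 0);

    for (size_t mult = 1; mult <= 8; mult *= 2) {
        /* A multiple of 256, so 64, 128 and 256 all divide it evenly. */
        size_t global = 256 * compute_units * mult;
        for (size_t k = 0; k < num_local; k++) {
            if (count == cap)
                return count;
            out[count].local_size = local_sizes[k];
            out[count].global_size = global;
            out[count].items_per_work_item = (n + global - 1) / global;
            count++;
        }
    }
    return count;
}
```

Rows that share a `global_size` keep the ratio fixed while the work-group size changes; rows that share a `local_size` do the opposite. Timing every row once gives both comparisons from the same set of runs.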


1 Answer:

Answer 0: (score: 2)


  // Number of items each work item needs to process (rounded up)
  int chunk_size = length/get_global_size(0)+(length%get_global_size(0) > 0);
  // Start address for this work item
  int global_index = get_group_id(0)*get_local_size(0)*chunk_size + get_local_id(0);
  float accumulator = 0;

  // Loop sequentially over chunks of input vector
  for(int i=0; i<chunk_size; i++) {
    if (global_index < length) {
      float element = buffer[global_index];
      accumulator += element;
      global_index += get_local_size(0);
    }
  }
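
Before timing the two indexing patterns against each other, it may help to confirm on the host that both read every input element exactly once. The sketch below replays the index arithmetic of the strided loop from the question and of the per-group block loop from this answer in plain C. The function names are hypothetical, and the loops over `group`, `lid` and `gid` stand in for what the OpenCL runtime does in parallel; nothing here measures performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if every counter in hits[0..length) equals 1, else 0. */
static int each_hit_once(const unsigned *hits, size_t length)
{
    for (size_t i = 0; i < length; i++)
        if (hits[i] != 1)
            return 0;
    return 1;
}

/* Strided pattern (question): work item gid reads gid, gid + G, gid + 2G, ...
 * where G is the global size. */
int strided_covers_once(size_t length, size_t global_size)
{
    assert(global_size > 0);
    unsigned *hits = calloc(length ? length : 1, sizeof *hits);
    if (!hits)
        return 0;
    for (size_t gid = 0; gid < global_size; gid++)
        for (size_t idx = gid; idx < length; idx += global_size)
            hits[idx]++;
    int ok = each_hit_once(hits, length);
    free(hits);
    return ok;
}

/* Blocked pattern (answer): each work-group owns a contiguous block of
 * local_size * chunk_size elements and steps through it by local_size. */
int blocked_covers_once(size_t length, size_t local_size, size_t num_groups)
{
    assert(local_size > 0 && num_groups > 0);
    size_t global_size = local_size * num_groups;
    size_t chunk_size = length / global_size + (length % global_size > 0);
    unsigned *hits = calloc(length ? length : 1, sizeof *hits);
    if (!hits)
        return 0;
    for (size_t group = 0; group < num_groups; group++) {
        for (size_t lid = 0; lid < local_size; lid++) {
            size_t idx = group * local_size * chunk_size + lid;
            for (size_t i = 0; i < chunk_size; i++) {
                if (idx < length) {
                    hits[idx]++;
                    idx += local_size;
                }
            }
        }
    }
    int ok = each_hit_once(hits, length);
    free(hits);
    return ok;
}
```

Calling both functions with the same `length` and launch sizes planned for the benchmark, including lengths that are not a multiple of the global size, exercises the `global_index < length` guard in each kernel.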
