How do I time the NVIDIA SDK samples?

Time: 2013-07-25 12:57:39

Tags: sdk opencl nvidia

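I am trying to time the oclVectorAdd sample. I use clGetEventProfilingInfo, the GPU timer, to record the time taken by the kernel execution. The time is computed in milliseconds, but the output is strange. The code and output are as follows: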

    cl_ulong start, end;
    cl_event event_ker_x;
    ciErr1 = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL, &szGlobalWorkSize, &szLocalWorkSize, 0, NULL, &event_ker_x);
    shrLog("clEnqueueNDRangeKernel (VectorAdd)...\n");
    if (ciErr1 != CL_SUCCESS)
    {
        shrLog("Error in clEnqueueNDRangeKernel, Line %u in file %s !!!\n\n", __LINE__, __FILE__);
        Cleanup(argc, argv, EXIT_FAILURE);
    }
    clGetEventProfilingInfo(event_ker_x, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
    clGetEventProfilingInfo(event_ker_x, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
    float ker_x_time = (end - start) * 1.0e-6f;
    shrLog("kernel execution time is : %f\n", ker_x_time);

Output:

    clEnqueueNDRangeKernel (VectorAdd)...
    kernel execution time is : 18446744027136.000000
    clEnqueueReadBuffer (Dst)...

1 Answer:

Answer 0 (score: 0)

You seem to be running into a problem similar to this person's: Timed interval always evaluates to zero

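In OpenCL, clEnqueueNDRangeKernel only queues the kernel for execution; it does not necessarily run the kernel right away. To profile the kernel with events, try reading the execution time after clEnqueueReadBuffer (provided that read is blocking), or add a clFinish(..) after clEnqueueNDRangeKernel.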