Measuring time with pycuda.driver.Event gives wrong results

Time: 2012-09-04 07:52:28

Tags: python time pycuda

I ran SimpleSpeedTest.py from the PyCuda examples, which produced the following output:

Using nbr_values == 8192
Calculating 100000 iterations
SourceModule time and first three results:
0.058294s, [ 0.005477  0.005477  0.005477]
Elementwise time and first three results:
0.102527s, [ 0.005477  0.005477  0.005477]
Elementwise Python looping time and first three results:
2.398071s, [ 0.005477  0.005477  0.005477]
GPUArray time and first three results:
8.207257s, [ 0.005477  0.005477  0.005477]
CPU time measured using :
0.000002s, [ 0.005477  0.005477  0.005477]

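The first four measurements are plausible, but the last one (0.000002s) is not. The CPU result should be the slowest, yet it is orders of magnitude faster than the fastest GPU method. So the measured time must be wrong. This is odd, because the same timing method seems to work fine for the first four results.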

So I took some code from SimpleSpeedTest.py and made a small test file [2], which produced:

time measured using option 1:
0.000002s 
time measured using option 2:
5.989620s 

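Option 1 measures the duration using pycuda.driver.Event.record() (as in SimpleSpeedTest.py); option 2 uses time.clock(). Again, option 1 is off, while option 2 gives a plausible result (running the test file takes about 6 seconds).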

Does anyone know why this happens?

Since option 1 is what SimpleSpeedTest.py uses, could my setup be causing the problem? I am running a GTX 470, display driver 301.42, CUDA 4.2, Python 2.7 64-bit, PyCuda 2012.1, on a Xeon X5650.

[2] Test file:

import numpy
import time
import pycuda.driver as drv
import pycuda.autoinit

n_iter = 100000
nbr_values = 8192 # = 64 * 128 (values as used in SimpleSpeedTest.py)

start = drv.Event() # option 1 uses pycuda.driver.Event
end = drv.Event()

a = numpy.ones(nbr_values).astype(numpy.float32) # test data

start.record() # start option 1 (inserting recording points into GPU stream)
tic = time.clock() # start option 2 (using CPU time)

for i in range(n_iter):
    a = numpy.sin(a) # do some work

end.record() # end option 1
toc = time.clock() # end option 2

end.synchronize() 

events_secs = start.time_till(end)*1e-3
time_secs = toc - tic 

print "time measured using option 1:"
print "%fs " % events_secs
print "time measured using option 2:"
print "%fs " % time_secs

1 answer:

Answer 0 (score: -1)

I contacted Andreas Klöckner, who suggested synchronizing on the start event.

...
start.record()
start.synchronize()
...

This seems to solve the problem!

time measured using option 1:
5.944461s
time measured using option 2:
5.944314s 

Apparently, CUDA's behavior has changed over the past two years. I have updated SimpleSpeedTest.py.
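
For reference, here is a minimal sketch of the test file with the start.synchronize() fix applied. It is an illustration and not the updated SimpleSpeedTest.py itself. The helper names time_with_events and run_pycuda_demo are made up for this example, and the code assumes Python 3, so time.perf_counter() stands in for time.clock(). The only PyCUDA calls used are the ones already in the question (Event.record, Event.synchronize, Event.time_till).

```python
import time


def time_with_events(work, make_event):
    """Time work() with GPU-style events and with the host clock.

    make_event is any zero-argument callable returning an object with
    record(), synchronize() and time_till(other) (in milliseconds),
    such as pycuda.driver.Event. (Hypothetical helper for illustration.)
    """
    start = make_event()
    end = make_event()

    start.record()
    start.synchronize()  # the fix: wait until the start event has really been recorded

    tic = time.perf_counter()  # host-side clock, option 2 in the question
    work()
    end.record()
    toc = time.perf_counter()

    end.synchronize()  # wait for the end event before reading the elapsed time

    events_secs = start.time_till(end) * 1e-3  # time_till reports milliseconds
    return events_secs, toc - tic


def run_pycuda_demo(n_iter=100000, nbr_values=8192):
    """Repeat the test from the question; needs PyCUDA and a CUDA device."""
    import numpy
    import pycuda.autoinit  # noqa: F401 (creates the CUDA context on import)
    import pycuda.driver as drv

    a = numpy.ones(nbr_values).astype(numpy.float32)  # test data

    def work():
        x = a
        for _ in range(n_iter):
            x = numpy.sin(x)  # do some work on the CPU

    events_secs, time_secs = time_with_events(work, drv.Event)
    print("time measured using option 1:")
    print("%fs" % events_secs)
    print("time measured using option 2:")
    print("%fs" % time_secs)
```

Calling run_pycuda_demo() on a machine with a working CUDA setup should print two durations that roughly agree, as in the output above. The exact values depend on the hardware.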