Speeding up Python code

Time: 2014-06-25 08:46:39

Tags: python matlab numpy

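Edit 2: Additional information based on the comments: the function should compute the radial integral of an image (i.e. in polar coordinates) and then normalize it by the radius. In Mathematica notation: rotmean = Integrate[f[r] r, {r, 0, rmax}] / Integrate[r, {r, 0, rmax}]. In practice, I want bilinear interpolation when I flatten the image from a 2-D pixel map into a 1-D radial average. If I don't interpolate and only use the nearest neighbour, the values near zero radius can be completely off.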

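So the algorithm can be described as follows:

  • Compute the radius mesh

  • Take the floor of the radius to use as a matrix index

  • Take the remainder above the floor as the bilinear interpolation weight for each pixel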
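The three steps above can be sketched on their own. This is only an illustration, not the code from the post: the helper name `radial_index` is hypothetical, and unlike the original it builds the axes by broadcasting instead of `np.meshgrid`, so that the result always has the same shape as the image (including odd and non-square sizes).

```python
import numpy as np

def radial_index(shape):
    """Return (rfloor, remain) for an image of the given 2-D shape.

    rfloor is the integer part of each pixel's distance from the centre,
    remain is the fractional part used as the interpolation weight.
    """
    n, m = shape[0] // 2, shape[1] // 2
    # Centred coordinate axes; broadcasting a column against a row
    # yields a full grid without materialising two meshgrid arrays.
    y = np.arange(-n, shape[0] - n)[:, np.newaxis]
    x = np.arange(-m, shape[1] - m)[np.newaxis, :]
    rmesh = np.hypot(x, y)           # radius of every pixel
    rfloor = rmesh.astype(np.intp)   # truncation equals floor because r >= 0
    remain = rmesh - rfloor          # fractional remainder in [0, 1)
    return rfloor, remain
```

Each pixel then contributes with weight `1 - remain` to bin `rfloor` and with weight `remain` to bin `rfloor + 1`.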

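Now, for the vectorized solution given at the bottom, I do some precalculation:

  • Find the image weighted by the bilinear interpolation (mage_p and mage_n for the 0 and +1 pixels respectively). This gives the radial sum (rather than the mean), Integrate[f[r] r, {r, 0, rmax}]

  • The conceptually hard step (for me) in vectorizing the problem was collapsing the N**2-pixel image into N*sqrt(2) radial sums. This is done by lines such as "rmean[rfloor] = mage_p", noting that rfloor and mage_p both have length N**2 while rmean has length N*sqrt(2)

  • Compute the weights at the same time, which gives Integrate[r, {r, 0, rmax}]

  • Convert from the radial sum to the average.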
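For comparison, here is a minimal sketch of the same four steps using `np.bincount` with its `weights` argument, which adds up every value that shares a bin index. This is a swapped-in technique, not the indexing approach used in the post, and the function name `rotmean_bincount` is hypothetical. It follows the weighting of the for-loop version (weight `1 - remain` to bin `rfloor`, weight `remain` to bin `rfloor + 1`).

```python
import numpy as np

def rotmean_bincount(mage):
    """Radially averaged profile with linear interpolation between integer radii."""
    mage = np.asarray(mage, dtype=float)
    n, m = mage.shape[0] // 2, mage.shape[1] // 2
    y = np.arange(-n, mage.shape[0] - n)[:, np.newaxis]
    x = np.arange(-m, mage.shape[1] - m)[np.newaxis, :]
    rmesh = np.hypot(x, y).ravel()      # radius of every pixel, flattened
    rfloor = rmesh.astype(np.intp)      # lower bin index
    remain = rmesh - rfloor             # weight of the upper bin
    flat = mage.ravel()
    nbins = int(rfloor.max()) + 2       # leave room for the rfloor + 1 bin

    # Radial sums: bincount accumulates all pixels that map to the same bin.
    rsum = (np.bincount(rfloor, weights=flat * (1.0 - remain), minlength=nbins)
            + np.bincount(rfloor + 1, weights=flat * remain, minlength=nbins))
    # Matching interpolation weights, accumulated the same way.
    weights = (np.bincount(rfloor, weights=1.0 - remain, minlength=nbins)
               + np.bincount(rfloor + 1, weights=remain, minlength=nbins))

    # Convert sums to averages; bins that received no weight stay at zero.
    rmean = np.zeros(nbins)
    np.divide(rsum, weights, out=rmean, where=weights > 0)
    return rmean, np.arange(nbins)
```

A quick sanity check is a constant image: every populated bin of the profile should come out equal to that constant.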

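Original post: I am trying to port a function from Matlab to Python (based on WinPython x64 2.7.6.4), and I am struggling with how slow it is. The code is intended to take the rotational average of an image about its centre. Typically the application would be finding the rotational average of an image's power spectrum, but it can also be used in image space. Here is the code: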

import time
import numpy as np

def rotmean( mage ):
    # This is terrifyingly slow
    t0 = time.time()

    N = int( np.floor( mage.shape[0]/2.0 ) )
    M = int( np.floor( mage.shape[1]/2.0 ) )

    rmax = np.ceil( np.sqrt( N**2 + M**2 ) ) + 1

    rmean = np.zeros( [rmax] )
    weights = np.zeros( [rmax] )

    [xmesh, ymesh] = np.meshgrid( range(-N, N), range(-M, M) )
    rmesh = np.sqrt( xmesh**2 + ymesh**2 )
    rfloor = np.floor( rmesh )

    remain = rmesh - rfloor
    # Make rfloor into an index look-up table
    rfloor = rfloor.astype(np.int)

    t1 = time.time()
    # It takes 300 ms just to get to here in the function.
    print "Init time = " + str(t1-t0)
    print "Max rfloor = " + str( rfloor.max() )
    print "rmean.shape = " + str( rmean.shape )


    # Certainly mage*(1-remain) can be precalculated as well if we want to use more memory

    for I in rfloor:
        # Sum pixels
        rmean[rfloor[I]] += mage[I]*(1-remain[I])
        rmean[rfloor[I]+1] += mage[I]*remain[I]

        # Calculate the total area of each pixel so we can compute the average from the sum
        weights[rfloor[I]] += (1-remain[I])
        weights[rfloor[I]+1] += remain[I]

    t4 = time.time()
    print "Total loop time = " + str(t4 - t1)

    rmean /= weights # compute average from sum
    raxis = range(0,rmean.size)
    return [rmean, raxis]

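It is hard to overstate how slow this function is compared with Matlab. It takes about 450 ms to execute in Matlab on my laptop, whereas in Python it takes 300 ms just to reach the for loop, and running it on a 2k x 2k image takes about 180 seconds on my more powerful desktop. I realize that Matlab JIT-compiles for loops, but I tried compiling the code with Cython to speed up the for loop, and it was actually slower than the interpreted code. I suspect I don't understand how Python allocates memory and so on, but I can't find any notes on profiling. As far as I can tell from the limited documentation, am I doing an in-place operation when I use the += operator here?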
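On the profiling point, the standard library ships `cProfile` and `pstats`. A minimal sketch follows, written for Python 3 rather than the 2.7 used in the post. The helper `profile_call` and the stand-in workload `slow_sum` are hypothetical names introduced only for illustration.

```python
import cProfile
import io
import pstats

import numpy as np

def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()

def slow_sum(values):
    """Stand-in workload: an element-by-element Python loop over an array."""
    total = 0.0
    for v in values:
        total += v
    return total

result, report = profile_call(slow_sum, np.arange(1000.0))
```

The `report` string lists call counts and per-function times, which helps separate the setup cost (mesh, floor, remainder) from the cost of the loop itself.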

Vectorizing with numpy seems like the obvious solution, but I can't see how to vectorize the operation.

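Edit: OK, I vectorized the code, and now it is within spitting distance of the Matlab timing, at about 0.5 seconds in Python (although that is on the desktop computer, which has the more powerful CPU, a Xeon versus an i5). I tried many ways of speeding up the for loop, but the best I could manage was a little over 30 seconds.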

t0 = time.time()

N = int( np.floor( mage.shape[0]/2.0 ) )
M = int( np.floor( mage.shape[1]/2.0 ) )

rmax = np.ceil( np.sqrt( N**2 + M**2 ) ) + 1

[xmesh, ymesh] = np.meshgrid( range(-N, N), range(-M, M) )
rmesh = np.sqrt( xmesh**2 + ymesh**2 )
rfloor = np.floor( rmesh )

remain = rmesh - rfloor
# Make rfloor into an index look-up table
rfloor = rfloor.astype(np.int)

# I can flatten remain and mage
mage = mage.ravel()
remain = remain.ravel()
# remain_n = np.ones( remain.shape ) - remain;
remain_n = 1.0 - remain;
rfloor = rfloor.ravel()
mage_p = mage*remain
mage_n = mage*remain_n

# Somewhat better initialization time (~200 ms) but still slow...
t2 = time.time()
print "Initialize time = " + str(t2-t0)

rmean = np.zeros( [rmax] )
rmean_n = np.zeros( [rmax] )
weights = np.zeros( [rmax] )
weights_n = np.zeros( [rmax] )

# Find positive remainders
rmean[rfloor] = mage_p
weights[rfloor] = remain

# Add one to indexing array and add negative remainders to sum
rfloor += 1
rmean_n[rfloor] += mage_n
weights_n[rfloor] += remain_n

# sum
rmean += rmean_n
weights += weights_n
# and normalize sum to average
rmean /= weights

raxis = range(0,rmean.size)
t1 = time.time()
print "Time elapsed = " + str(t1-t0)
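One thing worth checking in the snippet above: when an index array contains repeated values, NumPy's fancy-index assignment and fancy-index `+=` each keep only one contribution per index, so `rmean[rfloor] = mage_p` and `rmean_n[rfloor] += mage_n` may not sum all the pixels that fall into the same radial bin. The small sketch below, with made-up data, contrasts this with `np.add.at` and `np.bincount`, which do accumulate.

```python
import numpy as np

idx = np.array([0, 0, 1, 1, 1])            # repeated bin indices
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

assigned = np.zeros(2)
assigned[idx] = vals          # only one value per index survives

buffered = np.zeros(2)
buffered[idx] += vals         # buffered: still one contribution per index

accumulated = np.zeros(2)
np.add.at(accumulated, idx, vals)   # unbuffered: every contribution is added

counted = np.bincount(idx, weights=vals, minlength=2)  # same sums as add.at
```

Here `accumulated` and `counted` both hold the full per-bin sums (1 + 2 and 3 + 4 + 5), while `assigned` and `buffered` hold a single value per bin.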

0 answers:

No answers