Question

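I have a function that basically just makes a lot of calls to a simply defined hash function and tests when it finds a duplicate. I need to run a lot of simulations with it, so I would like it to be as fast as possible. I am attempting to use Cython for this. The Cython code is currently called with a plain Python list of integers with values in the range 0 to m^2.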

import math, random
cdef int a,b,c,d,m,pos,value, cyclelimit, nohashcalls   
def h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) %m    
def floyd(inputx):
    dupefound, nohashcalls = (0,0)
    m = len(inputx)
    loops = int(m*math.log(m))
    for loopno in xrange(loops):
        if (dupefound == 1):
            break
        a = random.randrange(m)
        b = random.randrange(m)
        c = random.randrange(m)
        d = random.randrange(m)
        pos = random.randrange(m)
        value = inputx[pos]
        listofpos = [0] * m
        listofpos[pos] = 1
        setofvalues = set([value])
        cyclelimit = int(math.sqrt(m))
        for j in xrange(cyclelimit):
            pos = h3(a,b, c,d, m, inputx[pos])
            nohashcalls += 1    
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls

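How can I convert inputx and listofpos to use C-type arrays and access the arrays at C speed? Are there any other speedups I can use? Can setofvalues be sped up?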

So that there is something to compare against, 50 calls to floyd() with m = 5000 currently take around 30 seconds on my computer.

Update: example code snippet showing how floyd is called.

m = 5000
inputx = random.sample(xrange(m**2), m)
(dupefound, nohashcalls) = edcython.floyd(inputx)

Answer 1

First off, it seems that you must type the variables inside the function. A good example of it is here.

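Secondly, cython -a (for "annotate") gives you a really excellent breakdown of the code generated by the Cython compiler, with a color-coded indication of how dirty (read: Python API heavy) each line is. This output is essential when trying to optimize anything.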

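Thirdly, the now famous page on working with Numpy explains how to get fast, C-style access to Numpy array data. Unfortunately, it is verbose and annoying. We are in luck, however, because more recent versions of Cython provide Typed Memory Views, which are both easy to use and awesome. Read that entire page before you try to do anything else.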

After ten minutes or so, I came up with this:

# cython: infer_types=True

# Use the C math library to avoid Python overhead.
from libc cimport math
# For boundscheck below.
import cython
# We're lazy so we'll let Numpy handle our array memory management.
import numpy as np
# You would normally also import the Numpy pxd to get faster access to the Numpy
# API, but it requires some fancier compilation options so I'll leave it out for
# this demo.
# cimport numpy as np

import random

# This is a small function that doesn't need to be exposed to Python at all. Use
# `cdef` instead of `def` and inline it.
cdef inline int h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) % m

# If we want to live fast and dangerously, we tell cython not to check our array
# indices for IndexErrors. This means we CAN overrun our array and crash the
# program or screw up our stack. Use with caution. Profiling suggests that we
# aren't gaining anything in this case so I leave it on for safety.
# @cython.boundscheck(False)
# `cpdef` so that calling this function from another Cython (or C) function can
# skip the Python function call overhead, while still allowing us to use it from
# Python.
cpdef floyd(int[:] inputx):
    # Type the variables in the scope of the function.
    cdef int a,b,c,d, value, cyclelimit
    cdef unsigned int dupefound = 0
    cdef unsigned int nohashcalls = 0
    cdef unsigned int loopno, pos, j

    # `m` has type int because inputx is already a Cython memory view and
    # `infer-types` is on.
    m = inputx.shape[0]

    cdef unsigned int loops = int(m*math.log(m))

    # Again using the memory view, but letting Numpy allocate an array of zeros.
    cdef int[:] listofpos = np.zeros(m, dtype=np.int32)

    # Keep this random sampling out of the loop
    cdef int[:, :] randoms = np.random.randint(0, m, (loops, 5)).astype(np.int32)

    for loopno in range(loops):
        if (dupefound == 1):
            break

        # From our precomputed array
        a = randoms[loopno, 0]
        b = randoms[loopno, 1]
        c = randoms[loopno, 2]
        d = randoms[loopno, 3]
        pos = randoms[loopno, 4]

        value = inputx[pos]

        # Unfortunately, Memory View does not support "vectorized" operations
        # like standard Numpy arrays. Otherwise we'd use listofpos *= 0 here.
        for j in range(m):
            listofpos[j] = 0

        listofpos[pos] = 1
        setofvalues = set((value,))
        cyclelimit = int(math.sqrt(m))
        for j in range(cyclelimit):
            pos = h3(a, b, c, d, m, inputx[pos])
            nohashcalls += 1
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])

    return dupefound, nohashcalls

There are no tricks here that are not explained on docs.cython.org, which is where I learned them myself, but it helps to see them all come together.

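The most important changes to your original code are described in the comments, but they all amount to giving Cython hints about how to generate code that does not use the Python API.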
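One practical consequence of the `int[:]` signature is that the input must expose a buffer of C ints rather than being a plain list of Python integers. The following is a minimal sketch, assuming Python 3 (so `range` replaces the question's `xrange`) and using the standard-library `array` module purely for illustration; `np.asarray(inputx, dtype=np.int32)` would be the NumPy equivalent. The final call is left commented out because it needs the compiled module, named `edcython` in the question.

```python
import random
from array import array

m = 5000
# Python 3 spelling of the question's random.sample(xrange(m**2), m).
inputx = random.sample(range(m**2), m)

# 'i' is the array typecode for a C signed int, which is what `int[:]` expects.
inputx_c = array('i', inputx)

# Look at the buffer the way a buffer-protocol consumer would see it.
view = memoryview(inputx_c)
print(view.format, view.itemsize, view.shape)

# With the compiled module available (called edcython in the question):
# dupefound, nohashcalls = edcython.floyd(inputx_c)
```

The conversion happens once per call, outside the hot loop, so its cost should be small compared with the hashing itself.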

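As an aside: I really don't know why infer_types is not on by default. It lets the compiler implicitly use C types instead of Python types where possible, which means less work for you.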
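One thing that may be worth checking when moving from Python integers to C types is width: Python integers grow as needed, while a C `int` does not. The sketch below is plain Python arithmetic, not part of the answer's code, and it assumes the question's setup (`a`, `b`, `c` below `m`, inputs below `m**2`) and a 32-bit C `int`, which is common but not guaranteed. Under those assumptions it estimates the largest intermediate value the hash expression `a*x**2 + b*x + c` could produce.

```python
m = 5000

# Largest values the hash can see under the question's setup:
# a, b, c < m and x < m**2.
a = b = c = m - 1
x = m**2 - 1

worst = a * x**2 + b * x + c

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

print("worst-case intermediate:", worst)
print("fits in 32 bits:", worst <= INT32_MAX)
print("fits in 64 bits:", worst <= INT64_MAX)
```

If the 32-bit check fails, as it does for m = 5000, the compiled hash may not agree with the pure-Python one; declaring the hash arguments and return type as `long long` would be one option to consider for this value of m.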

If you run cython -a on this, you will see that the only lines that call into Python are your calls to random.sample and the lines that build or add to a Python set().

On my machine, your original code runs in 2.1 seconds. My version runs in 0.6 seconds.

~~The next step is to get random.sample out of that loop, but I'll leave that to you.~~

I have edited my answer to demonstrate how to precompute the random samples. This brings the time down to 0.4 seconds.
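For anyone who wants to reproduce this kind of comparison, here is a rough sketch of a timing harness. It is an assumption about how such numbers might be measured, not the method used above. `time_calls` and `fake_floyd` are hypothetical names; `fake_floyd` only stands in for the compiled function so that the sketch runs on its own.

```python
import time

def time_calls(func, make_input, repeats=50):
    """Return the total seconds spent in `repeats` calls of func(make_input())."""
    total = 0.0
    for _ in range(repeats):
        data = make_input()           # build the input outside the timed region
        start = time.perf_counter()
        func(data)
        total += time.perf_counter() - start
    return total

# Hypothetical stand-in so the sketch runs without the compiled module.
def fake_floyd(inputx):
    return 0, len(inputx)

elapsed = time_calls(fake_floyd, lambda: list(range(1000)), repeats=50)
print("50 calls took %.4f s" % elapsed)
```

Passing the real `floyd` and a function that builds a fresh `inputx` would time only the calls themselves, which keeps the input generation from skewing the comparison.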

Answer 2

Do you need to use this particular hashing algorithm? Why not use the built-in hashing algorithm for dicts? For example:

from collections import Counter
cnt = Counter(inputx)
dupes = [k for k, v in cnt.iteritems() if v > 1]
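The snippet above uses `iteritems()`, which exists only in Python 2. As a small sketch of the same idea under Python 3, wrapped in a hypothetical helper named `find_dupes` and sorted so the result is deterministic:

```python
from collections import Counter

def find_dupes(values):
    """Return the sorted values that occur more than once in `values`."""
    cnt = Counter(values)
    return sorted(k for k, v in cnt.items() if v > 1)

print(find_dupes([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]))  # prints [1, 3, 5]
```

Note that this only reports whether duplicates exist in the input; it does not count hash calls the way floyd() does, so it fits only if that count is not needed.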
