Question

I am trying to write Python code that calls the following Cython function test1:

import numpy as np
cimport numpy as np

def test1(np.ndarray[np.int32_t, ndim=2] ndk,
          np.ndarray[np.int32_t, ndim=2] nkw,
          np.ndarray[np.float64_t, ndim=2] phi):

    # Call the C-level helper many times with the same three arrays.
    for _ in xrange(int(1e5)):
        test2(ndk, nkw, phi)


cdef int test2(np.ndarray[np.int32_t, ndim=2] ndk,
               np.ndarray[np.int32_t, ndim=2] nkw,
               np.ndarray[np.float64_t, ndim=2] phi):
    return 1

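My pure-Python code calls test1 and passes it 3 NumPy arrays, which are very large (about 10^4 × 10^3). test1 in turn calls test2, which is defined with the cdef keyword, and passes these arrays along. Since test1 has to call test2 many times (about 10^5) before returning, and test2 never needs to be called from outside the Cython code, I used cdef instead of def.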

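The problem is that every time test1 calls test2, memory usage grows steadily. I tried calling gc.collect() outside the Cython code, but it didn't help. Eventually the program gets killed by the system because it has used up all the memory. I noticed that this only happens when test2 is a cdef or cpdef function; if I change it to def, it works fine.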
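One way to confirm that memory grows with the number of calls, rather than simply being high, is to measure the net allocation around a batch of calls. The sketch below uses only the standard library's tracemalloc; net_allocation, leaky_call and clean_call are hypothetical names for illustration, and in the real case you would pass something like lambda: test1(ndk, nkw, phi). Note that tracemalloc only sees memory allocated through Python's allocators, so a leak in raw C allocations made by compiled code may not show up; watching the process's resident memory is the fallback check.

```python
import tracemalloc


def net_allocation(func, repeats):
    """Call func() `repeats` times and return the net number of bytes that
    are still allocated afterwards, as seen by tracemalloc."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            func()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical stand-ins for the real call, for demonstration only.
_leaked = []


def leaky_call():
    _leaked.append(bytearray(1024))  # every buffer is kept alive


def clean_call():
    bytearray(1024)  # freed as soon as the call returns


print(net_allocation(clean_call, 1000))  # stays close to 0
print(net_allocation(leaky_call, 1000))  # at least 1024 bytes per call
```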

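I assumed test1 would pass references to these arrays to test2 rather than new objects. But it seems to create new objects for these arrays and pass those to test2, and afterwards the Python gc never touches them.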

Am I missing something?

Answer 1

I'm still puzzled by this problem, but I found another way around it: explicitly tell Cython to pass pointers, as shown below:

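cimport numpy as np

# mode="c" makes Cython reject arrays that are not C-contiguous,
# which the flat pointer indexing in test2 relies on.
def test1(np.ndarray[np.int32_t, ndim=2, mode="c"] ndk,
          np.ndarray[np.int32_t, ndim=2, mode="c"] nkw,
          np.ndarray[np.float64_t, ndim=2, mode="c"] phi):

    # Pass raw pointers to the first element instead of the array objects.
    for _ in xrange(int(1e5)):
        test2(&ndk[0,0], &nkw[0,0], &phi[0,0])


cdef int test2(np.int32_t* ndk,
               np.int32_t* nkw,
               np.float64_t* phi):
    # Only raw pointers arrive here; to read element (i, j) you also need the
    # row length, e.g. ndk[i*row_len + j] with row_len == ndk.shape[1] in test1.
    return 1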

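However, you then need to index the arrays like this: ndk[i*row_len + j], where row_len is the number of columns, and this only works for C-contiguous arrays. Details: https://github.com/cython/cython/wiki/tutorials-NumpyPointerToC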
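The index formula is the row-major (C-order) offset: in a C-contiguous 2-D array, row i starts i*row_len elements after the first element, where row_len is the number of columns (shape[1]). A small pure-Python check with the standard array module illustrates this; flat_index is a hypothetical helper name. The formula does not hold for Fortran-ordered or non-contiguous (for example sliced) arrays, which is why the pointer approach should only be used on C-contiguous inputs, such as those returned by np.ascontiguousarray.

```python
from array import array


def flat_index(i, j, row_len):
    """Offset of element (i, j) in a C-contiguous (row-major) 2-D buffer
    whose rows each hold `row_len` elements."""
    return i * row_len + j


rows = [[1, 2, 3],
        [4, 5, 6]]
row_len = len(rows[0])

# Flatten row by row, which is how a C-contiguous NumPy array is laid out.
flat = array('i', [x for row in rows for x in row])

# Every (i, j) lookup through the flat buffer matches the nested list.
for i in range(len(rows)):
    for j in range(row_len):
        assert flat[flat_index(i, j, row_len)] == rows[i][j]

print(flat[flat_index(1, 2, row_len)])  # 6
```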

Answer 2

I ran into a similar problem and solved it using memory views. As a side benefit besides fixing the leak, this approach is also simpler to use than pointers:

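Typed memoryviews allow efficient access to memory buffers, such as those underlying NumPy arrays, without incurring any Python overhead. Memoryviews are similar to the current NumPy array buffer support (np.ndarray[np.float64_t, ndim=2]), but they have more features and cleaner syntax.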

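Unfortunately, I couldn't work out why the former approach causes a memory leak. I can only guess that a pointer to the data is kept alive somewhere and prevents the data from being garbage-collected. Maybe someone can offer better insight into this.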

In any case, your code should work fine with this interface (the example below is for the function test2, but the same applies to test1):

cdef int test2(int[:, :] ndk,
               int[:, :] nkw,
               double[:, :] phi):
    # int matches np.int32 on common platforms and double matches np.float64
    # (a float[:, :] view would reject a float64 array with a dtype mismatch).

    # You can access the data through the referenced memory as if it were a regular
    # NumPy array (including attributes such as .shape), e.g.:
    # cdef int some_int = ndk[0, 5]  <--- returns the primitive value stored at [0, 5]
    # ndk.shape                      <--- returns the shape of the array

    # NOTE: the original array (i.e. the ndk passed into the function) must
    # be an "exportable" object (one that supports the buffer protocol) and is
    # presumably created by the caller (a Python/Cython/NumPy array is such an object).

    return 1
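Python's built-in memoryview is not the same thing as a Cython typed memoryview, but both consume objects through the same buffer protocol (PEP 3118), which is what "exportable" means in the note above. The standard-library sketch below uses array.array as a stand-in for a NumPy array and shows the two properties the answer relies on: the view is indexed like a 2-D array and exposes .shape, and it shares memory with the original object instead of copying it.

```python
from array import array

# A flat, C-contiguous buffer of 2 x 3 signed ints: an "exportable" object.
data = array('i', [1, 2, 3, 4, 5, 6])

# View it as a 2-D block without copying. memoryview.cast requires one side
# of each cast to be a byte format, so go through 'B' first.
view = memoryview(data).cast('B').cast('i', shape=[2, 3])

print(view.shape)  # (2, 3)
print(view[1, 2])  # 6, indexed like a 2-D array

data[5] = 99       # change the original buffer...
print(view[1, 2])  # 99: the view sees the change, so no copy was made
```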

Memory leak when calling a Cython function with large NumPy array arguments?