Question

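I have been trying for quite some time to get my code to run on a GPU, with little success. I would really appreciate help with the implementation.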

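Let me describe the problem. I have a graph G with N nodes, and on each node x there is a distribution mx. I want to compute the distance between the distributions for every pair of nodes joined by an edge. For a given pair (x, y), I use ot.sinkhorn(mx, my, dNxNy) from the Python POT package to compute the distance. Here mx and my are vectors of size Nx and Ny on nodes x and y, and dNxNy is the Nx x Ny distance matrix.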
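For readers unfamiliar with the method, here is a minimal NumPy sketch of the Sinkhorn iteration for a single pair of distributions. It is not POT's implementation: the function name sinkhorn_cost, the iteration count, and the tolerance are assumptions made for illustration, and it returns the transport cost sum(P * M) of the regularised plan P.

```python
import numpy as np


def sinkhorn_cost(a, b, M, reg, n_iter=1000, tol=1e-9):
    """Entropic optimal transport between histograms a (n,) and b (m,).

    M is the (n, m) cost matrix and reg the regularisation strength.
    Hypothetical helper for illustration, not part of POT.
    """
    K = np.exp(-M / reg)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u_prev = u
        v = b / (K.T @ u)             # rescale to match the column marginals
        u = a / (K @ v)               # rescale to match the row marginals
        if np.max(np.abs(u - u_prev)) < tol:
            break
    P = u[:, None] * K * v[None, :]   # regularised transport plan
    return float(np.sum(P * M))


a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
M = np.array([[0.0, 1.0], [1.0, 0.0]])
print(sinkhorn_cost(a, b, M, reg=0.1))
```

Each iteration is just two matrix-vector products and two element-wise divisions, which is why the method maps well onto a GPU.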

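Now, I found that there is a GPU implementation of this code, ot.gpu.sinkhorn(mx, my, dNxNy). However, this is not good enough, because mx, my and dNxNy have to be uploaded to the GPU at every iteration, which is a massive overhead. So the idea is to parallelise over all edges on the GPU.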
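One way to picture "all edges at once" is a batched Sinkhorn iteration: pad every edge's problem to a common size with zero mass, stack them, and update all of them with the same array operations. The sketch below uses NumPy so it runs anywhere; the helper names pad_problems and batched_sinkhorn_cost and the fixed iteration count are assumptions for illustration. Running it on a GPU would mean swapping in a GPU array library with a NumPy-like interface, which is not tested here.

```python
import numpy as np


def pad_problems(problems):
    """Stack (a, b, M) triples of varying size into zero-padded batch arrays."""
    n = max(len(a) for a, _, _ in problems)
    m = max(len(b) for _, b, _ in problems)
    a_pad = np.zeros((len(problems), n))
    b_pad = np.zeros((len(problems), m))
    M_pad = np.zeros((len(problems), n, m))
    for k, (a, b, M) in enumerate(problems):
        a_pad[k, :len(a)] = a
        b_pad[k, :len(b)] = b
        M_pad[k, :len(a), :len(b)] = M
    return a_pad, b_pad, M_pad


def batched_sinkhorn_cost(a_pad, b_pad, M_pad, reg, n_iter=500):
    """Run n_iter Sinkhorn updates on every problem in the batch at once."""
    K = np.exp(-M_pad / reg)                         # (batch, n, m)
    u = (a_pad > 0).astype(float)                    # zero-mass rows stay at 0
    for _ in range(n_iter):
        v = b_pad / np.einsum('bnm,bn->bm', K, u)    # batched K^T u
        u = a_pad / np.einsum('bnm,bm->bn', K, v)    # batched K v
    P = u[:, :, None] * K * v[:, None, :]            # one plan per problem
    return np.sum(P * M_pad, axis=(1, 2))


# Two toy "edges" with different support sizes (made-up data).
rng = np.random.default_rng(0)
problems = []
for n, m in [(3, 4), (5, 2)]:
    a = rng.random(n)
    b = rng.random(m)
    problems.append((a / a.sum(), b / b.sum(), rng.random((n, m))))

costs = batched_sinkhorn_cost(*pad_problems(problems), reg=1.0)
print(costs)
```

Padded entries carry zero mass, so they receive zero transport and do not change the result. With this layout the data is transferred once and every iteration touches the whole batch, instead of one upload per edge. Very large costs relative to reg can underflow the kernel to zero; this sketch does not guard against that.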

The essence of the code is as follows. mx_all holds all the distributions.

for i,e in enumerate(G.edges):
    W[i] = W_comp(mx_all,dist,e)

def W_comp(mx_all, dist,  e):
    i = e[0]
    j = e[1]

    Nx = np.array(mx_all[i][1]).flatten()
    Ny = np.array(mx_all[j][1]).flatten()
    mx = np.array(mx_all[i][0]).flatten()
    my = np.array(mx_all[j][0]).flatten()

    dNxNy = dist[Nx,:][:,Ny].copy(order='C')

    # return the result so that the loop above can store it in W[i]
    return ot.sinkhorn2(mx, my, dNxNy, 1)
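As a side note on the sub-matrix extraction dist[Nx,:][:,Ny] used above: the same block can be selected in one step with np.ix_. The small arrays below are made up for illustration.

```python
import numpy as np

dist = np.arange(25).reshape(5, 5)   # toy distance matrix
Nx = np.array([0, 2])                # support of the first distribution
Ny = np.array([1, 3, 4])             # support of the second distribution

sub_chained = dist[Nx, :][:, Ny]     # two successive fancy-indexing steps
sub_ix = dist[np.ix_(Nx, Ny)]        # one step: rows Nx crossed with columns Ny

print(sub_ix)
```

Both forms return a new Nx x Ny array; np.ix_ avoids building the intermediate row slice.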

Below is a minimal working example. Please ignore everything except the part between the dashed lines (===).

import ot
import numpy as np
import scipy as sc
import scipy.sparse.linalg  # load the submodule used as sc.sparse.linalg below


def main():
    import networkx as nx

    #some example graph
    G = nx.planted_partition_graph(4, 20, 0.6, 0.3, seed=2)
    L = nx.normalized_laplacian_matrix(G)

    #this just computes all distributions (IGNORE)
    mx_all = []
    for i in G.nodes:
        mx_all.append(mx_comp(L,1,1,i))  

    #some random distance matrix (IGNORE)
    dist = np.random.randint(5,size=(nx.number_of_nodes(G),nx.number_of_nodes(G)))          

# ============================================================================= 
#this is what needs to be parallelised on GPU
    W = np.zeros(G.number_of_edges())
    for i,e in enumerate(G.edges):
        print(i)
        W[i] = W_comp(mx_all,dist,e)

    return W

def W_comp(mx_all, dist,  e):
    i = e[0]
    j = e[1]

    Nx = np.array(mx_all[i][1]).flatten()
    Ny = np.array(mx_all[j][1]).flatten()
    mx = np.array(mx_all[i][0]).flatten()
    my = np.array(mx_all[j][0]).flatten()

    dNxNy = dist[Nx,:][:,Ny].copy(order='C')

    return ot.sinkhorn2(mx, my, dNxNy,1)

# =============================================================================

#some other functions (IGNORE)
def delta(i, n):

    p0 = np.zeros(n)
    p0[i] = 1.

    return p0

# all neighbourhood densities
def mx_comp(L, t, cutoff, i):
    N = np.shape(L)[0]

    mx_all = sc.sparse.linalg.expm_multiply(-t*L, delta(i, N))
    Nx_all = np.argwhere(mx_all > (1-cutoff)*np.max(mx_all))

    return mx_all, Nx_all  

if __name__ == "__main__":
    main()

Thanks!

Answer 1

There are packages that let you run code on the GPU.

You can use one of the following packages:

pyCuda
numba (pro)
Theano

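If you want to use numba, the Python Anaconda distribution is recommended. Anaconda Accelerate is also needed; you can install it with conda install accelerate. This example shows how GPU use can be implemented: https://gist.githubusercontent.com/aweeraman/ae6e40f54a924f1f5832081be9521d92/raw/d6775c421aa4fa4c0d582e6c58873499d28b913a/gpu.py. It is done by adding target='cuda' to the @vectorize decorator. Note the import from numba import vectorize. The vectorize decorator takes the signature of the function to be accelerated as input.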

Good luck!

Sources:

https://weeraman.com/put-that-gpu-to-good-use-with-python-e5a437168c01
https://www.researchgate.net/post/How_do_I_run_a_python_code_in_the_GPU

Python: How can I write this code to run on the GPU?