Fastest way to sum over the rows of a sparse matrix

Time: 2017-09-28 18:21:12

Tags: python numpy scipy sparse-matrix

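I have a large csr_matrix (1M x 1K), and I want to add up groups of rows to obtain a new csr_matrix with the same number of columns but fewer rows. My problem is essentially the same as the question "Sum over rows in scipy.sparse.csr_matrix". The only issue is that I find the accepted solution too slow for my purposes. Let me describe what I have: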

map_fn = np.random.randint(0, 10000, 1000000)

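Here map_fn tells me how my input rows (1M) map to my output rows (10K). For example, the i-th input row is added to output row map_fn[i]. I tried the two approaches mentioned in the question above, namely forming a sparse matrix and using a sparse sum. Although the sparse matrix approach is clearly better than the sparse sum approach, I still find it slow for my purpose. Here is the code comparing the two approaches: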

import scipy.sparse
import numpy as np
import time

print("Setting up input")
s=10000          # number of output rows
n=1000000        # number of input rows
d=1000           # number of columns
density=1.0/500

X=scipy.sparse.rand(n,d,density=density,format="csr")
map_fn=np.random.randint(0, s, n)

# Approach 1: build a sparse selector matrix S and multiply
start_time=time.time()
col = np.arange(n)
val = np.ones(n)
S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
print("Approach 1 Creation time : ",time.time()-start_time)
SX = S.dot(X)
print("Approach 1 Total time : ",time.time()-start_time)

# Approach 2: sum the selected rows one output row at a time
start_time=time.time()
SX = np.zeros((s,X.shape[1]))
for i in range(SX.shape[0]):
    SX[i,:] = X[np.where(map_fn==i)[0],:].sum(axis=0)

print("Approach 2 Total time : ",time.time()-start_time)

This gives the following numbers:

Approach 1 Creation time :  0.187678098679
Approach 1 Total time :  0.286989927292
Approach 2 Total time :  10.208632946

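So my question is: is there a better way to do this? Forming the sparse matrix seems like overkill, since it takes more than half of the total time. Are there better alternatives? Any suggestions are greatly appreciated. Thanks.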

1 answer:

Answer 0 (score: 4)

Starting approach

Adapting the sparse solution from this post -

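def sparse_matrix_mult_sparseX_mod1(X, rows):
    nrows = rows.max()+1
    ncols = X.shape[1]
    nelem = nrows * ncols

    # Row and column indices of the nonzero entries of X
    a,b = X.nonzero()
    # Column-major linear index of the output cell for each nonzero entry
    ids = rows[a] + b*nrows
    # Sum the values that share an output cell, then reshape to (nrows, ncols)
    sums = np.bincount(ids, X[a,b].A1, minlength=nelem)
    out = sums.reshape(ncols,-1).T
    return out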
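
The linear-index trick in the function above may be easier to follow on a toy case. The following is a minimal NumPy-only sketch, with made-up data and variable names (in_row, col, val) that are not part of the original answer. It assumes the nonzero entries are already available as (row, column, value) triples.

```python
import numpy as np

# Hypothetical toy data: nonzeros of a 4x3 input as (input row, column, value).
in_row = np.array([0, 0, 1, 2, 3, 3])
col = np.array([0, 2, 1, 0, 1, 2])
val = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

rows = np.array([0, 1, 0, 1])  # input row i is added to output row rows[i]
nrows, ncols = 2, 3

# Column-major linear index of the output cell each nonzero lands in.
ids = rows[in_row] + col * nrows

# bincount with weights sums all values that share the same linear index.
sums = np.bincount(ids, weights=val, minlength=nrows * ncols)

# Undo the column-major flattening to get the (nrows, ncols) result.
out = sums.reshape(ncols, -1).T
print(out)  # rows: [5, 0, 2] and [0, 8, 6]
```

Input rows 0 and 2 collapse into output row 0, and input rows 1 and 3 into output row 1, so each output cell is the sum of the matching input cells.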

Benchmarking

Original approach #1 -

def app1(X, map_fn):
    # Uses the globals n (input rows) and s (output rows) from the inputs setup
    col = np.arange(n)
    val = np.ones(n)
    S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
    SX = S.dot(X)
    return SX
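
To see what the selector matrix S in app1 does, here is a small sketch on a toy input. The names X_small, map_small and S_small are made up for illustration, and the sizes are chosen so the result can be checked by hand.

```python
import numpy as np
import scipy.sparse

# Toy input: 4 rows, 3 columns (values chosen only for illustration).
X_small = scipy.sparse.csr_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 5.0, 6.0],
]))

# Input rows 0 and 2 go to output row 0; rows 1 and 3 go to output row 1.
map_small = np.array([0, 1, 0, 1])
n_out = 2
n_in = X_small.shape[0]

# S has exactly one 1 per column: S[map_small[i], i] = 1.
S_small = scipy.sparse.csr_matrix(
    (np.ones(n_in), (map_small, np.arange(n_in))), shape=(n_out, n_in)
)

# Multiplying by S adds input row i into output row map_small[i].
grouped = S_small.dot(X_small)  # result stays sparse

print(S_small.toarray())  # rows: [1, 0, 1, 0] and [0, 1, 0, 1]
print(grouped.toarray())  # rows: [5, 0, 2] and [0, 8, 6]
```

Unlike the bincount-based function, this route keeps the output sparse, which is why the timings further down also report a variant with .toarray() applied.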

Timings and verification -

In [209]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/500
     ...: 
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...: 

In [210]: out1 = app1(X, map_fn)
     ...: out2 = sparse_matrix_mult_sparseX_mod1(X, map_fn)
     ...: print(np.allclose(out1.toarray(), out2))
     ...: 
True

In [211]: %timeit app1(X, map_fn)
1 loop, best of 3: 517 ms per loop

In [212]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
10 loops, best of 3: 147 ms per loop

To be fair, we should time the final dense array version from app1 -

In [214]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 584 ms per loop

Porting to Numba

We could port the binned counting step to numba, which might be beneficial for denser input matrices. One way to do so would be -

from numba import njit

@njit
def bincount_mod2(out, rows, r, C, V):
    N = len(V)
    for i in range(N):
        out[rows[r[i]], C[i]] += V[i]
    return out

def sparse_matrix_mult_sparseX_mod2(X, rows):
    nrows = rows.max()+1
    ncols = X.shape[1]
    r,C = X.nonzero()

    V = X[r,C].A1
    out = np.zeros((nrows, ncols))
    return bincount_mod2(out, rows, r, C, V)
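
For readers without numba installed, the loop in bincount_mod2 is a scatter-add, and np.add.at expresses the same accumulation in plain NumPy. The sketch below is intended as a readable reference for checking results on small inputs, not as a fast path; the helper name scatter_add_reference and the toy data are made up for illustration.

```python
import numpy as np

def scatter_add_reference(rows, r, C, V, shape):
    """Dense grouped row sum via np.add.at (hypothetical helper name)."""
    out = np.zeros(shape)
    # Unbuffered in-place add: repeated (row, col) targets accumulate,
    # which a fancy-indexed `out[i, j] += V` would not guarantee.
    np.add.at(out, (rows[r], C), V)
    return out

# Toy nonzeros of a 4x3 input: input row, column, value.
r = np.array([0, 0, 1, 2, 3, 3])
C = np.array([0, 2, 1, 0, 1, 2])
V = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
rows = np.array([0, 1, 0, 1])  # input row i is added to output row rows[i]

ref = scatter_add_reference(rows, r, C, V, shape=(2, 3))
print(ref)  # rows: [5, 0, 2] and [0, 8, 6]
```

On real data, r, C and V would come from X.nonzero() and X[r,C].A1 as in the function above, and the result can be compared against the numba version with np.allclose.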

Timings -

In [373]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/100 # Denser now!
     ...: 
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...: 

In [374]: %timeit app1(X, map_fn)
1 loop, best of 3: 787 ms per loop

In [375]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
1 loop, best of 3: 906 ms per loop

In [376]: %timeit sparse_matrix_mult_sparseX_mod2(X, map_fn)
1 loop, best of 3: 705 ms per loop

With dense output from app1 -

In [379]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 910 ms per loop