Efficient slicing of symmetric sparse matrices

Time: 2018-02-15 11:01:42

Tags: python cython slice sparse-matrix

I have a list sigma of sparse symmetric matrices such that

len(sigma) = N

and, for all i, j, k,

sigma[i].shape[0] == sigma[i].shape[1] == m  # Square
sigma[i][j,k] == sigma[i][k,j]  # Symmetric

I have an index array P such that

P.shape[0] = N
P.shape[1] = k

My goal is to extract the k x k dense submatrix of sigma[i] given by the indices P[i,:]. This can be done as follows:

sub_matrices = np.empty([N,k,k])
for i in range(N):
    sub_matrices[i,:,:] = sigma[i][np.ix_(P[i,:], P[i,:])].todense()
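To pin down what this loop computes, here is a minimal pure-Python sketch (no NumPy or SciPy) that builds the same k x k block directly from the three CSR arrays data, indices and indptr. The function name dense_submatrix and the small 3 x 3 example matrix are illustrative assumptions, not part of the question; like np.ix_, it tolerates repeated entries in p.

```python
def dense_submatrix(data, indices, indptr, p):
    """Return the dense len(p) x len(p) block A[p][:, p] of a CSR matrix.

    data, indices, indptr are the three CSR arrays (plain lists here).
    Repeated entries in p are allowed, as with np.ix_.
    """
    k = len(p)
    # Map each requested column to every position where it occurs in p.
    positions = {}
    for pos, col in enumerate(p):
        positions.setdefault(col, []).append(pos)
    out = [[0.0] * k for _ in range(k)]
    for a, row in enumerate(p):
        # The stored entries of this row live in data/indices[indptr[row]:indptr[row + 1]].
        for ptr in range(indptr[row], indptr[row + 1]):
            for b in positions.get(indices[ptr], ()):
                out[a][b] = data[ptr]
    return out


# 3 x 3 symmetric example [[1, 0, 2], [0, 3, 0], [2, 0, 4]] in CSR form.
data = [1.0, 2.0, 3.0, 2.0, 4.0]
indices = [0, 2, 1, 0, 2]
indptr = [0, 2, 3, 5]
print(dense_submatrix(data, indices, indptr, [2, 0]))  # [[4.0, 2.0], [2.0, 1.0]]
```

Each row is scanned once, so the cost per matrix is roughly the number of stored entries in the k requested rows plus the k x k output.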

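Note, however, that although k is small, N (and m) are very large, and when the sparse symmetric matrices are stored in CSR format this takes a long time. I feel there must be a better solution. For example, is there a sparse format suited to symmetric matrices that need to be sliced in both dimensions?

I am using Python, but I am open to suggestions for any C library that I could interface with via Cython.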

EXTRA

Note that my current Cython approach is as follows:

cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           long[:,:] P,
                           double[:,:,:] sub_matrices):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
    """
    # Create variables for keeping code tidy
    cdef long N = P.shape[0]
    cdef long k = P.shape[1]

    cdef long i
    cdef long j
    cdef long index_pointer 
    cdef long sparse_row_pointer

    # Create objects for holding sparse matrix data
    cdef double[:] data
    cdef long[:] indices
    cdef long[:] indptr

    # Object for the ordered P
    cdef long[:] perm

    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0

    for i in range(N):
        # Sort the P
        perm = np.argsort(P[i,:])

        # Get the sparse matrix values
        data     = sigma[i].data
        indices  = sigma[i].indices.astype(long)
        indptr   = sigma[i].indptr.astype(long)

        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            #     sigma[P[i, perm[j], :]
            # against
            #     P[i,:]

            # To do this we need our sparse row vector with columns 
            #     indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            #     data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            #     P[i, perm[:]]

            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of 
            # an equality we add the data to sub_matrices[i,:,:] and 
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that 
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).
            index_pointer = 0
            sparse_row_pointer = indptr[P[i, perm[j]]]

            while ((index_pointer < k) and (sparse_row_pointer < indptr[P[i, perm[j]] + 1])):
                if indices[sparse_row_pointer] == P[i, perm[index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[j], perm[index_pointer]] = \
                           data[sparse_row_pointer]

                    # Only increment the index pointer
                    index_pointer += 1
                elif indices[sparse_row_pointer] > P[i, perm[index_pointer]]:
                    # Need to increment index pointer
                    index_pointer += 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer += 1
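The loop above calls np.argsort once per row of P. For the small k involved here, a common C-style replacement is an insertion-sort argsort, which needs no temporary arrays and ports directly to a nogil Cython or C function. The sketch below is plain Python for readability; argsort_small is a hypothetical name, its worst case is O(k^2) comparisons, and whether it actually beats np.argsort for k around 100 would have to be measured.

```python
def argsort_small(values):
    """Return the indices that sort `values` ascending (stable insertion sort)."""
    order = list(range(len(values)))
    for i in range(1, len(order)):
        current = order[i]
        key = values[current]
        j = i - 1
        # Shift larger entries one slot right; the strict '>' keeps equal
        # keys in their input order, so the sort is stable.
        while j >= 0 and values[order[j]] > key:
            order[j + 1] = order[j]
            j -= 1
        order[j + 1] = current
    return order


print(argsort_small([7, 2, 9, 2]))  # [1, 3, 0, 2]
```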

I believe np.argsort may be inefficient when it is called often on relatively small vectors, and I would like to swap it for a C implementation. I also do not take advantage of parallel processing, which could potentially speed things up over the N sparse matrices. Unfortunately, since there are Python coercions inside the outer loop, I do not know how to use prange.

Here is an updated version that uses int32 memoryviews (so the .astype copies are gone), sorts all rows of P with a single np.argsort call, and can optionally exploit symmetry:

cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           np.ndarray[np.int32_t, ndim=2] P,
                           np.float64_t[:,:,:] sub_matrices,
                           int symmetric):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
        symmetric: 1 if the sigma matrices are symmetric
    """
    # Create variables for keeping code tidy
    cdef np.int32_t N = P.shape[0]
    cdef np.int32_t k = P.shape[1]

    cdef np.int32_t i
    cdef np.int32_t j
    cdef np.int32_t index_pointer
    cdef np.int32_t sparse_row_pointer

    # Create objects for holding sparse matrix data
    cdef np.float64_t[:] data
    cdef np.int32_t[:] indices
    cdef np.int32_t[:] indptr

    # Object for the ordered P
    cdef np.int32_t[:,:] perm = np.argsort(P, axis=1).astype(np.int32)

    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0

    for i in range(N):
        # Get the sparse matrix values
        data = sigma[i].data
        indices = sigma[i].indices
        indptr = sigma[i].indptr

        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            #     sigma[P[i, perm[j], :]
            # against
            #     P[i,:]
            # To do this we need our sparse row vector with columns
            #     indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            #     data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            #     P[i, perm[:]]

            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of
            # an equality we add the data to sub_matrices[i,:,:] and
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).
            if symmetric:
                index_pointer = j  # Only search upper triangular
            else:
                index_pointer = 0
            sparse_row_pointer = indptr[P[i, perm[i, j]]]

            while ((index_pointer < k) and
                   (sparse_row_pointer < indptr[P[i, perm[i, j]] + 1])):
                if indices[sparse_row_pointer] == P[i, perm[i, index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[i, j], perm[i, index_pointer]] = \
                           data[sparse_row_pointer]

                    if symmetric:
                        sub_matrices[i, perm[i, index_pointer], perm[i, j]] = \
                               data[sparse_row_pointer]

                    # Only increment the index pointer
                    index_pointer += 1
                elif indices[sparse_row_pointer] > P[i, perm[i, index_pointer]]:
                    # Need to increment index pointer
                    index_pointer += 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer += 1
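A minimal pure-Python sketch of the same two-pointer merge, including the symmetric shortcut (start each sorted row at position j and mirror every hit), may make the loop easier to check. It assumes canonical CSR input, i.e. sorted column indices within each row; submatrix_merge and the toy matrix are illustrative, not the question's API.

```python
def submatrix_merge(data, indices, indptr, p, symmetric=False):
    """Extract A[p][:, p] from CSR arrays with a two-pointer merge.

    Assumes the column indices inside each row are sorted. With
    symmetric=True, sorted row j is only merged against sorted
    positions >= j, and every hit is also written to the mirrored slot.
    """
    k = len(p)
    perm = sorted(range(k), key=lambda t: p[t])  # plays the role of np.argsort
    out = [[0.0] * k for _ in range(k)]
    for j in range(k):
        row = p[perm[j]]
        index_pointer = j if symmetric else 0  # upper triangle only if symmetric
        sparse_row_pointer = indptr[row]
        row_end = indptr[row + 1]
        while index_pointer < k and sparse_row_pointer < row_end:
            col = indices[sparse_row_pointer]
            target = p[perm[index_pointer]]
            if col == target:
                value = data[sparse_row_pointer]
                out[perm[j]][perm[index_pointer]] = value
                if symmetric:
                    out[perm[index_pointer]][perm[j]] = value
                # p may repeat this column, so only advance the index pointer.
                index_pointer += 1
            elif col > target:
                index_pointer += 1
            else:
                sparse_row_pointer += 1
    return out


# 3 x 3 symmetric example [[1, 0, 2], [0, 3, 0], [2, 0, 4]] in CSR form.
data = [1.0, 2.0, 3.0, 2.0, 4.0]
indices = [0, 2, 1, 0, 2]
indptr = [0, 2, 3, 5]
print(submatrix_merge(data, indices, indptr, [2, 0], symmetric=True))  # [[4.0, 2.0], [2.0, 1.0]]
```

The symmetric shortcut roughly halves the merge work per matrix; it relies on A[a, b] == A[b, a], so it must only be enabled for genuinely symmetric input.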

Another point to note is that the Cython approach seems to use a lot of memory, but I do not know where it is being allocated.

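Latest version

Based on ead's suggestions, below is the latest version of the Cython code. It gathers raw pointers to each matrix's data, indices and indptr before the loop, so that the outer loop can run with prange: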

# See https://stackoverflow.com/questions/48805636/efficient-slicing-of-symmetric-sparse-matrices
cimport cython
import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free
from cython.parallel import prange

@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           np.ndarray[np.int32_t, ndim=2] P,
                           np.float64_t[:,:,:] sub_matrices,
                           int symmetric):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
        symmetric: 1 if the sigma matrices are symmetric
    """
    # Create variables for keeping code tidy
    cdef np.int32_t N = P.shape[0]
    cdef np.int32_t k = P.shape[1]

    cdef np.int32_t i
    cdef np.int32_t j
    cdef np.int32_t index_pointer 
    cdef np.int32_t sparse_row_pointer

    # Create objects for holding sparse matrix data
    cdef np.float64_t[:] data_mem_view
    cdef np.int32_t[:] indices_mem_view
    cdef np.int32_t[:] indptr_mem_view

    cdef np.float64_t **data = <np.float64_t **> malloc(N * sizeof(np.float64_t *))
    cdef np.int32_t **indices = <np.int32_t **> malloc(N * sizeof(np.int32_t *))
    cdef np.int32_t **indptr = <np.int32_t **> malloc(N * sizeof(np.int32_t *))

    for i in range(N):
        data_mem_view = sigma[i].data
        data[i] = &(data_mem_view[0])

        indices_mem_view = sigma[i].indices
        indices[i] = &(indices_mem_view[0])

        indptr_mem_view = sigma[i].indptr
        indptr[i] = &(indptr_mem_view[0])

    # Object for the ordered P
    cdef np.int32_t[:,:] perm = np.argsort(P, axis=1).astype(np.int32)

    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0

    for i in prange(N, nogil=True):
        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            #     sigma[P[i, perm[j], :]
            # against
            #     P[i,:]
            # To do this we need our sparse row vector with columns 
            #     indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            #     data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            #     P[i, perm[:]]

            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of 
            # an equality we add the data to sub_matrices[i,:,:] and 
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that 
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).

            if symmetric:
                index_pointer = j  # Only search upper triangular
            else:
                index_pointer = 0
            sparse_row_pointer = indptr[i][P[i, perm[i, j]]]

            while ((index_pointer < k) and 
                   (sparse_row_pointer < indptr[i][P[i, perm[i, j]] + 1])):
                if indices[i][sparse_row_pointer] == P[i, perm[i, index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[i, j], perm[i, index_pointer]] = \
                           data[i][sparse_row_pointer]

                    if symmetric:
                        sub_matrices[i, perm[i, index_pointer], perm[i, j]] = \
                               data[i][sparse_row_pointer]

                    # Only increment the index pointer
                    index_pointer = index_pointer + 1
                elif indices[i][sparse_row_pointer] > P[i, perm[i, index_pointer]]:
                    # Need to increment index pointer
                    index_pointer = index_pointer + 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer = sparse_row_pointer + 1

    # Free malloc'd data
    free(data)
    free(indices)
    free(indptr)
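The pointer arrays above exist so that the prange loop never touches a Python object. A related layout, sketched here in plain Python with hypothetical helper names pack_csr and row_entries, concatenates all CSR arrays into flat buffers and records per-matrix offsets, so that the hot loop only does integer arithmetic on a few flat arrays. This is a swapped-in alternative to the malloc'd pointer arrays, not what the code above does.

```python
def pack_csr(matrices):
    """Concatenate the CSR arrays of several matrices into flat lists.

    matrices: list of (data, indices, indptr) tuples. Returns the flat
    data/indices/indptr lists plus, per matrix, where its indptr block
    starts in the flat indptr list and where its entries start in data/indices.
    """
    flat_data, flat_indices, flat_indptr = [], [], []
    indptr_offset, entry_offset = [], []
    for data, indices, indptr in matrices:
        indptr_offset.append(len(flat_indptr))
        entry_offset.append(len(flat_data))
        flat_data.extend(data)
        flat_indices.extend(indices)
        flat_indptr.extend(indptr)
    return flat_data, flat_indices, flat_indptr, indptr_offset, entry_offset


def row_entries(packed, i, row):
    """Return the (column, value) pairs of `row` in matrix i, using only the packed lists."""
    flat_data, flat_indices, flat_indptr, indptr_offset, entry_offset = packed
    start = entry_offset[i] + flat_indptr[indptr_offset[i] + row]
    stop = entry_offset[i] + flat_indptr[indptr_offset[i] + row + 1]
    return list(zip(flat_indices[start:stop], flat_data[start:stop]))


identity2 = ([1.0, 1.0], [0, 1], [0, 1, 2])
sym3 = ([1.0, 2.0, 3.0, 2.0, 4.0], [0, 2, 1, 0, 2], [0, 2, 3, 5])
packed = pack_csr([identity2, sym3])
print(row_entries(packed, 1, 2))  # [(0, 2.0), (2, 4.0)]
```

One trade-off: in the test script many sigma entries refer to the same 20 sample matrices, so concatenating copies would multiply memory use, whereas pointer arrays share the existing buffers.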

This parallel version does not seem to provide any speedup, though, and the code is no longer as pretty.

Testing

To test, compile the module with

cythonize -i sparse_slice.pyx

where sparse_slice.pyx is the file name. Then you can use this script:

import time
import numpy as np
import scipy as sp
import scipy.sparse

from sparse_slice import sparse_slice_fast_cy

k = 100
N = 20000
m = 10000
samples = 20

# Create sigma matrices
## Sampling random sparse matrices takes a while, so just make a few and
## then populate the list with these.
now = time.time()
sigma_samples = []
for i in range(samples):
    sigma_samples.append(sp.sparse.rand(m, m, density=0.001, format='csr'))
    sigma_samples[-1] = sigma_samples[-1] + sigma_samples[-1].T  # Symmetric

## Now make the sigma list from these.
sigma = []
for i in range(N):
    j = np.random.randint(samples)
    sigma.append(sigma_samples[j])
print('Time to make sigma: {}'.format(time.time() - now))

# Create indexer
now = time.time()
P = np.empty([N, k]).astype(int)
for i in range(N):
    P[i, :] = np.random.choice(np.arange(m), k, replace=True)
print('Time to make P: {}'.format(time.time() - now))

# Create objects for holding the slices
sub_matrices_slow = np.empty([N, k, k])
sub_matrices_fast = np.empty([N, k, k])

# Run both slicings
## Slow
now = time.time()
for i in range(N):
    sub_matrices_slow[i,:,:] = sigma[i][np.ix_(P[i,:], P[i,:])].todense()
print('Time to make sub_matrices_slow: {}'.format(time.time() - now))

## Fast
symmetric = 1
now = time.time()
sparse_slice_fast_cy(sigma, P.astype(np.int32), sub_matrices_fast, symmetric)
print('Time to make sub_matrices_fast: {}'.format(time.time() - now))

assert(np.all((sub_matrices_slow - sub_matrices_fast)**2 < 1e-6))


1 Answer

Answer 0 (score: 2)

I can't test it right now, but here are two suggestions:

A) Sort all rows for the i-loop at once:

# Object for the ordered P
cdef long[:,:] perm = np.argsort(P, axis=1)

Maybe you need to pass P as np.ndarray[np.int64_t, ndim=2] P (or whatever its type is) to avoid a copy. You then have to access the data via perm[i,X] instead of perm[X].

B) Define

cdef np.int32_t[:] indices
cdef np.int32_t[:] indptr

so that you don't need to copy the data via .astype, i.e.

for i in range(N):
    data     = sigma[i].data
    indices  = sigma[i].indices
    indptr   = sigma[i].indptr
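The point of B) is that a typed memoryview binds to the buffer that already exists, while .astype(long) allocates and fills a new array. Python's built-in memoryview and array module show the same distinction in a few lines; this is only an analogy in plain Python, not Cython, with the int32 array standing in for sigma[i].indices.

```python
from array import array

# Stand-in for sigma[i].indices: a buffer of C ints ('i' is 32-bit on common platforms).
indices = array('i', [0, 2, 1, 0, 2])

# Like binding a typed memoryview: no new buffer, just a view of the existing one.
view = memoryview(indices)

# Like .astype(long): a new 64-bit buffer is allocated and every element copied.
copy = array('q', indices)

indices[0] = 7           # modify the original buffer
print(view[0], copy[0])  # 7 0
```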

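I think that, because sigma[i] has O(m) elements, this copying is the bottleneck of your function: you get a running time of O(N*(m+k^2)) instead of O(N*k^2), so it is worth avoiding.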
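A back-of-the-envelope count makes this concrete. The sketch below plugs in the parameters of the question's test script (m = 10000, k = 100, density = 0.001) and assumes that A + A.T roughly doubles the number of stored entries and that they are spread evenly over the rows; both are estimates, not measurements. N multiplies both sides equally, so it cancels out of the ratio.

```python
# Parameters taken from the question's test script.
m, k = 10000, 100
nnz_random = m * m // 1000     # sp.sparse.rand(..., density=0.001)
nnz = 2 * nnz_random           # A + A.T: at most about twice as many stored entries
row_nnz = nnz // m             # average stored entries per row (assumed uniform)

# Per matrix, the .astype copies touch every entry of indices and indptr ...
copy_per_matrix = nnz + (m + 1)
# ... while the merge does at most (k + row_nnz) pointer steps for each of k rows.
merge_per_matrix = k * (k + row_nnz)

print(copy_per_matrix, merge_per_matrix)    # 210001 12000
print(copy_per_matrix // merge_per_matrix)  # 17
```

Under these assumptions the copies alone cost well over ten times the work of the merge itself, for each of the N matrices.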

Otherwise the function does not look too bad.

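To make prange work with the i-loop, you should move the access to sigma[i] out of the loop: create arrays of pointers to the first elements of data, indices and indptr, and fill them in a cheap preprocessing step. It can be made to work, but the question is how much parallelization would gain. Quite possibly the problem is memory-bound; one would have to look at the timings.

You could also use the symmetry by processing only the upper triangle:

  ...
  index_pointer = j #only upper triangle!
  ....
  ....
     # We can add data to sub_matrices
     #upper triangle sub-matrix:
     sub_matrices[i, perm[j], perm[index_pointer]] = \
                       data[sparse_row_pointer]
     #lower triangle sub-matrix:
     sub_matrices[i, perm[index_pointer], perm[j]] = \
                       data[sparse_row_pointer]
  ....

I would start with B) and see how it goes...

Edit

About memory usage: peak memory usage can be measured with

/usr/bin/time -f "peak_used_memory:%M(in Kb)" python test.py

Running my tests this way (python3.6 + cython0.27.1) with N=2000 gave:

                        peak memory usage
only slow               245Mb
only fast               245Mb
slow+fast, no check     402Mb
slow+fast+assert        576Mb

So there are about 50Mb of overhead, 200Mb used by either function, and another 176Mb for evaluating the assert. I see the same behavior for other values of N as well.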
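If you prefer to measure from inside Python instead of wrapping the run with /usr/bin/time, the standard-library resource module reports the same kind of number (peak resident set size). A minimal sketch, Unix-only; peak_rss_mb is a hypothetical helper name, and the list allocation is only there to make the peak move.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of the current process so far, in MB.

    ru_maxrss is in kilobytes on Linux but in bytes on macOS, so scale
    accordingly. The resource module only exists on Unix-like systems.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


before = peak_rss_mb()
scratch = [0.0] * 5_000_000  # ~40 MB of list slots on a 64-bit build
after = peak_rss_mb()
print(f"peak before: {before:.1f} MB, after: {after:.1f} MB")
```

Calling peak_rss_mb() before and after each slicing variant gives per-step peaks within a single run, at the cost of the peak never going back down.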

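So I would say that Cython does not use a lot of memory.

This task is most probably (at least partly) memory-bound, so parallelization will not help. You should reduce the amount of memory that has to be loaded into the cache.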

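One possibility is not to use perm at all; after all, it also has to be loaded into the cache. You can do this if

  1. you can work with any row/column permutation of the sigma matrices, not only the sorted order given by perm, and either
  2. there are very few elements per row, so a linear search for every element is OK, or
  3. you use a binary search for every element.

I guess that in the best case you can gain about 20-30%.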
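Option 3 can be sketched as follows: instead of sorting P and merging, binary-search each requested column in the sorted column indices of each requested row, so no perm array is needed. This is plain Python with bisect for readability; submatrix_bisect and the toy matrix are illustrative, and for very short rows a linear scan (option 2) could replace bisect_left.

```python
from bisect import bisect_left


def submatrix_bisect(data, indices, indptr, p):
    """Extract A[p][:, p] without sorting p.

    Each requested column is binary-searched inside the (sorted) column
    indices of each requested row: about k * k * log(row_nnz) lookups.
    """
    k = len(p)
    out = [[0.0] * k for _ in range(k)]
    for a, row in enumerate(p):
        lo, hi = indptr[row], indptr[row + 1]
        for b, col in enumerate(p):
            pos = bisect_left(indices, col, lo, hi)
            if pos < hi and indices[pos] == col:
                out[a][b] = data[pos]
    return out


# 3 x 3 symmetric example [[1, 0, 2], [0, 3, 0], [2, 0, 4]] in CSR form.
data = [1.0, 2.0, 3.0, 2.0, 4.0]
indices = [0, 2, 1, 0, 2]
indptr = [0, 2, 3, 5]
print(submatrix_bisect(data, indices, indptr, [2, 0]))  # [[4.0, 2.0], [2.0, 1.0]]
```

The result comes out directly in the order of p, which is why no permutation array has to be built or loaded.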

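Sometimes the code generated by Cython is not easy for the C compiler to optimize, and one gets better results by writing the function directly in C and then wrapping it for Python.

However, I would only do all of this if this operation really is the bottleneck of your program.

By the way, if you declare P with the type it actually has, as in A), you do not need the extra copy.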