Question

Sorry, I don't know the protocol for re-asking a question that didn't get an answer. I asked this here a few months ago: Numpy sum between pairs of indices in 2d array

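I have a 2-d numpy array (MxN) and two more 1-d arrays (Mx1) that hold the starting and ending indices, for each row of the 2-d array, of the range I'd like to sum over. I'm looking for the most efficient way to do this on a large array (preferably without having to use a loop, which is what I'm currently doing). An example of what I'd like to do is the following.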

>>> random.seed(1234)
>>> a = random.rand(4,4)
>>> print a
[[ 0.19151945  0.62210877  0.43772774  0.78535858]
 [ 0.77997581  0.27259261  0.27646426  0.80187218]
 [ 0.95813935  0.87593263  0.35781727  0.50099513]
 [ 0.68346294  0.71270203  0.37025075  0.56119619]]
>>> b = array([1,0,2,1])
>>> c = array([3,2,4,4])
>>> d = empty(4)
>>> for i in xrange(4):
    d[i] = sum(a[i, b[i]:c[i]]) 

>>> print d
[ 1.05983651  1.05256841  0.8588124   1.64414897]

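My problem is similar to the following question, but I don't think the solution presented there would be very efficient: Numpy sum of values in subarrays between pairs of indices. In that question, they want to find the sums of multiple subsets of the same row, so cumsum() can be used. However, I will only be computing one sum per row, so I don't think that would be the most efficient way of computing the sum.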

Answer 1

EDIT: Added timing results for all answers so far, including the OP's code following @seberg's comment below. The OP's method is the fastest:

def sliced_sum_op(a, b, c) :
    d = np.empty(a.shape[0])
    for i in xrange(a.shape[0]):
        d[i] = np.sum(a[i, b[i]:c[i]]) 
    return d

You can still do it with np.cumsum ~~with a big speed boost~~, although it requires extra storage equal to the size of your original array:

def sliced_sum(a, b, c) :
    cum = np.cumsum(a, axis=1)
    cum = np.hstack((np.zeros((a.shape[0], 1), dtype=a.dtype), cum))
    rows = np.arange(a.shape[0])
    return cum[rows, c] - cum[rows, b]
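
To make the prefix-sum idea concrete, here is a minimal sketch on a small integer array whose slice sums are easy to check by hand. The function name sliced_sum_cumsum and the test data are illustrative assumptions, not part of the original answer. It builds the zero-padded cumulative array in a slightly different way but relies on the same identity: the sum of a[i, b:c] equals cum[i, c] - cum[i, b].

```python
import numpy as np

def sliced_sum_cumsum(a, b, c):
    # cum[i, k] holds a[i, :k].sum() for k = 0..N, so column 0 is all zeros.
    cum = np.zeros((a.shape[0], a.shape[1] + 1), dtype=a.dtype)
    cum[:, 1:] = np.cumsum(a, axis=1)
    rows = np.arange(a.shape[0])
    # Sum of a[i, b[i]:c[i]] as a difference of two prefix sums.
    return cum[rows, c] - cum[rows, b]

a = np.arange(12).reshape(3, 4)   # rows: [0 1 2 3], [4 5 6 7], [8 9 10 11]
b = np.array([1, 0, 2])
c = np.array([3, 2, 4])
result = sliced_sum_cumsum(a, b, c)
# Hand-checked slice sums: 1+2 = 3, 4+5 = 9, 10+11 = 21
print(result)
```

Note that the whole cumulative array is computed even though only two entries per row are read, which fits the timing result above where the plain loop comes out ahead.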

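~~Timings are deceptive for your array, because your method is actually slightly faster than this one for small array sizes. But numpy soon wins out,~~ see the graph below for timings on random square arrays of size (n, n):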

[Figure: timing of each method against n for random (n, n) arrays]

The above was generated with:

import timeit
import matplotlib.pyplot as plt
import numpy as np

n = np.arange(10, 1000, 10)
op = np.zeros(n.shape[0])
me = np.zeros(n.shape[0])
th = np.zeros(n.shape[0])
jp = np.zeros(n.shape[0])
for j, size in enumerate(n) :
    a = np.random.rand(size, size)
    b, c = indices = np.sort(np.random.randint(size + 1,
                                               size=(2, size)), axis=0)
    np.testing.assert_almost_equal(sliced_sum_op(a, b, c),
                                   sliced_sum(a, b, c))
    np.testing.assert_almost_equal(sliced_sum_op(a, b, c),
                                   sum_between2(a, b, c))
    np.testing.assert_almost_equal(sliced_sum_op(a, b, c),
                                   sum_between_mmult(a, b, c))

    op[j] = timeit.timeit('sliced_sum_op(a, b, c)',
                          'from __main__ import sliced_sum_op, a, b, c',
                          number=10)
    me[j] = timeit.timeit('sliced_sum(a, b, c)',
                          'from __main__ import sliced_sum, a, b, c',
                          number=10)
    th[j] = timeit.timeit('sum_between2(a, b, c)',
                          'from __main__ import sum_between2, a, b, c',
                          number=10)
    jp[j] = timeit.timeit('sum_between_mmult(a, b, c)',
                          'from __main__ import sum_between_mmult, a, b, c',
                          number=10)
plt.subplot(211)
plt.plot(n, op, label='op')
plt.plot(n, me, label='jaime')
plt.plot(n, th, label='thorsten')
plt.plot(n, jp, label='japreiss')
plt.xlabel('n')
plt.legend(loc='best')
plt.show()

Answer 2

I like @Jaime's answer, but here is another approach. You can recast the problem as a matrix multiplication.

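If you multiply a by a vector of all ones, each element of the output vector contains the sum of the corresponding row of a. To get the d you need, you can zero out the elements that are excluded from each row, and then multiply by a vector of all ones.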

def sum_between_mmult(ar, b, c):
    copy = np.copy(ar)
    nrows = ar.shape[0]
    ncols = ar.shape[1]
    for i in range(nrows):
        copy[i, :b[i]] = 0
        copy[i, c[i]:] = 0
    onevec = np.ones(ncols)
    return np.dot(copy, onevec)

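As with @Jaime's answer, I only saw a speedup for larger matrix sizes. I have a feeling that some kind of fancy indexing trick could get rid of the for loop and give a greater speedup. If you don't need the original array, you can overwrite it instead of making a copy, but this didn't yield much of a speedup in my tests.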
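
One possible way to drop the for loop is to build the mask with broadcasting instead of per-row slice assignment. The following is only a sketch under that assumption: the name sum_between_broadcast and the sample data are made up for illustration, and it has not been timed against the other methods here. It still allocates temporaries of the same shape as the input.

```python
import numpy as np

def sum_between_broadcast(ar, b, c):
    # Column indices as a row vector, bounds as column vectors: broadcasting
    # tests every column index against each row's half-open range [b, c).
    cols = np.arange(ar.shape[1])
    mask = (cols >= b[:, None]) & (cols < c[:, None])
    # Zero the excluded entries, then multiply by a vector of ones.
    return np.dot(np.where(mask, ar, 0), np.ones(ar.shape[1]))

ar = np.arange(12, dtype=float).reshape(3, 4)
b = np.array([1, 0, 2])
c = np.array([3, 2, 4])
out = sum_between_broadcast(ar, b, c)

# Cross-check against the straightforward per-row slicing.
expected = np.array([ar[i, b[i]:c[i]].sum() for i in range(ar.shape[0])])
```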

Answer 3

I have another way of computing the result that works without using loops - but it is not really faster than the loop approach.

import time
import numpy as np

def sum_between1(ar, idc_l, idc_u):
    d = np.empty(ar.shape[0])
    for i in xrange(ar.shape[0]):
        d[i] = sum(ar[i, idc_l[i]:idc_u[i]])  # use the arguments, not the globals b and c
    return d

def sum_between2(ar, idc_l, idc_u):
    indices = np.arange(ar.shape[1]).reshape(1,-1).repeat(ar.shape[0], axis=0)
    lower = idc_l.reshape(-1,1).repeat(ar.shape[1], axis=1)    
    upper = idc_u.reshape(-1,1).repeat(ar.shape[1], axis=1)
    mask = ~((indices>=lower) * (indices<upper))
    masked = np.ma.MaskedArray(ar, mask)
    return masked.sum(axis=1)

np.random.seed(1234)
a = np.random.rand(4,4)
print a
b = np.array([1,0,2,1])
c = np.array([3,2,4,4])

t0 = time.time()
for i in range(100000):
    d1 = sum_between1(a,b,c)
print "sum_between1: %.3f seconds" % (time.time()-t0)
print d1

t0 = time.time()
for i in range(100000):
    d2 = sum_between2(a,b,c)
print "sum_between2: %.3f seconds" % (time.time()-t0)
print d2

My output is

  [[ 0.19151945  0.62210877  0.43772774 ...,  0.92486763  0.44214076
   0.90931596]
 [ 0.05980922  0.18428708  0.04735528 ...,  0.53585166  0.00620852
   0.30064171]
 [ 0.43689317  0.612149    0.91819808 ...,  0.18258873  0.90179605
   0.70652816]
 ..., 
 [ 0.70568819  0.76402889  0.34460786 ...,  0.6933128   0.07778623
   0.4040815 ]
 [ 0.51348689  0.80706629  0.09896631 ...,  0.91118062  0.87656479
   0.96542923]
 [ 0.20231131  0.72637586  0.57131802 ...,  0.5661444   0.14668441
   0.09974442]]
sum_between1: 2.263 seconds
[ 1.05983651  0.24409631  1.54393475  2.27840642  1.65049179  1.86027107
  0.74002457  0.91248001  1.29180203  1.03592483  0.30448954  0.78028893
  1.15511632  1.74568981  1.0551406   1.73598504  1.32397106  0.22902658
  0.77533999  2.11800627  1.09181484  0.92074516  1.04588589  2.07584895
  1.13615918  1.33172081  1.41323751  2.01996291  1.69677797  0.57592999
  1.18049304  1.13052798  0.90715138  0.63876336  1.76712974  1.15138181
  0.29005541  1.46971707  0.57149804  1.8816212 ]
sum_between2: 1.817 seconds
[1.05983651005 0.244096306594 1.54393474534 2.27840641818 1.65049178537
 1.86027106627 0.740024568268 0.91248000774 1.29180203183 1.03592482812
 0.304489542783 0.78028892993 1.1551163203 1.74568980609 1.05514059758
 1.73598503833 1.32397105753 0.229026581839 0.77533999391 2.11800626878
 1.09181484127 0.92074516366 1.04588588779 2.07584895325 1.13615918351
 1.33172081033 1.41323750936 2.01996291037 1.69677797485 0.575929991717
 1.18049303662 1.13052797976 0.907151384823 0.638763358104 1.76712974497
 1.15138180543 0.290055405809 1.46971707447 0.571498038664 1.88162120474]

I'm posting this answer because maybe somebody else has an idea how to improve my approach to make it faster.

Answer 4

This is about 25% faster:

def zip_comp(a,b,c):
    return [np.sum(aa[bb:cc]) for aa, bb, cc in zip(a,b,c)]

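If you are able to refactor the earlier code so that, instead of generating two lists for the slices, it generates a binary 2D array, then you could get a really significant gain by using the second half of @japreiss's method or something similar. The slowdown in all of these methods comes from the time spent messing around with crazy indexing.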
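
As a rough illustration of that refactoring, suppose the upstream code already hands over a boolean (M, N) array marking the elements to include. The mask below is hypothetical hand-written data, and masked_row_sums is an assumed helper name. The per-row sums then reduce to a single multiply-and-sum call, with no index handling at all:

```python
import numpy as np

# Hypothetical input: a precomputed binary mask instead of start/stop arrays.
mask = np.array([[False, True,  True,  False],
                 [True,  True,  False, False],
                 [False, False, True,  True]])

def masked_row_sums(a, mask):
    # For each row i, sum a[i, j] * mask[i, j] over j.
    return np.einsum('ij,ij->i', a, mask.astype(a.dtype))

a = np.arange(12, dtype=float).reshape(3, 4)
sums = masked_row_sums(a, mask)
# Hand-checked: 1+2 = 3, 4+5 = 9, 10+11 = 21
```

Whether this is a real gain depends on how cheaply the mask can be produced upstream and how often it is reused.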

Speed comparison, using Jaime's code:

[Figure: speed comparison of the methods]
