Question

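I want to compute the number of so-called distinctions from a given binary matrix G. Suppose the rows of G correspond to individuals and its columns to test cases; the number of distinctions a test makes is defined as the number of pairs of individuals it tells apart.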

I came up with a very simple implementation:

distinctions = np.zeros(G.shape[1])
for p in itertools.combinations(np.arange(G.shape[0]), 2):
    distinctions += G[p[0], :] != G[p[1], :]

But this is too slow for my needs. I would be very grateful if you could help me speed up this code.

Answer 1

You don't need to know where the 1s and 0s actually are, only how many of each there are. For example, in the array

array([[1, 1, 0, 1],
       [1, 1, 1, 0],
       [1, 0, 0, 1]])

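we see that test 0 distinguishes no one (0); test 1 distinguishes #0 and #1 from #2, for (2)*(1) total distinctions; test 2 distinguishes #1 from #0 and #2, for (1)*(2) total distinctions; and test 3 distinguishes #0 and #2 from #1, for (2)*(1) total distinctions, which gives us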

[0, 2, 2, 2]
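As a sanity check on the walkthrough above, here is a minimal sketch that lists which pairs of individuals each test tells apart in that small array. The names `A` and `distinguished_pairs` are made up for this illustration and are not part of the question's code.

```python
import itertools
import numpy as np

# The 3x4 example array: rows are individuals, columns are tests.
A = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [1, 0, 0, 1]])

def distinguished_pairs(M):
    """For each column (test), list the row pairs whose entries differ."""
    result = []
    for test in range(M.shape[1]):
        pairs = [(i, j)
                 for i, j in itertools.combinations(range(M.shape[0]), 2)
                 if M[i, test] != M[j, test]]
        result.append(pairs)
    return result

for test, pairs in enumerate(distinguished_pairs(A)):
    print(test, pairs)
# 0 []
# 1 [(0, 2), (1, 2)]
# 2 [(0, 1), (1, 2)]
# 3 [(0, 1), (1, 2)]
```

Every listed pair consists of one individual with a 1 and one with a 0 in that column, which is why only the counts matter.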

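So really, we only need to count the number of 1s in a column and multiply it by the number of 0s in that column, because each 1 accounts for (num_zeroes) distinctions. In other words: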

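import itertools
import numpy as np

def slow(G):
    # Brute force: compare every pair of rows and tally, per column, how often they differ.
    distinctions = np.zeros(G.shape[1])
    for p in itertools.combinations(np.arange(G.shape[0]), 2):
        distinctions += G[p[0], :] != G[p[1], :]
    return distinctions

def fast(G):
    # Per column: (number of 1s) * (number of 0s); assumes G holds only 0s and 1s.
    ones = np.count_nonzero(G, axis=0)
    return ones * (G.shape[0] - ones)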

gives me

In [125]: G
Out[125]: 
array([[0, 1, 0, 0, 1, 1, 0],
       [1, 0, 0, 0, 0, 0, 1],
       [1, 0, 1, 1, 1, 0, 0],
       [1, 1, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1]])

In [126]: slow(G)
Out[126]: array([6., 6., 4., 6., 6., 4., 6.])

In [127]: fast(G)
Out[127]: array([6, 6, 4, 6, 6, 4, 6])

and

In [130]: G = np.random.randint(0, 2, (1000, 1000))

In [131]: %timeit fast(G)
7.87 ms ± 344 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
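If you want more reassurance than a single example, the two approaches can be compared on a batch of small random matrices. This is only a sketch: it assumes G contains nothing but 0s and 1s (np.count_nonzero would treat any other nonzero value as a 1), and the seed and size ranges are arbitrary choices made for this illustration.

```python
import itertools
import numpy as np

def slow(G):
    # Brute force over all row pairs, as in the question.
    distinctions = np.zeros(G.shape[1])
    for p in itertools.combinations(np.arange(G.shape[0]), 2):
        distinctions += G[p[0], :] != G[p[1], :]
    return distinctions

def fast(G):
    # Per column: (number of 1s) * (number of 0s).
    ones = np.count_nonzero(G, axis=0)
    return ones * (G.shape[0] - ones)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
for _ in range(20):
    rows, cols = rng.integers(2, 8, size=2)
    G = rng.integers(0, 2, size=(rows, cols))
    # slow returns floats and fast returns ints, so compare by value.
    assert np.array_equal(slow(G), fast(G))
```

The brute-force loop visits n*(n-1)/2 row pairs for n rows, while the counting version makes a single pass over the matrix, so the gap grows quickly with the number of individuals.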

Counting distinctions: speeding up an operation over all row combinations in a matrix
