Question

I am using an adjacency matrix to represent a network of friends, which can be read visually as

Mary     0        1      1      1

Joe      1        0      1      1

Bob      1        1      0      1

Susan    1        1      1      0 

         Mary     Joe    Bob    Susan

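Using this matrix, I want to compile a list of all possible friendship triangles, with the condition that user 1 is a friend of user 2 and user 2 is a friend of user 3. For my list, user 1 does not have to be a friend of user 3.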

(joe, mary, bob)
(joe, mary, susan)
(bob, mary, susan)
(bob, joe, susan)

I have some code that works well on small matrices, but I need it to scale to very large sparse matrices.

import time
from numpy import arange

def buildTriangles(G):
    # G is a sparse adjacency matrix
    start = time.time()
    ctr = 0
    G = G + G.T          # I do this to make sure it is symmetric
    triples = []
    for i in arange(G.shape[0] - 1):  # for each row but the last one
        J,J = G[i,:].nonzero()        # J: primary friends of user i
                                      # I do J,J because I do not care about the row values
        J = J[ J < i ]                # only compute the lower triangle to avoid repetition
        for j in J:
            K, buff = G[:,j].nonzero() # K: secondary friends of user i
            K = K[ K > i ]             # only compute below i to avoid repetition
            for k in K:
                ctr = ctr + 1
                triples.append( (i,j,k) )
    print("total number of triples: %d" % ctr)
    print("run time is %.2f" % (time.time() - start))
    return triples

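I was able to run the code on a csr_matrix in about 21 minutes. The matrix was 1032570 x 1032570 and contained 88910 stored elements. A total of 2178893 triples were generated.

I need to be able to do something similar with a 1968654 x 1968654 sparse matrix with 9428596 stored elements.

I am very new to Python (less than a month of experience) and not the best at linear algebra, which is why my code does not take advantage of matrix operations. Can anyone suggest improvements, or let me know whether my goal is even realistic?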

Answer 1

I think you can find the triangles by looking only at rows (or only at columns). For example:

Susan    1        1      1      0 
        Mary     Joe    Bob    Susan

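This means Mary, Joe, and Bob are all friends of Susan, so picking any two people from [Mary, Joe, Bob] and combining them with Susan gives one triangle. itertools.combinations() does this quickly.

Here is the code: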

import itertools
import numpy as np

G = np.array(   # clear half of the matrix first
    [[0,0,0,0],
     [1,0,0,0],
     [1,1,0,0],
     [1,1,1,0]])
triples = []     
for i in range(G.shape[0]):
    row = G[i,:]
    J = np.nonzero(row)[0].tolist() # combinations() with list is faster than NumPy array.
    for t1,t2 in itertools.combinations(J, 2):
        triples.append((i,t1,t2))
print(triples)
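
The example above uses a dense array. As a rough sketch of how the same row-wise idea might carry over to a SciPy sparse matrix, the version below reads each row's friends from the CSR `indptr` and `indices` arrays instead of calling `np.nonzero` on a dense row. The function name `triples_by_row` is hypothetical, and using `scipy.sparse.tril` to clear the upper half is an assumption about how to mirror the "clear half of the matrix first" step.

```python
import itertools

import numpy as np
from scipy.sparse import csr_matrix, tril


def triples_by_row(G):
    """Sketch: for each row i, pair up every two friends of i with a smaller index."""
    # Keep only the strictly lower triangle, as in the dense example.
    L = tril(csr_matrix(G), k=-1, format="csr")
    triples = []
    for i in range(L.shape[0]):
        # Column indices of the stored entries in row i, i.e. friends of i.
        J = sorted(L.indices[L.indptr[i]:L.indptr[i + 1]].tolist())
        for t1, t2 in itertools.combinations(J, 2):
            triples.append((i, t1, t2))
    return triples


G = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 0]])
print(triples_by_row(G))  # [(2, 0, 1), (3, 0, 1), (3, 0, 2), (3, 1, 2)]
```

The output size still grows with the square of each row's friend count, so rows with very many friends dominate the run time.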

Answer 2

Here are some suggestions for optimization:

K = K[ K > i ]             # only compute below i to avoid repetition
for k in K:
    ctr = ctr + 1
    triples.append( (i,j,k) )

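Don't increment the counter inside the loop; it is very slow. ctr += K.shape[0] (applied after K has been filtered) does the same thing. Then eliminate the most deeply nested loop altogether by replacing the append with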

triples += ((i, j, k) for k in K[K > i])
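
Putting these two suggestions together, the whole function might look like the sketch below. It keeps the question's ordering (j < i < k, with j the shared friend). Reading neighbours straight from the CSR `indptr` and `indices` arrays rather than calling `.nonzero()` on row and column slices is an extra assumption, not part of the suggestions above, and `build_triples` is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix


def build_triples(G):
    """Sketch: list (i, j, k) with j < i < k where j is a friend of both i and k."""
    G = csr_matrix(G)
    G = (G + G.T).tocsr()      # symmetrize, as in the question
    G.eliminate_zeros()        # make sure stored entries are real edges
    indptr, indices = G.indptr, G.indices
    triples = []
    ctr = 0
    for i in range(G.shape[0] - 1):
        J = indices[indptr[i]:indptr[i + 1]]       # friends of i
        for j in J[J < i].tolist():
            K = indices[indptr[j]:indptr[j + 1]]   # friends of j
            K = K[K > i]
            ctr += K.shape[0]                      # count without an inner loop
            triples += ((i, j, k) for k in K.tolist())
    return ctr, triples


G = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 0]])
ctr, triples = build_triples(G)
print(ctr, sorted(triples))  # 4 [(1, 0, 2), (1, 0, 3), (2, 0, 3), (2, 1, 3)]
```

With Mary, Joe, Bob, Susan numbered 0 to 3, these four tuples correspond to the four triples listed in the question.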

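Now, if you want real performance on this task, you will have to get into some linear algebra. "I want to compile a list of all possible friendship triangles" means that you want to square the adjacency matrix, which you can do with a simple **2.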
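
As a small illustration of what the square contains: for a symmetric 0/1 adjacency matrix A, entry (i, k) of A·A is the number of friends that i and k have in common, so each nonzero off-diagonal entry marks a pair linked by at least one two-step path. The sketch below writes the product as `A @ A`, on the assumption that the explicit matrix-multiplication operator is the safest spelling across SciPy's sparse interfaces. Note that it counts every such path, wherever the shared friend's index falls, so the total is larger than the four triples in the question's list.

```python
import numpy as np
from scipy.sparse import csr_matrix, triu

# The 4-person network from the question (Mary, Joe, Bob, Susan).
A = csr_matrix(np.array([[0, 1, 1, 1],
                         [1, 0, 1, 1],
                         [1, 1, 0, 1],
                         [1, 1, 1, 0]]))

A2 = A @ A                    # A2[i, k] = number of common friends of i and k
paths = triu(A2, k=1)         # keep each unordered pair (i, k) once, drop the diagonal
n_triples = int(paths.sum())  # total number of two-step paths i - j - k with i < k

print(A2[0, 1])    # 2  (Mary and Joe share Bob and Susan)
print(n_triples)   # 12
```

The square tells you which pairs are connected and by how many shared friends, but not who those friends are, so listing the triples still needs a pass over the rows.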

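Then realize that 1,968,654² means a very large matrix, and even though it is very sparse, its square will be much less sparse and will take a lot of memory. (I once tackled a similar problem where I considered links between Wikipedia articles at distance two. It took 20 minutes to solve, on a supercomputer cluster node, in C++. This is not a trivial problem. The Wikipedia adjacency matrix was a few orders of magnitude denser, though.)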
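
Before squaring anything, it may help to estimate how large the result could get. A user with d friends is the shared friend in d(d-1)/2 two-step paths, so summing that quantity over all users gives the total number of such paths, which also bounds the number of off-diagonal nonzeros in the upper triangle of the square. The sketch below assumes a symmetric matrix with no self-loops, and `count_two_step_paths` is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix


def count_two_step_paths(G):
    """Sketch: count paths i - j - k (i < k) from node degrees alone.

    Assumes G is symmetric with an empty diagonal (no self-loops).
    """
    G = csr_matrix(G)
    G.eliminate_zeros()
    # In CSR format, consecutive differences of indptr give the entries per row.
    deg = np.diff(G.indptr).astype(np.int64)
    return int((deg * (deg - 1) // 2).sum())


G = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 0]])
print(count_two_step_paths(G))  # 12
```

This runs in time proportional to the number of rows and needs no extra memory, so it can be tried on the full matrix to see whether enumerating or squaring is realistic.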

Python, SciPy: building triples using a large adjacency matrix
