Question

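I asked a similar question before, but I think it was not clear. I have an undirected, unweighted graph with about 10000 vertices and 5000000 edges, which I read into Python as an edge list.

In my work, I am trying to build a function over every edge that depends on the distances between the neighbors of that edge's two vertices. Suppose two connected vertices v1, v2 form an edge; v1 has n1 connected neighbors and v2 has n2 connected neighbors. To build the function, I need the distances between the n1 and n2 neighbors. Over all edges in the graph, the function looks like this: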

e_1*d_1 + e_1*d_2 + ... + e_1*d_n + ... + e_m*d_1 + ... + e_m*d_n

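where n is the number of neighbors of the two vertices on each edge, d_n is the distance between two vertices, m is the number of edges in the graph, and e_m is the last edge in the graph.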

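Normally, to get shortest path lengths we would think of a graph traversal such as Dijkstra's algorithm or BFS, especially since my graph is unweighted. I have used many of the functions already written in packages such as networkx and igraph, but they are not efficient and take a lot of time on my graph. For example, shortest_paths_dijkstra() takes about 6.9 hours to get the distances, because I need to call it many times. In addition, all_pairs_shortest_path_length takes about 13 minutes (with the maximum path length, called cutoff, fixed at 3) and another 16 minutes to look up the distance for each pair of neighbors in the graph!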
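For reference, the cutoff-limited call mentioned above can be sketched as follows. This is only an illustration on a toy path graph (an assumption, not the asker's data); `cutoff=3` stops each search once paths reach length 3.

```python
import networkx as nx

# Toy graph (assumption): a simple path 0-1-2-3-4-5
G = nx.path_graph(6)

# Each source only explores paths of length <= 3
lengths = dict(nx.all_pairs_shortest_path_length(G, cutoff=3))

# Vertices farther than 3 steps from the source are simply absent
print(lengths[0])
```

Even with the cutoff, every vertex is still used as a source, which is consistent with the run time reported above on a dense graph.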

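As stated in the question, we need the distances between the neighbors of v1 and v2, and since v1 and v2 are connected, the maximum distance is 3 (the path neighbor of v1 - v1 - v2 - neighbor of v2 always exists). I feel there is a smarter solution that reduces the time complexity by using the fact that (in my case) the possible distances are 0, 1, 2, 3, so there is no need to traverse the whole graph for every path between a source and a target! I am simply looking for a smart way to get the distances between neighbors (not between any two arbitrary vertices)!

I wrote this code, but it takes a long time, about 54 minutes, so it is not efficient either!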

# Assumed from context: `edges` is a dict whose values are (v1, v2) pairs of
# vertex names, and `graph` is an igraph Graph with a "name" vertex attribute.
distance = {}
for v1, v2 in edges.values():
    list_of_distances = []
    neighbor1 = tuple(graph.vs[graph.neighbors(v1, mode="all")]["name"])
    neighbor2 = tuple(graph.vs[graph.neighbors(v2, mode="all")]["name"])
    for n2 in neighbor2:
        for n1 in neighbor1:
            if n1 == n2:
                # Same vertex: distance 0
                list_of_distances.append(0)
            elif graph.are_connected(n1, n2):
                # Directly connected: distance 1
                list_of_distances.append(1)
            elif graph.are_connected(v1, n2) or graph.are_connected(n1, v2):
                # Reachable through v1 or v2: distance 2
                list_of_distances.append(2)
            else:
                # Otherwise fall back to the path n1 - v1 - v2 - n2: distance 3
                list_of_distances.append(3)
    distance[(v1, v2)] = list_of_distances

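I would like to know whether there is another way to get these distances that does not need a lot of memory and time, or whether a graph traversal such as BFS or Dijkstra can be modified so that it does not search the whole graph on every iteration and only does something local (if that can be said). Thanks for any help.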
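The "local" idea asked about here can be sketched with plain adjacency sets. This is a minimal illustration, not code from the thread: the helper names `build_adjacency`, `local_distance`, and `edge_distances` and the toy edge list are hypothetical. It relies on the fact stated above that, for an edge (v1, v2), a neighbor of v1 and a neighbor of v2 are never more than 3 apart, so only distances 0, 1, and 2 need to be tested.

```python
def build_adjacency(edge_list):
    """Map every vertex to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_distance(adj, a, b):
    """Distance between a (a neighbor of v1) and b (a neighbor of v2).

    Assumes v1 and v2 are adjacent, so the distance cannot exceed 3.
    """
    if a == b:
        return 0
    if b in adj[a]:
        return 1
    if not adj[a].isdisjoint(adj[b]):
        # a and b share at least one common neighbor
        return 2
    return 3


def edge_distances(adj, v1, v2):
    """Distances for every (neighbor of v1, neighbor of v2) pair."""
    return {(a, b): local_distance(adj, a, b)
            for a in adj[v1] for b in adj[v2]}


# Toy edge list (assumption, for illustration only)
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 4), (2, 5)]
adj = build_adjacency(toy_edges)
d = edge_distances(adj, 1, 2)
```

No traversal is run at all: each pair costs one membership test plus at most one set-disjointness check, although the number of pairs per edge still grows with the product of the two degrees.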

Answer 1

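Your task is quite heavy, so it is normal for the script to run for a few hours. You could try to parallelize it with CUDA or something similar, or try to build a large cache (GBs). But if you would rather not, I recommend that you avoid the networkx / igraph functions, because they are very slow for your case. You do not need to run 1000000 DFS to solve this. Here is one possible solution using Python sets (I think it will be faster than yours, though maybe not dramatically faster).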

import networkx as nx

# Create a graph like yours
G = nx.fast_gnp_random_graph(1000, 0.05)

# Create neighbours dict: node -> set of adjacent nodes
nbrs_dict = {node: set(nbrs) for node, nbrs in G.adjacency()}

# Result dict
distances = {}

# For every edge:
for e in G.edges:

    # Start value
    dist_value = 0

    # Get N1 and N2 neighbours
    n1_nbrs = nbrs_dict[e[0]]
    n2_nbrs = nbrs_dict[e[1]]

    # Triangles - nodes that are connected to both N1 and N2
    # Every triangle value = 0
    tri = n1_nbrs & n2_nbrs
    for t in tri:

        # Get neighbours to find nodes with distance length = 2
        t_nbrs = nbrs_dict[t]

        t_in_n1 = n1_nbrs & t_nbrs
        t_in_n2 = n2_nbrs & t_nbrs

        t_not_in_n1 = n1_nbrs - t_nbrs
        t_not_in_n2 = n2_nbrs - t_nbrs

        dist_value += len(t_in_n1) + len(t_in_n2)
        dist_value += (2 * len(t_not_in_n1)) + (2 * len(t_not_in_n2))

    # Exclude all triangle nodes because we processed them all
    n1nt_nbrs = n1_nbrs - tri
    n2nt_nbrs = n2_nbrs - tri

    # Select squares - pairs of nodes with length = 1
    direct = set()
    for n1 in n1nt_nbrs:
        nbrs = nbrs_dict[n1]
        d = nbrs & n2nt_nbrs
        for node in d:
            direct.add((n1, node))
    dist_value += len(direct)

    # Exclude squares so we have only nodes with length = 3
    # (loop variable renamed so it does not look like the edge `e`)
    n1final = n1nt_nbrs - set(pair[0] for pair in direct)
    n2final = n2nt_nbrs - set(pair[1] for pair in direct)
    dist_value += 3 * len(n1final) * len(n2final)

    # Distance for an edge
    distances[e] = dist_value

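In any case, your problem has O(n^3) complexity, so I strongly recommend trying to split the graph. Maybe you have bridges, or the graph consists of several connected components. Processing them separately will speed things up considerably.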
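A minimal sketch of how such split points could be located with networkx, on a toy graph (an assumption, not the asker's data). `nx.connected_components` and `nx.bridges` are existing networkx functions; how the pieces are then processed is left open here.

```python
import networkx as nx

# Toy graph (assumption): two triangles joined by one edge, plus a separate edge
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0),
                  (2, 3),
                  (3, 4), (4, 5), (5, 3),
                  (6, 7)])

# Independent pieces: distances never cross component boundaries
components = [G.subgraph(c).copy() for c in nx.connected_components(G)]

# Bridges: edges whose removal would disconnect a component
bridges = list(nx.bridges(G))

print(len(components), bridges)
```

Components can be handled fully independently. Bridges are only candidate split points: neighbors on opposite sides of a bridge are still connected through it, so those pairs would need separate handling.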

Shortest path length between the neighbors of connected vertices (not any two random vertices!)
