Question

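I wrote a DFS-like algorithm to find all possible paths starting from level zero. With 2,000 nodes and 5,000 edges, the code below runs very slowly. Any suggestions for this algorithm?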

    all_path = []

    def printAllPathsUntil(s, path):
        path.append(s)
        if s not in adj or len(adj[s]) <= 0:
            all_path.append(path[:]) # EDIT2
        else:
            for i in adj[s]:
                printAllPathsUntil(i, path)
        path.pop()

    for point in points_in_start:
        path = []
        printAllPathsUntil(point, path)

adj holds the edges: each key is a start node, and its value is the list of destination nodes.

    points_in_start = [0, 3, 7]
    adj = {0: [1, 8],
           1: [2, 5],
           2: [],
           3: [2, 4],
           4: [],
           5: [6],
           6: [],
           7: [6],
           8: [2]
           }

EDIT1

This is a DAG. There are no cycles.


Answer 1

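The problem with your algorithm is that it repeats a lot of work. That does not show in your example, because the only node reached from two other nodes is a leaf node, such as C. But imagine an edge from D to B: the entire subgraph starting at B would then be visited again! For a graph with 2,000 nodes, this results in a significant slowdown.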
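To make the repeated work visible, here is a small sketch that counts how often a plain DFS enters each node. It uses the question's adj with one hypothetical extra edge, 3 → 1, standing in for the kind of extra edge described above; the names visits and count_visits are illustrative and not part of the original code.

```python
from collections import Counter

# The question's graph, plus one hypothetical extra edge 3 -> 1 (an assumption
# made for illustration), so that the non-leaf node 1 is reachable from two parents.
adj = {0: [1, 8],
       1: [2, 5],
       2: [],
       3: [1, 2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2]}
points_in_start = [0, 3, 7]

visits = Counter()  # how many times the DFS enters each node


def count_visits(s):
    """Plain DFS without any caching; records every entry into a node."""
    visits[s] += 1
    for child in adj.get(s, []):
        count_visits(child)


for point in points_in_start:
    count_visits(point)

# Nodes entered more than once are where work is repeated.
repeated = {node: n for node, n in visits.items() if n > 1}
print(repeated)
print(sum(visits.values()), "node visits for", len(adj), "nodes")
```

Node 1 is entered twice here, so everything below it (nodes 2, 5 and 6) is traversed again. In a larger graph with many shared non-leaf nodes, these repeats multiply along each path.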

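To fix this you can use memoization, but that means restructuring your algorithm: instead of appending to the existing path and then adding path to all_path, the function has to return the (partial) paths starting at the current node, and these are combined with the parent node into full paths. You can then use functools.lru_cache to reuse all those partial results when you visit B again from another node.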
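A minimal sketch of the restructuring described above, assuming the adj and points_in_start from the question. The function name paths_from is hypothetical, and tuples are used instead of lists so that cached results cannot be mutated by accident; this is one possible shape, not necessarily the answerer's own code.

```python
from functools import lru_cache

adj = {0: [1, 8],
       1: [2, 5],
       2: [],
       3: [2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2]}
points_in_start = [0, 3, 7]


@lru_cache(maxsize=None)
def paths_from(s):
    """Return all paths from s down to a leaf, as a tuple of tuples.

    The result for each node is computed once and then served from the cache.
    """
    children = adj.get(s, [])
    if not children:
        return ((s,),)  # a leaf contributes a single one-node path
    # Prepend s to every partial path returned by each child.
    return tuple((s,) + p for child in children for p in paths_from(child))


all_path = [list(p) for point in points_in_start for p in paths_from(point)]
print(all_path)
```

Memoization removes the repeated traversal, but the result still lists every path, so the size of the output remains a lower bound on the running time. For very deep graphs, Python's default recursion limit may also need raising with sys.setrecursionlimit.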

Answer 2

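As already pointed out in the comments and in the other answer, remembering the downstream paths of previously visited nodes is one area for optimization.

Here is my attempt at implementing that.

Here, downstream_paths is a dictionary in which we remember the downstream paths of each visited non-leaf node.

At the end I give the %%timeit results for a small test case that contains one "revisited non-leaf" situation. Because my test case has only that one revisited non-leaf node, the improvement is modest. On your large data set the performance gap may be bigger.

Input data: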

    points_in_start = [0, 3, 7]
    adj = {0: [1, 8],
           1: [2, 5],
           3: [2, 4],
           4: [],
           5: [6],
           6: [],
           7: [6],
           8: [2],     # Non-leaf node 2 is a child of 1, 8 and 3

           2: [10],    # Replaces the question's leaf entry 2: []

           10: [11, 18],
           11: [12, 15],
           12: [],
           15: [16],
           16: [],
           18: [12]
          }

Modified code:

    %%timeit
    # %%timeit is an IPython cell magic; drop that line to run this as a plain script.

    downstream_paths = {}                                 # Maps each node to its
                                                          # list of downstream paths
                                                          # starting with that node.

    def getPathsToLeafsFrom(s):      # Returns list of downstream paths starting from s
                                     # and ending in some leaf node.
        children = adj.get(s, [])
        if not children:                                  # s is a leaf
            paths_from_s = [[s]]
        else:                                             # s is a non-leaf
            ds_paths = downstream_paths.get(s, [])        # Check if s was previously visited
            if ds_paths:                                  # s was previously visited
                paths_from_s = ds_paths
            else:                                         # s was not visited earlier
                paths_from_s = []                         # Initialize
                for child in children:
                    paths_from_child = getPathsToLeafsFrom(child)   # Recurse for each child
                    for p in paths_from_child:
                        paths_from_s.append([s] + p)
                downstream_paths[s] = paths_from_s        # Cache this, to use when s is revisited
        return paths_from_s

    all_path = []                    # Same name as in the question and in the output below
    for point in points_in_start:
        all_path.extend(getPathsToLeafsFrom(point))

Output:

    from pprint import pprint
    pprint(all_path)

    [[0, 1, 2, 10, 11, 12],
     [0, 1, 2, 10, 11, 15, 16],
     [0, 1, 2, 10, 18, 12],
     [0, 1, 5, 6],
     [0, 8, 2, 10, 11, 12],
     [0, 8, 2, 10, 11, 15, 16],
     [0, 8, 2, 10, 18, 12],
     [3, 2, 10, 11, 12],
     [3, 2, 10, 11, 15, 16],
     [3, 2, 10, 18, 12],
     [3, 4],
     [7, 6]]

Timing result for the originally posted code:

    10000 loops, best of 3: 63 µs per loop

Timing result for the optimized code:

    10000 loops, best of 3: 43.2 µs per loop
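For readers who want to repeat a comparison like this outside IPython, the following is a rough sketch using the standard timeit module. The function names all_paths_plain and all_paths_memo are hypothetical wrappers around the two approaches, the memoized variant caches leaves as well for brevity, and the loop count is an arbitrary choice; absolute numbers will differ from machine to machine.

```python
import timeit

# Test graph from this answer: node 2 is a non-leaf reachable from 1, 8 and 3.
adj = {0: [1, 8], 1: [2, 5], 2: [10], 3: [2, 4], 4: [], 5: [6], 6: [],
       7: [6], 8: [2], 10: [11, 18], 11: [12, 15], 12: [], 15: [16],
       16: [], 18: [12]}
points_in_start = [0, 3, 7]


def all_paths_plain():
    """Plain DFS in the style of the question: no caching."""
    all_path = []

    def visit(s, path):
        path.append(s)
        children = adj.get(s, [])
        if not children:
            all_path.append(path[:])   # reached a leaf: store a copy
        else:
            for child in children:
                visit(child, path)
        path.pop()

    for point in points_in_start:
        visit(point, [])
    return all_path


def all_paths_memo():
    """DFS that caches the downstream paths of every visited node."""
    downstream_paths = {}              # fresh cache on every call, for fair timing

    def paths_from(s):
        if s in downstream_paths:
            return downstream_paths[s]
        children = adj.get(s, [])
        if not children:
            result = [[s]]
        else:
            result = [[s] + p for child in children for p in paths_from(child)]
        downstream_paths[s] = result
        return result

    all_path = []
    for point in points_in_start:
        all_path.extend(paths_from(point))
    return all_path


# Both variants should produce the same paths in the same order.
assert all_paths_plain() == all_paths_memo()

loops = 1000
for fn in (all_paths_plain, all_paths_memo):
    best = min(timeit.repeat(fn, number=loops, repeat=3))
    print(f"{fn.__name__}: {best / loops * 1e6:.1f} µs per loop")
```

Creating the cache inside all_paths_memo means every timed call starts cold, which mirrors how the %%timeit cell above resets downstream_paths on each run.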

DFS to find all possible paths is very slow
