GraphSAGE: https://github.com/williamleif/GraphSAGE/issues/144
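Generating random walks with the given method takes too long on a large graph (about 5 million nodes). Can anyone share a better-optimized version (distributed or parallelized)?
The given code: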
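import random

# The enclosing function header is missing from the post; the name and
# parameters below are assumed from the trailing `return pairs`.
def run_random_walks(G, num_walks):
    nodes = G.nodes()
    pairs = []
    for count, node in enumerate(nodes):
        # Isolated nodes have no neighbours to walk to.
        if G.degree(node) == 0:
            continue
        for i in range(num_walks):
            curr_node = node
            for j in range(5):  # fixed walk length of 5 steps
                next_node = random.choice(list(G.neighbors(curr_node)))
                # Record (start, visited) pairs, skipping self co-occurrences.
                if curr_node != node:
                    pairs.append((node, curr_node))
                curr_node = next_node
        if count % 1000 == 0:
            print("Done walks for", count, "nodes")
    return pairs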
I need to convert the code above into an equivalent that can run in parallel.
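One possible direction, as a minimal sketch rather than a tested drop-in replacement: the walks from different start nodes are independent, so the start nodes can be split into chunks and each chunk handed to a worker process from the standard-library `multiprocessing.Pool`. The names `parallel_random_walks`, `_walk_chunk`, `_init_worker`, `chunk_size`, and `seed` are hypothetical and not part of GraphSAGE. The sketch assumes the graph is undirected and has first been converted to plain adjacency lists, for example `adj = {n: list(G.neighbors(n)) for n in G.nodes()}`, which also avoids rebuilding the neighbour list on every step.

```python
import random
from multiprocessing import Pool

_ADJ = None  # per-process reference to the adjacency lists, set by _init_worker


def _init_worker(adj):
    """Store the adjacency lists once per worker instead of once per task."""
    global _ADJ
    _ADJ = adj


def _walk_chunk(task):
    """Run the walks for one chunk of start nodes, with the post's pair logic."""
    nodes, num_walks, walk_len, seed = task
    rng = random.Random(seed)  # separate, reproducible random stream per chunk
    pairs = []
    for node in nodes:
        if not _ADJ[node]:  # isolated node, nothing to walk to
            continue
        for _ in range(num_walks):
            curr_node = node
            for _ in range(walk_len):
                next_node = rng.choice(_ADJ[curr_node])
                if curr_node != node:  # skip self co-occurrences
                    pairs.append((node, curr_node))
                curr_node = next_node
    return pairs


def parallel_random_walks(adj, num_walks, walk_len=5, workers=None,
                          chunk_size=1000, seed=0):
    """Split the start nodes into chunks and walk each chunk in a worker."""
    nodes = list(adj)
    tasks = [(nodes[i:i + chunk_size], num_walks, walk_len, seed + i)
             for i in range(0, len(nodes), chunk_size)]
    if workers == 1:
        # Serial fallback in the current process, handy for debugging.
        _init_worker(adj)
        results = [_walk_chunk(t) for t in tasks]
    else:
        # workers=None lets Pool use one process per available CPU.
        with Pool(workers, initializer=_init_worker, initargs=(adj,)) as pool:
            results = pool.map(_walk_chunk, tasks)
    # pool.map keeps chunk order, so pairs come back grouped by start node.
    return [pair for chunk in results for pair in chunk]
```

A call might look like `pairs = parallel_random_walks(adj, num_walks=50)`, placed under an `if __name__ == "__main__":` guard so that it also works on platforms that start workers by spawning a fresh interpreter. Each worker holds its own copy of the adjacency lists unless the platform forks, so memory use on a graph with millions of nodes is worth checking. Because every chunk draws from its own seeded generator, the output is reproducible but will not match the serial code's draws pair for pair.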