Question

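I want to find the number of vertices reachable from a given vertex in a directed graph (see the picture below). For example, for id = 0L: it connects to 1L and 2L, 1L connects to 3L, and 2L connects to 4L, so the output should be 4. Here is the graph relationship data: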

edgeid from to distance
0 0 1 10.0
1 0 2 5.0
2 1 2 2.0
3 1 3 1.0
4 2 1 3.0
5 2 3 9.0
6 2 4 2.0
7 3 4 4.0
8 4 0 7.0
9 4 3 5.0

I was able to set up the graph, but I don't know how to get the output using graph.edges.filter.

val vertexRDD: RDD[(Long, (Double))] = sc.parallelize(vertexArray)
val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)
val graph: Graph[(Double), Int] = Graph(vertexRDD, edgeRDD)

Answer 1

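In your example, every vertex is connected to every other vertex by a directed path, so the value for each vertex should be 4.

But if you were to remove the 4 -> 0 (id = 8) edge, the counts would differ.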
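
To sanity-check both claims without Spark, here is a minimal plain-Scala sketch that runs a breadth-first search over the (from, to) pairs of the edge table. The object and method names (`ReachabilityCheck`, `reachableCount`) are made up for this illustration and are not part of GraphX.

```scala
import scala.annotation.tailrec

object ReachabilityCheck {
  // (from, to) pairs copied from the edge table; distances do not matter for reachability.
  val edges: Seq[(Long, Long)] = Seq(
    (0L, 1L), (0L, 2L), (1L, 2L), (1L, 3L), (2L, 1L),
    (2L, 3L), (2L, 4L), (3L, 4L), (4L, 0L), (4L, 3L)
  )

  /** Number of vertices reachable from `start` along directed edges, not counting `start`. */
  def reachableCount(edges: Seq[(Long, Long)], start: Long): Int = {
    val adjacency: Map[Long, Seq[Long]] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.map(_._2) }

    // Expand the frontier one hop at a time until no unseen vertex is left.
    @tailrec
    def bfs(frontier: Set[Long], seen: Set[Long]): Set[Long] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier.flatMap(v => adjacency.getOrElse(v, Seq.empty)) -- seen
        bfs(next, seen ++ next)
      }

    (bfs(Set(start), Set(start)) - start).size
  }

  def main(args: Array[String]): Unit = {
    val vertices = 0L to 4L
    val withoutEdge8 = edges.filterNot(_ == ((4L, 0L)))

    // Full graph: every vertex reaches the other 4.
    vertices.foreach(v => println(s"full graph: $v reaches ${reachableCount(edges, v)}"))
    // Without 4 -> 0 the counts for vertices 0..4 are 4, 3, 3, 1, 1.
    vertices.foreach(v => println(s"without 4 -> 0: $v reaches ${reachableCount(withoutEdge8, v)}"))
  }
}
```

This only works for graphs that fit on one machine; it is meant as a reference to compare the distributed result against.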

Since your problem relies on a (recursive) parallel traversal of the graph, the GraphX Pregel API is probably the best approach.

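The pregel call takes 3 functions:

vprog: the vertex program. It first runs on every vertex with the initial message (in your case an empty List[VertexId]) and afterwards on each vertex that receives a message.
sendMsg: an update step applied on each iteration (in your case it accumulates the neighboring VertexIds and returns an Iterator of messages to deliver in the next iteration).
mergeMsg: merges two messages (2 List[VertexId]s into 1).

In code it looks like: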

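  import org.apache.spark.graphx._
  // toDF below also needs: import spark.implicits._ (spark being your SparkSession)

  // Vertex program: merge the ids received in this superstep into the ids already known.
  // Returning only `newly` would drop known ids whenever just some out-edges send a message.
  def vprog(id: VertexId, orig: List[VertexId], newly: List[VertexId]) : List[VertexId] = (orig ++ newly).distinct

  // Combine two messages addressed to the same vertex.
  def mergeMsg(a: List[VertexId], b: List[VertexId]) : List[VertexId] = (a ++ b).distinct

  // Runs per edge: tell the source about the destination and everything the destination reaches.
  def sendMsg(trip: EdgeTriplet[List[VertexId],Double]) : Iterator[(VertexId, List[VertexId])] = {
    val recursivelyConnectedNeighbors = (trip.dstId :: trip.dstAttr).filterNot(_ == trip.srcId)

    // Send only while the source is still missing at least one of these ids.
    if (trip.srcAttr.intersect(recursivelyConnectedNeighbors).length != recursivelyConnectedNeighbors.length)
      Iterator((trip.srcId, recursivelyConnectedNeighbors))
    else
      Iterator.empty
  }

  val initList = List.empty[VertexId]

  // Assumes the edge attribute is a Double (the distance), matching sendMsg above.
  val result = graph
    .mapVertices((_,_) => initList)
    .pregel(
      initialMsg = initList,
      activeDirection = EdgeDirection.Out
    )(vprog, sendMsg, mergeMsg)
    .mapVertices((_, neighbors) => neighbors.length)

  result.vertices.toDF("vertex", "value").show()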

Output (with the 4 -> 0 edge, id = 8, removed):

+------+-----+
|vertex|value|
+------+-----+
|     0|    4|
|     1|    3|
|     2|    3|
|     3|    1|
|     4|    1|
+------+-----+

If you run into OOM errors when traversing large graphs, be sure to experiment with maxIterations (or configure spark.graphx.pregel.checkpointInterval before the pregel call).
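
As a rough end-to-end sketch of how those two settings could be wired in, the program below builds the graph from the edge table, drops the 4 -> 0 edge with `subgraph`, and runs Pregel with a bounded `maxIterations`. Everything not shown in the answer is an assumption made for illustration: the object name `ReachableCountSketch`, the local master, the temporary checkpoint directory, and the values 20 and 5. The vertex program merges the known and received ids so that a vertex never loses ids between supersteps. As far as I know, periodic Pregel checkpointing only takes effect when a checkpoint directory is set, which is why the sketch sets one.

```scala
import java.nio.file.Files
import org.apache.spark.graphx.{Edge, EdgeDirection, EdgeTriplet, Graph, VertexId}
import org.apache.spark.sql.SparkSession

object ReachableCountSketch {
  type Ids = List[VertexId]

  // Keep what the vertex already knows and add what it just received.
  def vprog(id: VertexId, known: Ids, received: Ids): Ids = (known ++ received).distinct

  def mergeMsg(a: Ids, b: Ids): Ids = (a ++ b).distinct

  // Tell the source about the destination and everything the destination reaches,
  // but only while the source is still missing at least one of those ids.
  def sendMsg(trip: EdgeTriplet[Ids, Double]): Iterator[(VertexId, Ids)] = {
    val candidates = (trip.dstId :: trip.dstAttr).filterNot(_ == trip.srcId)
    if (candidates.forall(id => trip.srcAttr.contains(id))) Iterator.empty
    else Iterator((trip.srcId, candidates))
  }

  def reachableCounts(spark: SparkSession): Map[VertexId, Int] = {
    val sc = spark.sparkContext
    // Assumption: a throwaway local directory; on a cluster this would be a shared path.
    sc.setCheckpointDir(Files.createTempDirectory("graphx-ckpt").toString)

    val edges = sc.parallelize(Seq(
      Edge(0L, 1L, 10.0), Edge(0L, 2L, 5.0), Edge(1L, 2L, 2.0), Edge(1L, 3L, 1.0),
      Edge(2L, 1L, 3.0), Edge(2L, 3L, 9.0), Edge(2L, 4L, 2.0), Edge(3L, 4L, 4.0),
      Edge(4L, 0L, 7.0), Edge(4L, 3L, 5.0)))

    // Every vertex starts with an empty list; the 4 -> 0 edge (id = 8) is dropped.
    val graph = Graph.fromEdges(edges, List.empty[VertexId])
      .subgraph(epred = t => !(t.srcId == 4L && t.dstId == 0L))

    graph
      .pregel(List.empty[VertexId], maxIterations = 20, activeDirection = EdgeDirection.Out)(
        vprog, sendMsg, mergeMsg)
      .vertices
      .collect()
      .map { case (id, ids) => id -> ids.length }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reachable-count-sketch")
      .config("spark.graphx.pregel.checkpointInterval", "5") // illustrative value
      .getOrCreate()
    try reachableCounts(spark).toSeq.sortBy(_._1).foreach(println)
    finally spark.stop()
  }
}
```

Note that a `maxIterations` smaller than the longest shortest path in the graph cuts the traversal short, so the counts become lower bounds rather than exact values.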

How to find the number of vertices reachable from a given vertex in Spark GraphX
