Removing vertices from a subgraph in GraphX

Time: 2016-01-31 13:24:31

Tags: apache-spark spark-graphx

I have the following graph:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Vertices
val usersTest: RDD[(VertexId, String)] = sc.parallelize(Array((1L, "AAA"), (2L, "BBB"), (3L, "CCC")))
// Edges
val relationshipsTest: RDD[Edge[Int]] = sc.parallelize(Array(Edge(1L, 3L, 1), Edge(1L, 3L, 1), Edge(1L, 2L, 3), Edge(2L, 1L, 1), Edge(2L, 1L, 2), Edge(2L, 3L, 1), Edge(3L, 2L, 2)))
val defaultUserTest = "Missing"
// Creating the graph
val graphTest = Graph(usersTest, relationshipsTest, defaultUserTest)

It produces the following output:

(graphTest.numEdges, graphTest.numVertices)
res: (Long, Long) = (7,3)

Now, when I apply subgraph:

val validGraphTest = graphTest.subgraph(epred = e => e.attr > 2) 

I get:

(validGraphTest.numEdges, validGraphTest.numVertices)
res: (Long, Long) = (1,3)

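What I want is to remove the vertices that are no longer connected to any edge. In this example only one edge is left, so the desired output would be res: (Long, Long) = (1,2).

I tried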

val validCCGraphTest = validGraphTest.connectedComponents()

but (validCCGraphTest.numEdges, validCCGraphTest.numVertices)

still yields res: (Long, Long) = (1,3)

1 Answer:

Answer 0 (score: 2)

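An isolated vertex of degree zero is itself a connected component of size 1, and connectedComponents only labels vertices without removing any. That is why your approach doesn't work. You could try something like this: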

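validGraphTest
  // Attach each vertex's degree; vertices with no edges are absent from degrees, so they get 0
  .outerJoinVertices(validGraphTest.degrees){
    case (_, vd, Some(x)) => (vd, x)
    case (_, vd, _) => (vd, 0)
  }
  // Keep only vertices that touch at least one edge
  .subgraph(vpred = {case (_, (_, x)) => x > 0})
  // Drop the degree and restore the original vertex attribute
  .mapVertices{case (_, (x, _)) => x}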
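
The same degree-based pruning can also be written with GraphOps.filter, which evaluates the predicate on a preprocessed copy of the graph and then masks the original, so the vertex attributes come back unchanged. This is only a minimal sketch, assuming GraphX is on the classpath; the helper name dropIsolated is hypothetical and not part of GraphX:

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Hypothetical helper (not a GraphX API): drops every vertex that has no incident edge.
def dropIsolated[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): Graph[VD, ED] =
  g.filter(
    // Preprocess: replace each vertex attribute with its degree (0 when the vertex has no edges).
    (h: Graph[VD, ED]) =>
      h.outerJoinVertices(h.degrees) { (_, _, deg) => deg.getOrElse(0) },
    // Keep only vertices that touch at least one edge.
    vpred = (_: VertexId, deg: Int) => deg > 0
  )
```

With the data from the question, dropIsolated(validGraphTest) should leave 1 edge and 2 vertices (vertices 1 and 2, still carrying their original names).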

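Alternatively, a more concise (though seemingly less efficient) option is to rebuild the graph from the degrees and edges and mask it against the original. Note that mask keeps the attributes of the graph it is called on, so the vertices of the result carry their degree rather than the original name: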

Graph(validGraphTest.degrees, validGraphTest.edges).mask(graphTest)
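
To see concretely why connectedComponents cannot remove anything, it helps to look at what it returns: each vertex is labelled with the lowest vertex ID in its component, and an isolated vertex simply gets its own ID. The following sketch counts vertices per label; componentSizes is a hypothetical helper name, not a GraphX function:

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Hypothetical helper: maps each component label (lowest vertex ID in the component)
// to the number of vertices carrying that label.
def componentSizes[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): scala.collection.Map[VertexId, Long] =
  g.connectedComponents()              // vertex attribute becomes the component label
    .vertices
    .map { case (_, label) => label }  // keep only the label
    .countByValue()                    // label -> number of vertices
```

For validGraphTest this should report one component of size 2 (label 1, vertices 1 and 2) and one component of size 1 (label 3, the isolated vertex), which is why the vertex count stays at 3.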