Question

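Please suggest the best way to write an inline function at the place where func_1 is called. It should also do what func_1 is trying to do (I know a function cannot return two things in Scala).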

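I am reading lines from a file (args(0)), where each line contains comma-separated numbers. For each line, the first number is the nodeId and the remaining numbers are its neighbours. For the first 5 lines, the first number is also the clusterId. The graph contains nodes that each have Long: nodeId, Long: clusterId and List[Long]: neighbours.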

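I am trying to write a map-reduce style function in which "func_1" acts as a mapper: it emits (nodeId, clusterId, neighbours), then checks each element in neighbours and, if clusterId > -1, emits (nodeId, clusterId). In short, the tuple (nodeId, clusterId, neighbours) must be emitted unconditionally.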

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import scala.collection.mutable.ListBuffer

object Partition {

  val depth = 6

  def func_1(nodeId:Long,clusterId:Long,neightbours:List[Long]):Either[(Long,Long,List[Long]),(Long,Long)]={
    Left(nodeId,clusterId,neightbours)
    for(x <- neightbours){
      if(clusterId > -1){
       Right(x,clusterId)
      }
    }
  }
  def func_2(){

  }
  def main ( args: Array[ String ] ) {
    val conf=new SparkConf().setAppName("Partition")
    val sc=new SparkContext(conf)
    var count : Int = 0

    var graph=sc.textFile(args(0)).map(line =>{
                                         var nodeId:Long=line(0).toLong
                                         var clusterId:Long=1
                                         var neighbours=new ListBuffer[Long]()
                                         if(count < 5){
                                           clusterId=line(0).toLong
                                         }else{
                                           clusterId= -1 * clusterId
                                         }
                                         val nums=line.split(",")
                                         for(i <- 1 to line.length()-1){
                                           neighbours.+=(nums(i).toLong)
                                         }
                                         (nodeId,clusterId,neighbours.toList)
                                         }).collect()
    graph.foreach(println)
    for (i <- 1 to depth)
      graph = graph.flatMap{ func_1 }.groupByKey.map{ /* (2) */ }
    /* finally, print partition sizes */

  }
}

Answer 1

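It is really hard to figure out what you want, because your code makes absolutely no sense.

I'm going to take a wild guess that you might be looking for something like the code below.

It at least compiles, which is a place to start.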

def func_1(nodeId      :Long
          ,clusterId   :Long
          ,neightbours :List[Long]
          ) :Either[(Long,Long,List[Long]),List[(Long,Long)]] =
  if (clusterId > -1) Right(neightbours.map(_ -> clusterId))
  else                 Left((nodeId, clusterId, neightbours))
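
If the full tuple really has to be emitted unconditionally, with the extra pairs emitted only when clusterId > -1, one possible reading is to return a List of Either values and let flatMap flatten it. The following is only a sketch under that assumption: the object name Func1Sketch, the type aliases Node and Emitted, and the sample data are made up for illustration, and a plain List stands in for the RDD so that it runs without Spark.

```scala
object Func1Sketch {
  // Hypothetical aliases, not part of the original code.
  type Node    = (Long, Long, List[Long])      // (nodeId, clusterId, neighbours)
  type Emitted = Either[Node, (Long, Long)]    // Left = whole node, Right = (neighbour, clusterId)

  // Always emits the node itself; additionally emits one pair per
  // neighbour when the node already belongs to a cluster.
  def func_1(nodeId: Long, clusterId: Long, neighbours: List[Long]): List[Emitted] = {
    val self: Emitted = Left((nodeId, clusterId, neighbours))
    val pairs: List[Emitted] =
      if (clusterId > -1) neighbours.map(x => Right((x, clusterId)))
      else Nil
    self :: pairs
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data: node 1 is clustered, node 2 is not.
    val graph = List((1L, 1L, List(2L, 3L)), (2L, -1L, List(1L)))

    // The anonymous function destructures each tuple and flatMap
    // concatenates the lists returned by func_1.
    val emitted = graph.flatMap { case (n, c, ns) => func_1(n, c, ns) }
    emitted.foreach(println)
  }
}
```

The same anonymous function, `{ case (n, c, ns) => func_1(n, c, ns) }`, should also be usable with flatMap on an RDD, provided the graph is kept as an RDD rather than collected into an Array first. Note that groupByKey would additionally need every emitted element to be a (key, value) pair, which this sketch does not attempt.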

Please suggest the best way to write an anonymous function here
