Dividing a DStream[Double] by a DStream[Long] in Spark Streaming (Scala)

Date: 2017-12-04 14:39:10

Tags: scala apache-spark apache-kafka spark-streaming dstream

I am developing a consumer application that consumes messages from a Kafka broker. I want to compute the average of a value carried in those messages, and eventually store that average in Cassandra.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder
import scala.util.parsing.json.JSON

val Array(brokers, topics) = args

val sparkConf = new SparkConf().setAppName("MyDirectKafkaWordCount").setMaster("local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(20))

val topicsSet = topics.split(",").toSet
val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)

// The message payload is the second element of each (key, value) pair.
val lines = messages.map(_._2)
val count = lines.count()          // DStream[Long]
count.print()

// Parse each JSON payload, extract every "mVolts1" reading, and sum them.
val total = lines.map(JSON.parseFull(_)
    .asInstanceOf[Option[List[Map[String, List[Map[String, Double]]]]]]
    .map(_(1)("MeterData").map(_("mVolts1")))
    .getOrElse(List()))
  .flatMap(list => list)
  .reduce((x, y) => x + y)         // DStream[Double]
total.print()

// Problematic line: `total` and `count` are two separate DStreams,
// so `reduce` on `total` alone cannot divide one by the other.
val avg = total.reduce((total, count) => total / count)
avg.print()

ssc.start()
ssc.awaitTermination()

In the code above, the total is computed exactly as I expect, but I cannot compute the average, because count is a DStream[Long] while total is a DStream[Double].

I think the problem is in this line: `val avg = total.reduce((total, count) => total / count)`. Any help is appreciated.
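Since `reduce` only combines elements within a single DStream, one common way around this (a sketch, not tested against the original data) is to keep the sum and the count in the same stream: map each reading to a `(value, 1)` pair and reduce both fields together, so no cross-stream division is needed.

// Sketch of a fix: carry (sum, count) together in one DStream so the
// division happens inside a single stream. `lines` is the DStream[String]
// from the code above; the JSON extraction is unchanged.
val readings = lines.map(JSON.parseFull(_)
    .asInstanceOf[Option[List[Map[String, List[Map[String, Double]]]]]]
    .map(_(1)("MeterData").map(_("mVolts1")))
    .getOrElse(List()))
  .flatMap(list => list)

val avgPerBatch = readings
  .map(v => (v, 1L))                                          // (value, count of 1)
  .reduce { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) } // (sum, count)
  .map { case (sum, count) => sum / count }                   // DStream[Double]

avgPerBatch.print()

An alternative with the same effect is `total.transformWith(count, ...)` to join the two single-element RDDs per batch, but the pair-reduce above avoids the join entirely.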

Output: the count I get printed in the stream is a DStream[Long]; the total I get in the same stream is a DStream[Double].
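For the Cassandra step mentioned at the top, the spark-cassandra-connector exposes `saveToCassandra` directly on DStreams. A minimal sketch, assuming a working `avgPerBatch: DStream[Double]` and a hypothetical keyspace `metering` with table `averages(metric text, value double)`:

import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

// Hypothetical keyspace/table names; requires spark-cassandra-connector
// on the classpath and spark.cassandra.connection.host in the SparkConf.
// Tuple fields map positionally onto the listed columns.
avgPerBatch.map(a => ("avg_mvolts1", a))
  .saveToCassandra("metering", "averages", SomeColumns("metric", "value"))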

0 Answers:

No answers yet.