Question

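Below is a Scala snippet from an error handler I am building for a streaming application. It uses Akka Streams to consume messages ('errormsg') from a Kafka topic and write them to a table in Kudu.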

val kafkaMessages: Source[ConsumerMessage.CommittableMessage[String, Array[Byte]], Consumer.Control] = Consumer.committableSource(
    consumerSettings,
    Subscriptions.topics(conf.getString("kafka.topics.errorRawCdr")))

  val cdrs: Source[Errors, Consumer.Control] = kafkaMessages.map(msg => {
    val bytes: Array[Byte] = msg.record.value()
    val errormsg = (bytes.map(_.toChar)).mkString
    new Errors(1235, "filename", "cdr", "cdr_type", 0, errormsg)
  })

  cdrs.to(new ErrorKuduSink(session, table)).run()

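I would like to reuse the variable 'errormsg' further on, as part of a few lines of code that email the message to me.

How can I get 'errormsg' out of the map block (or merge in the snippet below) so that the variable is in scope where I need it?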

  send a new Mail (
    from = ("errorhandler@domain.com"),
    to = "myemailadres@domain.com",
    subject = "Encountered error",
    message = errormsg
  )

Answer 1

Solution 1: Send the email inside the map stage (this sends one email per Kafka message).

def sendEmail(errormsg: String): Unit = ???

val cdrs: Source[Errors, Consumer.Control] = 
  kafkaMessages.map { msg => 
    val bytes: Array[Byte] = msg.record.value()
    val errormsg = (bytes.map(_.toChar)).mkString
    sendEmail(errormsg) // call function that sends email
    new Errors(1235, "filename", "cdr", "cdr_type", 0, errormsg)
  }
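
The per-message step in Solution 1 can also be pulled out into a plain function, which makes it testable without Kafka or Kudu. The following is a minimal sketch under stated assumptions: the `Errors` case class and its field names are hypothetical stand-ins for the asker's class, `sendEmail` only records the message instead of sending mail, and the payload is assumed to be UTF-8 encoded (the original `bytes.map(_.toChar).mkString` only round-trips ASCII text).

```scala
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ListBuffer

object PerMessageStep {
  // Hypothetical stand-in for the asker's Errors class; field names are assumed.
  final case class Errors(id: Int, fileName: String, cdr: String,
                          cdrType: String, status: Int, message: String)

  // Stub that records what would be emailed; swap in a real mail call here.
  val sentEmails: ListBuffer[String] = ListBuffer.empty

  def sendEmail(errormsg: String): Unit = {
    sentEmails += errormsg
    ()
  }

  // Decode the payload, send the email, then build the row for Kudu.
  def toErrors(bytes: Array[Byte]): Errors = {
    val errormsg = new String(bytes, StandardCharsets.UTF_8) // assumes UTF-8 payloads
    sendEmail(errormsg)
    Errors(1235, "filename", "cdr", "cdr_type", 0, errormsg)
  }
}
```

Inside the stream this would be called as `kafkaMessages.map(msg => PerMessageStep.toErrors(msg.record.value()))`.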

Solution 2: If you want to use errormsg in more elaborate ways in downstream stages, return a tuple from the map stage:

val kafkaMessages: Source[ConsumerMessage.CommittableMessage[String, Array[Byte]], Consumer.Control] = 
  Consumer.committableSource(consumerSettings, Subscriptions.topics(conf.getString("kafka.topics.errorRawCdr")))

val cdrs: Source[Errors, Consumer.Control] = 
  kafkaMessages.map { msg => 
    val bytes: Array[Byte] = msg.record.value()
    val errormsg = (bytes.map(_.toChar)).mkString
    (new Errors(1235, "filename", "cdr", "cdr_type", 0, errormsg), errormsg) // we are returning a tuple so type of downstream elements will be (Errors, String)
  }.map { case i@(errors, errormsg) => 
    sendEmail(errormsg)
    i
  }.map { tuple =>
    ...
  }.map(_._1) // the tuple is no longer needed, so take the original element and continue processing it


cdrs.to(new ErrorKuduSink(session, table)).run()
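
To see the shape of Solution 2 in isolation, here is a small sketch that runs the same three stages over an in-memory source instead of Kafka. It assumes Akka 2.6 or later (where an implicit `ActorSystem` is enough to run a stream); `Errors` is a simplified, hypothetical stand-in for the asker's class, `onMessage` plays the role of `sendEmail`, and `Sink.seq` replaces `ErrorKuduSink` so the result can be inspected.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.immutable
import scala.concurrent.Future

object TupleStageSketch {
  // Simplified, hypothetical stand-in for the asker's Errors class.
  final case class Errors(id: Int, message: String)

  def run(messages: List[String], onMessage: String => Unit)(
      implicit system: ActorSystem): Future[immutable.Seq[Errors]] =
    Source(messages)
      .map(msg => (Errors(1235, msg), msg)) // downstream elements are (Errors, String)
      .map { case pair @ (_, errormsg) =>
        onMessage(errormsg)                 // side effect, then pass the pair on unchanged
        pair
      }
      .map(_._1)                            // drop the String again
      .runWith(Sink.seq[Errors])            // collected only for inspection in this sketch
}
```

If the only downstream use is a side effect such as sending mail, the tuple buys little over Solution 1; it pays off when later stages need both values.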

Solution 3: For more complex processing (for example, batching several errormsg values into a single email), you may need to build a RunnableGraph:

import akka.NotUsed
import akka.stream.ClosedShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, RunnableGraph, Sink}
import scala.concurrent.duration._

val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
  import GraphDSL.Implicits._
  val in = Consumer.committableSource(consumerSettings, Subscriptions.topics(conf.getString("kafka.topics.errorRawCdr")))
    .map { msg => 
      val bytes: Array[Byte] = msg.record.value()
      val errormsg = (bytes.map(_.toChar)).mkString
      (new Errors(1235, "filename", "cdr", "cdr_type", 0, errormsg), errormsg)
    }
  val kuduout = new ErrorKuduSink(session, table)
  val emailout = Sink.foreach[Seq[String]] { errormsgs =>
    sendEmail(errormsgs.mkString("\n")) // sendEmail takes a single String, so join the batch
  }
  val f1 = Flow[(Errors, String)]
    .map(_._1) // take errors

  val f2 = Flow[(Errors, String)]
    .map(_._2) // take errormsgs
    .groupedWithin(100, 1.hour) // emit a batch after 100 messages or one hour, whichever comes first

  // the broadcast element type must match what the source emits
  val bcast = builder.add(Broadcast[(Errors, String)](2))

  in ~> bcast
  bcast ~> f1 ~> kuduout
  bcast ~> f2 ~> emailout
  ClosedShape
})
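
For a two-way fan-out like the one above, `alsoToMat` (or plain `alsoTo`) can express the same wiring without the GraphDSL. The sketch below is an illustration under assumptions, not the answerer's code: it assumes Akka 2.6 or later, uses an in-memory source instead of Kafka, a simplified hypothetical `Errors` class, a caller-supplied `sendBatch` function in place of `sendEmail`, and `Sink.seq` in place of `ErrorKuduSink`.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.collection.immutable
import scala.concurrent.Future
import scala.concurrent.duration._

object FanOutSketch {
  // Simplified, hypothetical stand-in for the asker's Errors class.
  final case class Errors(id: Int, message: String)

  def run(messages: List[String], sendBatch: immutable.Seq[String] => Unit)(
      implicit system: ActorSystem): Future[immutable.Seq[Errors]] = {
    // Side branch: keep only the message text, batch it, hand each batch to sendBatch.
    val emailBranch: Sink[(Errors, String), Future[Done]] =
      Flow[(Errors, String)]
        .map(_._2)
        .groupedWithin(100, 1.hour) // up to 100 messages or one hour per batch
        .toMat(Sink.foreach[immutable.Seq[String]](sendBatch))(Keep.right)

    val (emailsDone, rows) =
      Source(messages)
        .map(msg => (Errors(1235, msg), msg))
        .alsoToMat(emailBranch)(Keep.right) // every pair also flows into the email branch
        .map(_._1)
        .toMat(Sink.seq[Errors])(Keep.both) // stand-in for the Kudu sink
        .run()

    // Complete only after both branches have finished.
    emailsDone.flatMap(_ => rows)(system.dispatcher)
  }
}
```

With a finite source, `groupedWithin` flushes its pending batch when upstream completes; with a Kafka source that never completes, batches are emitted only when the size or time limit is reached.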

Answer 2

Here I suggest using a MutableList, which solves your problem easily:

val kafkaMessages: Source[ConsumerMessage.CommittableMessage[String, Array[Byte]], Consumer.Control] = Consumer.committableSource(
    consumerSettings,
    Subscriptions.topics(conf.getString("kafka.topics.errorRawCdr")))

    import scala.collection.mutable._
    val errorMessages: MutableList[String] = new MutableList

  val cdrs: Source[Errors, Consumer.Control] = kafkaMessages.map(msg => {
    val bytes: Array[Byte] = msg.record.value()
    val errormsg = (bytes.map(_.toChar)).mkString
    errorMessages += errormsg
    new Errors(1235, "filename", "cdr", "cdr_type", 0, errormsg)
  })

  cdrs.to(new ErrorKuduSink(session, table)).run()
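
One caveat with this approach: `run()` returns immediately and the stream executes asynchronously on Akka's dispatcher threads, so the list fills up over time and is written from a different thread than the one that reads it. A minimal sketch of a thread-safe variant follows, assuming it is acceptable to swap the `MutableList` for a `java.util.concurrent.ConcurrentLinkedQueue`; the `ErrorCollector` object and its method names are hypothetical.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.mutable.ListBuffer

object ErrorCollector {
  private val queue = new ConcurrentLinkedQueue[String]()

  // Call this from inside the map stage instead of `errorMessages += errormsg`.
  def record(errormsg: String): Unit = {
    queue.add(errormsg)
    ()
  }

  // Removes and returns everything collected so far, e.g. to build one email body.
  def drain(): List[String] = {
    val out = ListBuffer.empty[String]
    var next = queue.poll()
    while (next != null) {
      out += next
      next = queue.poll()
    }
    out.toList
  }
}
```

A periodic task could then call `ErrorCollector.drain()` and email the result if it is non-empty.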

Changing the scope of a local variable in Scala