Question

I am using Spark Structured Streaming as described on this page.

I get the correct messages from the Kafka topic, but the value is in Avro format. Is there a way to deserialize the Avro records?

Answer 1

Spark >= 2.4

You can use the from_avro function from the spark-avro library.

import org.apache.spark.sql.avro._

val schema: String = ???
df.withColumn("value", from_avro($"value", schema))
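For context, here is a minimal end-to-end sketch of how from_avro might be wired into a Kafka stream. The schema, the broker address localhost:9092, the topic name users, and the object name FromAvroSketch are all hypothetical placeholders, not taken from the question. It assumes the spark-avro and spark-sql-kafka modules are on the classpath and that the value bytes are plain Avro binary.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro._ // provides from_avro in Spark 2.4

object FromAvroSketch {
  // Hypothetical Avro schema in JSON form; replace with the writer's real schema.
  val schema: String =
    """{
      |  "type": "record",
      |  "name": "User",
      |  "fields": [
      |    {"name": "name", "type": "string"},
      |    {"name": "age", "type": "int"}
      |  ]
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("from-avro-sketch").getOrCreate()
    import spark.implicits._ // enables the $"column" syntax

    // Hypothetical broker and topic.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "users")
      .load()

    // Decode the binary value column into a struct, then flatten its fields.
    val decoded = df
      .select(from_avro($"value", schema).as("user"))
      .select("user.*")

    // Print decoded rows to the console for inspection.
    val query = decoded.writeStream.format("console").start()
    query.awaitTermination()
  }
}
```

In Spark 3.x the documented import is org.apache.spark.sql.avro.functions._ instead.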

Spark < 2.4

Define a function that takes an Array[Byte] (the serialized object):

import scala.reflect.runtime.universe.TypeTag

def decode[T : TypeTag](bytes: Array[Byte]): T = ???

It will deserialize the Avro data and create an object that can be stored in a Dataset.
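The answer leaves the body of decode open. One possible way to fill it in, for a single concrete type rather than a generic T, is sketched below using Avro's generic reader API. The case class User, the object UserDecoder, and the schema are hypothetical and stand in for whatever the topic actually carries. The sketch assumes the Avro library is on the classpath and that the bytes are plain Avro binary written with this schema.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Hypothetical target type; a case class can be stored in a Dataset.
case class User(name: String, age: Int)

object UserDecoder {
  // Hypothetical writer schema; replace with the real one.
  val schemaJson: String =
    """{"type":"record","name":"User","fields":[
      |{"name":"name","type":"string"},
      |{"name":"age","type":"int"}]}""".stripMargin

  // Parsed once per JVM, since the enclosing object is initialized on each executor.
  private val schema: Schema = new Schema.Parser().parse(schemaJson)

  def decode(bytes: Array[Byte]): User = {
    // A fresh reader per call keeps the sketch simple and free of shared state.
    val reader = new GenericDatumReader[GenericRecord](schema)
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    val record = reader.read(null, decoder)
    // Avro strings arrive as Utf8, so convert with toString.
    User(record.get("name").toString, record.get("age").asInstanceOf[Int])
  }
}
```

With a concrete return type like this, udf(UserDecoder.decode _) would produce a struct column with name and age fields.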

Create a udf based on the function.

val decodeUdf = udf(decode _)

Call the udf on value:

val df = spark
  .readStream
  .format("kafka")
  ...
  .load()

df.withColumn("value", decodeUdf($"value"))

Avro format deserialization in Spark Structured Streaming
