Question

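In the code below I am trying to read Avro messages from a Kafka topic. In the map method I use the KafkaAvroDecoder fromBytes method, and that seems to cause a "Task not serializable" exception. How can I decode the Avro messages?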

public static void main(String[] args) throws Exception {

    Properties decoderProps = new Properties();
    decoderProps.put("schema.registry.url", SCHEMA_REG_URL);
    //decoderProps.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true");

    KafkaAvroDecoder decoder = new KafkaAvroDecoder(new VerifiableProperties(decoderProps));


    SparkSession spark = SparkSession
        .builder()
        .appName("JavaCount1").master("local[2]")
        .config("spark.driver.extraJavaOptions", "-Xss4M")
        .getOrCreate();

    Dataset<Row> ds1 = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", HOSTS)
        .option("subscribe", "systemDec200Message")
        .option("startingOffsets", "earliest")
        .option("maxOffsetsPerTrigger", 1)
        .load();



    Dataset<String> ds2 = ds1.map(m -> {
        GenericData.Record data = (GenericData.Record) decoder.fromBytes((byte[]) m.get(1));

        return "sddasdadasdsadas";
    }, Encoders.STRING());





    StreamingQuery query = ds2.writeStream()
        .outputMode("append")
        .format("console")
        .trigger(ProcessingTime.apply(15))
        .start();

    query.awaitTermination();
}

I get the following exception:

17/04/12 16:51:06 INFO CodeGenerator: Code generated in 329.145119 ms
17/04/12 16:51:07 ERROR StreamExecution: Query [id = 1d56386c-3fba-4978-8565-6b9c880d4fce, runId = b7bbb8d8-b52d-4c14-9dec-bc9cb41f8d77] terminated with error
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:840)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:839)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:839)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:371)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
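The trace points at Spark's ClosureCleaner.ensureSerializable, which tries to serialize the map closure (with Java serialization) before shipping it to executors. Because the lambda captures the decoder variable created in main, and KafkaAvroDecoder is, judging from this error, not Serializable, that check fails. The plain-Java sketch below reproduces the effect without Spark or Kafka on the classpath. NonSerializableDecoder, SerFunction and DecoderHolder are hypothetical stand-ins (for KafkaAvroDecoder and for Spark's MapFunction, which also extends Serializable), not library types. It shows that a lambda capturing the decoder cannot be serialized, while a lambda that fetches the decoder from a lazily initialized static holder can, since it captures nothing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class ClosureSerializationDemo {

    // Hypothetical stand-in for KafkaAvroDecoder: deliberately NOT Serializable.
    static class NonSerializableDecoder {
        String decode(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Serializable function type, playing the role of Spark's MapFunction.
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    // Lazily creates one decoder per JVM (in a cluster, one per executor process).
    static final class DecoderHolder {
        private static NonSerializableDecoder instance;

        static synchronized NonSerializableDecoder get() {
            if (instance == null) {
                instance = new NonSerializableDecoder();
            }
            return instance;
        }
    }

    // Roughly what the closure check does: attempt Java serialization of the closure.
    static boolean isSerializable(Object closure) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(closure);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Like the question's code: the lambda captures a decoder created outside it.
    static SerFunction<byte[], String> capturingClosure() {
        NonSerializableDecoder decoder = new NonSerializableDecoder();
        return bytes -> decoder.decode(bytes);
    }

    // Alternative: look the decoder up inside the lambda instead of capturing it.
    static SerFunction<byte[], String> holderClosure() {
        return bytes -> DecoderHolder.get().decode(bytes);
    }

    public static void main(String[] args) {
        System.out.println("capturing closure serializable: " + isSerializable(capturingClosure()));
        System.out.println("holder closure serializable:    " + isSerializable(holderClosure()));
    }
}
```

In the Spark job, the analogous change would be to call such a holder inside the map lambda rather than referring to the decoder built in main; a real holder would construct the KafkaAvroDecoder from the schema registry URL. The synchronized getter keeps initialization safe when several tasks run in the same JVM.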

Answer 1

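After moving the KafkaAvroDecoder declaration inside the lambda (inside the map call), the serialization problem went away, but now a different exception occurs at runtime: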

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 116, Column 101: No applicable constructor/method found for actual parameters "long"; candidates are: "java.lang.Integer(int)", "java.lang.Integer(java.lang.String)"
    at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10174)
    at org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:7559)
    at org.codehaus.janino.UnitCompiler.invokeConstructor(UnitCompiler.java:6505)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4126)
    at org.codehaus.janino.UnitCompiler.access$7600(UnitCompiler.java:185)
    at org.codehaus.janino.UnitCompiler$10.visitNewClassInstance(UnitCompiler.java:3275)
    at org.codehaus.janino.Java$NewClassInstance.accept(Java.java:4085)
    at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
    at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3571)
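Separately from this CompileException, moving the declaration into the map lambda builds a new KafkaAvroDecoder for every record, and each instance presumably sets up its own schema-registry client and schema cache. Spark's Dataset.mapPartitions lets a decoder be created once per partition and reused for all records in it. The plain-Java sketch below simulates one partition as an Iterator<byte[]> and counts decoder constructions under both strategies; ExpensiveDecoder and the other names are hypothetical, and nothing here touches Spark, Kafka or Avro.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class PerPartitionDecoderDemo {

    // Counts how many decoders were built, to compare the two strategies.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    // Hypothetical stand-in for a costly-to-build decoder such as KafkaAvroDecoder.
    static class ExpensiveDecoder {
        ExpensiveDecoder() {
            CONSTRUCTED.incrementAndGet();
        }

        String decode(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Decoder declared inside the per-record function: one construction per record.
    static Iterator<String> decodePerRecord(Iterator<byte[]> partition) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return partition.hasNext(); }
            @Override public String next() { return new ExpensiveDecoder().decode(partition.next()); }
        };
    }

    // mapPartitions-style: one decoder per partition, reused for every record.
    static Iterator<String> decodePerPartition(Iterator<byte[]> partition) {
        ExpensiveDecoder decoder = new ExpensiveDecoder();
        return new Iterator<String>() {
            @Override public boolean hasNext() { return partition.hasNext(); }
            @Override public String next() { return decoder.decode(partition.next()); }
        };
    }

    // Runs a strategy over one simulated partition and reports decoder constructions.
    static int countConstructions(Function<Iterator<byte[]>, Iterator<String>> strategy, List<byte[]> records) {
        CONSTRUCTED.set(0);
        Iterator<String> decoded = strategy.apply(records.iterator());
        while (decoded.hasNext()) {
            decoded.next();
        }
        return CONSTRUCTED.get();
    }

    public static void main(String[] args) {
        List<byte[]> records = Arrays.asList(
                "a".getBytes(StandardCharsets.UTF_8),
                "b".getBytes(StandardCharsets.UTF_8),
                "c".getBytes(StandardCharsets.UTF_8));
        System.out.println("per record:    " + countConstructions(PerPartitionDecoderDemo::decodePerRecord, records));
        System.out.println("per partition: " + countConstructions(PerPartitionDecoderDemo::decodePerPartition, records));
    }
}
```

With three records this prints 3 constructions for the per-record strategy and 1 for the per-partition one. In Spark the per-partition body would go in a MapPartitionsFunction passed to ds1.mapPartitions together with Encoders.STRING().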

Spark Structured Streaming (Java): Task not serializable
