Spark Kafka stream does not set Kafka consumer config

Date: 2018-07-25 08:16:55

Tags: apache-spark spark-streaming

When I try to run this code:

// Assumes an existing JavaStreamingContext named ssc.
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", false);

Collection<String> topics = Arrays.asList("topicA", "topicB");

JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
        );

stream.mapToPair(record -> new Tuple2<>(record.key(), record.value()));

I always get this message:

2018-07-25 11:10:26 WARN  KafkaUtils:66 - overriding auto.offset.reset to none for executor

I analyzed the code and noticed that the method fixKafkaParams always runs. How can I solve this problem?
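
For context: the warning comes from KafkaUtils.fixKafkaParams in spark-streaming-kafka-0-10 (the actual method is Scala). My reading of the Spark 2.x source is that it unconditionally rewrites the consumer config handed to executors; the following Java paraphrase mirrors its observable effect and is an illustrative sketch only, not the actual Spark source:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;

// Illustrative sketch of what Spark's (Scala) fixKafkaParams does
// to the executor-side consumer config.
public final class FixKafkaParamsSketch {
    static Map<String, Object> executorParams(Map<String, Object> driverParams) {
        Map<String, Object> p = new HashMap<>(driverParams);
        // Executors must consume exactly the offset ranges the driver assigned,
        // so auto-reset and auto-commit are forced off there regardless of input.
        p.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        p.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");
        return p;
    }
}

So the warning is logged no matter what you pass in; the "latest" setting should still apply on the driver, which is where the offset ranges for each batch are computed.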

1 answer:

Answer 0 (score: 0):

You should manage the offsets yourself. When Spark manages the offsets, it overrides this value: executors are handed exact offset ranges computed by the driver, so if auto.offset.reset were left at latest, an executor could jump to the end of a partition on every batch instead of reading the range it was assigned. Spark must read messages from exactly those offsets, which is why the executor-side override cannot be changed.
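
If "manage the offsets yourself" means committing back to Kafka only after your output has succeeded, the pattern from the Spark Streaming + Kafka 0.10 integration guide looks like this (a sketch against the stream variable from the question; the processing logic is elided):

import org.apache.spark.streaming.kafka010.CanCommitOffsets;
import org.apache.spark.streaming.kafka010.HasOffsetRanges;
import org.apache.spark.streaming.kafka010.OffsetRange;

stream.foreachRDD(rdd -> {
    // The offset ranges this batch covers, one per Kafka partition.
    OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();

    // ... process rdd here; commit only after the work has succeeded ...

    // Asynchronously commit the consumed ranges back to Kafka.
    ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsetRanges);
});

If you instead store offsets in an external system, start the stream with ConsumerStrategies.Assign(partitions, kafkaParams, fromOffsets) so each run resumes from the offsets you saved, rather than relying on auto.offset.reset at all.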