节点故障后卡夫卡消费者错误

时间:2017-07-30 11:31:36

标签: apache-kafka apache-kafka-streams

我们在3实例kafka群集上测试节点故障情况,复制因子为2.

删除实例后,消费者经常失败。

消费者正在使用kafka流来阅读消息

由于

这是消费者失败日志:

08:09:19.667 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] ERROR o.a.k.c.c.i.ConsumerCoordinator - User provided listener org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener for group ComponentsActivityEventsStream failed on partition assignment

org.apache.kafka.streams.errors.StreamsException: Store ComponentsActivityStore's change log (ComponentsActivityEventsStream-ComponentsActivityStore-changelog) does not contain partition 0

    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.validatePartitionExists(StoreChangelogReader.java:87)

    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.register(ProcessorStateManager.java:165)

    at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:100)

    at org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.init(RocksDBSegmentedBytesStore.java:110)

...
08:09:19.673 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] INFO  o.a.k.s.p.i.StreamThread - stream-thread [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] Shutting down

08:09:19.674 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] INFO  o.a.k.c.p.KafkaProducer - Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.

08:09:19.681 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] INFO  o.a.k.s.p.i.StreamThread - stream-thread [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] Removing all active tasks []

08:09:19.681 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] INFO  o.a.k.s.p.i.StreamThread - stream-thread [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] Removing all standby tasks []

08:09:19.681 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] INFO  o.a.k.s.p.i.StreamThread - stream-thread [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] Removing all standby tasks []

08:09:19.681 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] INFO  o.a.k.s.p.i.StreamThread - stream-thread [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] Stream thread shutdown complete

08:09:19.681 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] WARN  o.a.k.s.p.i.StreamThread - stream-thread [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] Unexpected state transition from ASSIGNING_PARTITIONS to DEAD.

08:09:19.681 [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] WARN  i.a.k.BaseEventsStream - uncaught exception in stream thread ComponentsActivityEventsStream

org.apache.kafka.streams.errors.StreamsException: stream-thread [ComponentsActivityEventsStream-608b9f05-0911-4b14-a1b1-37247747686a-StreamThread-2] Failed to rebalance.

    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:589)

    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)

    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)

Caused by: org.apache.kafka.streams.errors.StreamsException: Store ComponentsActivityStore's change log (ComponentsActivityEventsStream-ComponentsActivityStore-changelog) does not contain partition 0

    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.validatePartitionExists(StoreChangelogReader.java:87)

    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.register(ProcessorStateManager.java:165)

1 个答案:

答案 0 :(得分:1)

检查引导服务器列表中是否列出了所有三个代理。如果您只列出一个代理并且它恰好是那个代理,那么消费者就无法获得所需的元数据,以了解哪个节点是您的每个分区的领导者话题。如果它无法获得有效的元数据响应,那么即使它们具有活动的数据副本,它也不会连接到其他两个代理中的任何一个。