意外的消费者失败/再平衡

时间:2019-01-14 14:05:38

标签: apache-kafka spring-kafka

使用Apache Kafka 2.1.0和spring-kafka 2.1.7,我们在spring-kafka消费者客户端上收到如下错误消息:

2019-01-13 23:01:34.019 consumer-1-C-1 LogContext$KafkaLogger.error SEVERE: [Consumer clientId=consumer-2, groupId=kafka-consumer-group-x] Offset commit failed on partition topic-x-16 at offset 57882: The coordinator is not aware of this member.

此错误发生前几秒钟,我们可以在其中一个kafka钻机上看到以下日志消息:

[2019-01-13 23:01:17,329] INFO [GroupCoordinator 2]: Member consumer-30-13dc06ff-aed2-4e4e-a66d-2d60d79ac526 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,330] INFO [GroupCoordinator 2]: Preparing to rebalance group kafka-consumer-group-x in state PreparingRebalance with old generation 1370 (__consumer_offsets-40) (reason: removing member consumer-30-13dc06ff-aed2-4e4e-a66d-2d60d79ac526 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,330] INFO [GroupCoordinator 2]: Member consumer-20-ba370e86-e1cc-4261-a73c-78cea1b00479 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-32-be8807df-b88f-4cc9-bddf-bed772d1244f in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-17-3e34f026-894e-40dc-916b-d169a43da135 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-31-4dd9cb6e-09e9-47db-9610-37e0ab5633e0 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-18-90175650-1224-4f22-9350-246e17e75367 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,332] INFO [GroupCoordinator 2]: Member consumer-19-663239af-9702-4e59-ad3d-f8202e9d579d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-22-c54fb4c0-1fa1-4d9f-91fc-1da6df41b227 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-25-3bfd915c-8bd1-454b-85e3-60212b4c568e in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-27-cbb97ebf-b5cd-4cfa-991a-5302462ddab9 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,615] INFO [GroupCoordinator 2]: Member consumer-24-37fbcc73-e8c6-4820-ad56-580fd88f5a10 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,618] INFO [GroupCoordinator 2]: Member consumer-21-eea1b841-202e-4ebe-bdde-007775d001dd in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,636] INFO [GroupCoordinator 2]: Member consumer-28-881da47e-87c9-4675-9f88-e3b33748cff1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,708] INFO [GroupCoordinator 2]: Member consumer-26-375880ee-b2a9-4ece-8eee-987d282956d8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,708] INFO [GroupCoordinator 2]: Member consumer-23-492417e9-f3cb-4bec-bbac-130895356907 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,731] INFO [GroupCoordinator 2]: Member consumer-29-64732e9a-2c2b-44fb-a8a5-f606462a4201 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,947] INFO [GroupCoordinator 2]: Member consumer-10-fdd0ca92-3604-46de-9e2b-97ca41d36150 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,228] INFO [GroupCoordinator 2]: Member consumer-3-feb6986d-79af-4c64-a8f8-2dbb3bdb73c3 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-2-0345e5d5-86fc-4df0-bd39-c35b75514cea in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-1-c301f59f-8a56-4bdb-a5ef-dc163232d378 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-13-56aea64a-ecca-45e7-9474-b8f1163d01c8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,266] INFO [GroupCoordinator 2]: Member consumer-9-3ee76e0e-86f1-4c0c-85cc-d07721bf36b1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,273] INFO [GroupCoordinator 2]: Member consumer-4-9fa81414-870d-444d-b5d1-c38ce5c157a8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,296] INFO [GroupCoordinator 2]: Member consumer-14-8236578f-b60d-4199-b621-913d025149d1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,656] INFO [GroupCoordinator 2]: Member consumer-12-2921b7de-1721-460f-adbf-4fb6951cca22 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,665] INFO [GroupCoordinator 2]: Member consumer-11-09d7015c-cc33-464e-93ac-fb270f209b3f in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,667] INFO [GroupCoordinator 2]: Member consumer-5-b3fe06ff-8ef4-4d60-8571-68b7cfee12bc in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,722] INFO [GroupCoordinator 2]: Member consumer-15-5af82ca6-0ebf-463e-b9c5-4bbde513453d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,754] INFO [GroupCoordinator 2]: Member consumer-7-c1e2bf89-c7c5-4363-b099-191956ed1c89 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-6-9b3be0e4-c1be-4d6a-98b1-caa9d095c403 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-16-0f48ad44-402a-4706-9d78-9d0d5077a56d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-8-0496aa54-79f7-41b8-8f31-7823ed72f16a in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Group kafka-consumer-group-x with generation 1371 is now empty (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:35,226] INFO [GroupCoordinator 2]: Preparing to rebalance group kafka-consumer-group-x in state PreparingRebalance with old generation 1371 (__consumer_offsets-40) (reason: Adding new member consumer-1-7787a334-acf2-4534-bc19-78af35371bfb) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:38,227] INFO [GroupCoordinator 2]: Stabilized group kafka-consumer-group-x generation 1372 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:38,239] INFO [GroupCoordinator 2]: Assignment received from leader for group kafka-consumer-group-x for generation 1372 (kafka.coordinator.group.GroupCoordinator)

由于在处理消息或过程需要花费大量时间的迹象时看不到任何错误,因此我们无法解释这些突然的重新平衡。

有人暗示这可能来自哪里吗?

针对我们的消费者的配置大多默认为enable.auto.commit=falseAckMode.RECORD

2 个答案:

答案 0 :(得分:0)

我很确定您会遇到KAFKA-7196。 您应该将服务器升级到2.0.1 or later

作为一种解决方法,您可以尝试在每次启动时配置一个随机的client.id,但这会产生一些不良的副作用。

答案 1 :(得分:0)

在卡夫卡重新平衡的原因:

  1. 一个新的消费者加入了该群体
  2. 一个消费者离开了小组(彻底关闭)
  3. 已添加新分区
  4. 在卡夫卡看来,一个消费者似乎已经死了

第四名的原因:

  1. 消费者无法加入max.poll.interval.ms(长时间运行的过程)
  2. 在session.timeout.ms中,消费者无法向Kafka发送心跳

**通常,心跳线程在每个heartbeat.interval.ms中运行(默认3秒)

您的情景似乎是4.2。

可能有多种原因。要解决该问题,您可以增加session.timeout.ms。 (默认值为10秒。)

另一种解决方案是优化系统以按预期运行心跳线程。 (避免高IOWait,负载平衡等)

相关问题