Kafka Streams application stuck in endless rebalancing

Time: 2020-04-13 10:25:01

Tags: java apache-kafka apache-kafka-streams

We are running a Kafka Streams application and have hit a strange problem. We use a global state store along with several other state stores.

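Our application has loaded all of its data, and the state stores now hold a large amount of information. When we shut the application down and bring it back up (for some configuration changes), it goes into an endless rebalancing process. To verify, we reverted the configuration changes, but it is still stuck in that phase. There are no errors or exceptions in the logs: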

INFO  o.apache.kafka.streams.KafkaStreams - stream-client [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb] Started Streams client
INFO  o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] State transition from RUNNING to PARTITIONS_REVOKED
INFO  o.apache.kafka.streams.KafkaStreams - stream-client [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb] State transition from RUNNING to REBALANCING
INFO  o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] partition revocation took 1 ms.
    suspended active tasks: []
    suspended standby tasks: []
INFO  o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] State transition from RUNNING to PARTITIONS_REVOKED
INFO  o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] partition revocation took 0 ms.
    suspended active tasks: []
    suspended standby tasks: []
04:02:13.682 6985 [main] INFO  com..... - Started Application in 6.647 seconds (JVM running for 7.484)
04:02:23.300 16603 [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] INFO  o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
04:02:23.300 16603 [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] INFO  o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
04:02:23.328 16631 [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] INFO  o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] partition assignment took 28 ms.
    current active tasks: [0_0, 1_0, 2_0, 3_0, 4_0, 5_0, 6_0, 7_5, 8_5, 9_5, 10_5, 12_4, 13_4, 14_4, 15_4, 16_4, 17_4, 19_3, 20_3, 21_3, 22_3, 23_3, 24_3, 25_3, 29_0]
    current standby tasks: [0_2]
    previous active tasks: []

04:02:23.328 16631 [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] INFO  o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] partition assignment took 28 ms.
    current active tasks: [0_3, 1_3, 2_3, 3_3, 4_3, 5_3, 7_2, 8_2, 9_2, 10_2, 12_1, 13_1, 14_1, 15_1, 16_1, 17_1, 19_0, 20_0, 21_0, 22_0, 23_0, 24_0, 25_0, 26_0]
    current standby tasks: [0_5]
    previous active tasks: []
04:03:47.602 100905 [http-nio-8080-exec-10] INFO  c.j.d.r.b.p.base.StreamsRestService - State of Kafka Streams Application: REBALANCING
04:03:49.356 102659 [http-nio-8080-exec-2] INFO  c.j.d.r.b.p.base.StreamsRestService - State of Kafka Streams Application: REBALANCING
04:03:51.600 104903 [http-nio-8080-exec-3] INFO  c.j.d.r.b.p.base.StreamsRestService - State of Kafka Streams Application: REBALANCING
04:03:53.356 106659 [http-nio-8080-exec-4] INFO  c.j.d.r.b.p.base.StreamsRestService - State of Kafka Streams Application: REBALANCING

Number of topics: 100
Partitions per topic: 6 (7 topics have only 1 partition)
Kubernetes environment: 3 pods (2 stream threads each)

When we try to list the consumer group with the following command:

root@bastion-0:/app/confluent-5.2.2/bin# ./kafka-consumer-groups --describe --group app  --bootstrap-server kafka-0..local:9094 --command-config /app/client-sasl-ssl.properties --members

CONSUMER-ID                                                                                               HOST                    CLIENT-ID                                                            #PARTITIONS     
app-b8c729c9-dc1c-457b-8120-457035e84e58-StreamThread-1-consumer-3b370697-e737-411c-af28-fb04cfbae1dd 1.1.1.1/1.1.1.1 app-b8c729c9-dc1c-457b-8120-457035e84e58-StreamThread-1-consumer 45              
app-aaef2f83-d51c-4b6f-bbd8-616db988bd33-StreamThread-2-consumer-3edb3e5f-9f1a-499f-8732-6cd2c8b96c96 2.2.2.2/2.2.2.2 app-aaef2f83-d51c-4b6f-bbd8-616db988bd33-StreamThread-2-consumer 45              
app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1-consumer-00e24df4-5669-4e2c-a775-8f6c4f689714 3.3.3.3/3.3.3.3 app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1-consumer 46              
app-b8c729c9-dc1c-457b-8120-457035e84e58-StreamThread-2-consumer-1b6b2955-5dfd-4be7-8ad9-9f1b54fe6310 1.1.1.1/1.1.1.1 app-b8c729c9-dc1c-457b-8120-457035e84e58-StreamThread-2-consumer 45              
app-aaef2f83-d51c-4b6f-bbd8-616db988bd33-StreamThread-1-consumer-72cd0319-8ca7-493c-891d-3022b235ea01 2.2.2.2/2.2.2.2 app-aaef2f83-d51c-4b6f-bbd8-616db988bd33-StreamThread-1-consumer 45              
app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2-consumer-c1a16d64-8d49-4758-ab64-2af3cd9aef0f 3.3.3.3/3.3.3.3 app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2-consumer 45   

The output of the above command keeps changing, from 0 to some varying number. Ideally it should stabilize after a while.

Are there any tunable parameters or configurations for Kafka Streams rebalancing?

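Questions:

  1. What causes the application to rebalance endlessly at startup, even though there are no errors or exceptions?

  2. Are there any tunable parameters that could help us avoid the rebalancing?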

1 answer:

Answer 0 (score: 3)

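Looking at the logs you added, the consumer pod is just starting up, so I suppose the other 2 pods may be going through a rolling restart, which would trigger a rebalance every time one stops and one starts.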
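One way to check whether an instance really keeps cycling through rebalances is to tally the state transitions in its log. The following is only a minimal sketch and not part of the original answer: the class name `TransitionCounter` and the method `countTransitions` are hypothetical, and the sample lines are abbreviated copies of the log lines in the question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TransitionCounter {
    // Matches e.g. "State transition from RUNNING to PARTITIONS_REVOKED"
    private static final Pattern TRANSITION =
            Pattern.compile("State transition from (\\w+) to (\\w+)");

    /** Counts each "FROM->TO" transition found in the given log lines. */
    public static Map<String, Integer> countTransitions(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = TRANSITION.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + "->" + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Abbreviated lines taken from the log in the question.
        List<String> sample = List.of(
            "stream-thread [...-StreamThread-2] State transition from RUNNING to PARTITIONS_REVOKED",
            "stream-client [...] State transition from RUNNING to REBALANCING",
            "stream-thread [...-StreamThread-1] State transition from RUNNING to PARTITIONS_REVOKED",
            "stream-thread [...-StreamThread-1] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED",
            "stream-thread [...-StreamThread-2] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED");
        countTransitions(sample).forEach((k, v) -> System.out.println(k + " x" + v));
    }
}
```

If the counts for transitions into PARTITIONS_REVOKED keep growing over time, the instance is being pulled into new rebalances; if they stay flat while the application still reports REBALANCING, it is more likely stuck in a single one.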

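Kafka is fast, but rebalancing is not, because the whole group has to coordinate during the process: partitions can be handed to a consumer only once every consumer in the group has taken part, and a consumer discovers its assignment only inside the poll method (see https://chrisg23.blogspot.com/2020/02/why-is-pausing-kafka-consumer-so.html).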

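So the way to speed things up is to poll more frequently, so that you learn about changes sooner. This is a trade-off, though: if the topics are not busy during normal operation, there will be a lot of spinning that does nothing.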

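However, I'm not sure what you mean by "endless". If you mean that the application is literally doing nothing but rebalancing, then see my comments above. It could be that the pods are continuously going up and down (heartbeats failing), or that the time between polls is too long. Are you doing a lot of I/O for each record? Restarts would be obvious from the logs and the container names. Taking too long between polls would also produce warning messages telling you to increase max.poll.interval.ms or decrease max.poll.records.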
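As a rough illustration of where those settings would go, here is a sketch that builds the overrides as plain `java.util.Properties`. It assumes the standard Kafka Streams convention that keys prefixed with `consumer.` are passed to the embedded consumers; the class name `RebalanceTuning` is hypothetical and every value is a placeholder to be tuned for the actual workload, not a recommendation.

```java
import java.util.Properties;

public class RebalanceTuning {
    /** Builds illustrative consumer overrides; all values are placeholders. */
    public static Properties tuningOverrides() {
        Properties props = new Properties();
        // Maximum time allowed between two poll() calls before the consumer
        // is considered failed and the group rebalances.
        props.setProperty("consumer.max.poll.interval.ms", "600000");
        // Fewer records per poll means less processing time between polls.
        props.setProperty("consumer.max.poll.records", "100");
        // How long the broker waits for a heartbeat before dropping the member.
        props.setProperty("consumer.session.timeout.ms", "30000");
        // Heartbeats are usually sent at no more than a third of the session timeout.
        props.setProperty("consumer.heartbeat.interval.ms", "10000");
        return props;
    }

    public static void main(String[] args) {
        // Print the overrides; in a real application they would be merged into
        // the Properties handed to the KafkaStreams constructor.
        tuningOverrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These overrides only help if slow processing or missed heartbeats are actually the cause; they will not stop rebalances triggered by pods restarting.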
