Kafka Streams - Failed to rebalance error

Time: 2019-01-21 18:07:17

Tags: apache-kafka apache-kafka-streams

I have a basic Kafka Streams application that reads from in_topic, performs a rolling aggregation, and does a join to publish to out_topic. It had been running fine for weeks, but it crashed this morning and will no longer start. I don't think this is related to the code. The log leading up to the error is:
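For context, a minimal sketch of what such a topology can look like (topic names are from the question and the application id from the log prefix; the broker address, record types, and the choice of a running count for the rolling aggregation are assumptions):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class HealthScoreStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rtt-healthscore-stream"); // from the log prefix
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");      // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // exactly_once is what makes Streams create the transactional producer
        // whose initialization times out in the log below.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("in_topic");

        // Rolling aggregation, modeled here as a running count per key.
        KTable<String, Long> counts = events.groupByKey().count();

        // Join each event with its running count and publish the result.
        events.join(counts, (value, count) -> value + ":" + count)
              .to("out_topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}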

2019-01-21 17:46:32,803 localhost org.apache.kafka.clients.producer.KafkaProducer: [Producer clientId=rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1-0_0-producer, transactionalId=rtt-healthscore-stream-0_0] Instantiated a transactional producer.
2019-01-21 17:46:32,803 localhost org.apache.kafka.clients.producer.KafkaProducer: [Producer clientId=rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1-0_0-producer, transactionalId=rtt-healthscore-stream-0_0] Overriding the default acks to all since idempotence is enabled.
2019-01-21 17:46:32,818 localhost org.apache.kafka.common.utils.AppInfoParser: Kafka version : 2.0.0
2019-01-21 17:46:32,818 localhost org.apache.kafka.common.utils.AppInfoParser: Kafka commitId : 3402a8361b734732
2019-01-21 17:46:32,832 localhost org.apache.kafka.clients.producer.internals.TransactionManager: [Producer clientId=rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1-0_0-producer, transactionalId=rtt-healthscore-stream-0_0] ProducerId set to -1 with epoch -1
2019-01-21 17:47:32,833 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] Error caught during partition assignment, will abort the current process and re-throw at the end of rebalance: {}
org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
2019-01-21 17:47:32,843 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] partition assignment took 60062 ms.
    current active tasks: []
    current standby tasks: []
    previous active tasks: []

2019-01-21 17:47:32,845 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] State transition from PARTITIONS_ASSIGNED to PENDING_SHUTDOWN
2019-01-21 17:47:32,845 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] Shutting down
2019-01-21 17:47:32,860 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
2019-01-21 17:47:32,860 localhost org.apache.kafka.streams.KafkaStreams: stream-client [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804] State transition from REBALANCING to ERROR
2019-01-21 17:47:32,860 localhost org.apache.kafka.streams.KafkaStreams: stream-client [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804] All stream threads have died. The instance will be in error state and should be closed.
2019-01-21 17:47:32,860 localhost org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] Shutdown complete
Exception in thread "rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: stream-thread [rtt-healthscore-stream-7d679951-913b-4976-a43e-0b437c22c804-StreamThread-1] Failed to rebalance.
    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:870)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:810)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.

None of the Kafka settings/configs have changed, and all of the brokers are available. My Kafka version is 2.0. I can read from in_topic with a console consumer, so everything upstream of this application is fine. Any help is appreciated.
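For reference, the console-consumer check was along these lines (standard kafka-console-consumer invocation; the broker address is assumed):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic in_topic --from-beginning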

3 Answers:

Answer 0 (score: 4)

Our project hit the same timeout failure after upgrading to Kafka 2.1, and we don't yet know the cause.

Our temporary workaround is to disable the exactly_once config, which skips initializing the transactional state.
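In Streams configuration terms that amounts to something like the following sketch, which falls back to the default at_least_once guarantee so no transactional producer is created:

// Revert from exactly_once so Streams no longer creates a transactional
// producer and never waits on transactional-state initialization.
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.AT_LEAST_ONCE);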

Answer 1 (score: 1)

We also saw these errors after upgrading to 2.1 (and, I believe, during earlier upgrades to previous versions as well).

We run in a Kubernetes environment where, after a rolling upgrade, brokers can come back with different IP addresses. From the broker logs:

[2019-02-20 02:20:20,085] WARN [TransactionCoordinator id=1001] Connection to node 0 (khaki-joey-kafka-0.khaki-joey-kafka-headless.hyperspace-dev/10.233.124.181:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2019-02-20 02:20:57,205] WARN [TransactionCoordinator id=1001] Connection to node 1 (khaki-joey-kafka-1.khaki-joey-kafka-headless.hyperspace-dev/10.233.122.67:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

I could see that the transaction coordinator was still using stale IP addresses for two brokers that had been restarted (a day after the upgrade).

Possible options:

  • Turn off exactly-once for the streams application, as described in this answer. It then doesn't use transactions and everything works. This doesn't help if you require EOS, or if some other client code requires transactions.
  • Restart any brokers that are reporting the warnings, to force them to re-resolve the IP addresses. They would need to be restarted in a way that doesn't change their own IP addresses, which is usually not possible in Kubernetes.

Defect raised: Issue KAFKA-7958 - Transactions are broken with kubernetes hosted brokers

Update 2019-02-20: This may be resolved in Kafka 2.1.1 (Confluent 5.1.2), released today. See the linked issue.

Answer 2 (score: 0)

It was resolved after upgrading the Kafka Streams dependencies; see
https://kafka.apache.org/25/documentation/streams/developer-guide/write-streams.html

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>2.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.5.0</version>
    </dependency>
    <!-- Optionally include Kafka Streams DSL for Scala for Scala 2.12 -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams-scala_2.12</artifactId>
        <version>2.5.0</version>
    </dependency>