Kafka Streams stateful application fails randomly

Time: 2018-11-22 15:41:18

Tags: scala apache-kafka-streams stateful kafka-streams-scala

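Hi, this is a problem I ran into several days ago and have not been able to solve on my own.

I am using the Scala Streams API v2.0.0.

I have two incoming streams that are branched over two handlers for segregation. Both handlers declare a Transformer that uses a common StateStore.

As a quick overview, it looks like this: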

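import org.apache.kafka.streams.scala.ImplicitConversions._  // derives Consumed/Produced from the implicit Serdes in scope
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.state.Stores

def buildStream(builder: StreamsBuilder, config: Config): StreamsBuilder = {
    val store = Stores.keyValueStoreBuilder[String, AggregatedState](Stores.persistentKeyValueStore(config.storeName), ...)
    builder.addStateStore(store)

    // Assumes both handlers share a common type exposing accepts and handle
    val handlers = List(handler1, handler2)

    builder
      .stream[String, Event](config.topic)
      .branch(handlers.map(h => h.accepts _): _*)    // Dispatch each event to the first handler accepting it
      .zip(handlers)                                 // Array[(KStream[String, Event], Handler)]
      .map { case (stream, h) => h.handle(stream) }  // zip yields (stream, handler), so destructure in that order
      .reduce((s1, s2) => s1.merge(s2))              // merge them back as they return the same object
      .to(config.output)

    builder
}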

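Each of my handlers looks the same: it takes an event, performs some operations, then goes through the transform() method to derive a state and emit an aggregate: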

class Handler1(config: Config) {
    def accepts(key: String, value: Event): Boolean = ???  // Implementation not needed

    def handle(stream: KStream[String, Event]) = {
        stream
          // ... join / map / filter steps elided ...
          .transform(new Transformer1(config.storeName), config.storeName)  // name the store so the transformer can access it
    }
}


class Handler2(config: Config) {
    def accepts(key: String, value: Event): Boolean = ???  // Implementation not needed

    def handle(stream: KStream[String, Event]) = {
        stream
          // ... join / map / filter steps elided ...
          .transform(new Transformer2(config.storeName), config.storeName)  // name the store so the transformer can access it
    }
}

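The transformers use the same StateStore with the following logic: for a new event, check whether its aggregate exists. If it does, update it, store it, and emit the new aggregate. Otherwise, build the aggregate, store it, and emit it.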

import org.apache.kafka.streams.kstream.Transformer
import org.apache.kafka.streams.processor.ProcessorContext
import org.apache.kafka.streams.state.KeyValueStore

class Transformer1(storeName: String) extends Transformer[String, Event, (String, AggregatedState)] {
    private var store: KeyValueStore[String, AggregatedState] = _

    override def init(context: ProcessorContext): Unit = {
        store = context.getStateStore(storeName).asInstanceOf[KeyValueStore[String, AggregatedState]]
    }

    override def transform(key: String, value: Event): (String, AggregatedState) = {
        val existing: Option[AggregatedState] = Option(store.get(key))
        val agg = existing.map(_.updateWith(value)).getOrElse(new AggregatedState(value))

        store.put(key, agg)
        if (agg.isTerminal) {
          store.delete(key)                   // Terminal aggregates are not kept in the store
        }
        if (isDuplicate(existing, agg)) {
            null                              // Duplicate: return null so nothing is emitted
        } else {
            (key, agg)                        // Emit the new aggregate
        }
    }

    override def close(): Unit = ()
}


import org.apache.kafka.streams.kstream.Transformer
import org.apache.kafka.streams.processor.ProcessorContext
import org.apache.kafka.streams.state.KeyValueStore

class Transformer2(storeName: String) extends Transformer[String, Event, (String, AggregatedState)] {
    private var store: KeyValueStore[String, AggregatedState] = _

    override def init(context: ProcessorContext): Unit = {
        store = context.getStateStore(storeName).asInstanceOf[KeyValueStore[String, AggregatedState]]
    }

    override def transform(key: String, value: Event): (String, AggregatedState) = {
        val existing: Option[AggregatedState] = Option(store.get(key))
        val agg = existing.map(_.updateWith(value)).getOrElse(new AggregatedState(value))

        store.put(key, agg)
        if (agg.isTerminal) {
          store.delete(key)                   // Terminal aggregates are not kept in the store
        }
        if (isDuplicate(existing, agg)) {
            null                              // Duplicate: return null so nothing is emitted
        } else {
            (key, agg)                        // Emit the new aggregate
        }
    }

    override def close(): Unit = ()
}

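Transformer2 is the same; only the business logic changes (how a new event is merged with the aggregated state).

My problem is that on startup the stream either starts normally or fails with this exception: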

15:07:23,420 ERROR org.apache.kafka.streams.processor.internals.AssignedStreamsTasks  - stream-thread [job-tracker-prod-5ba8c2f7-d7fd-48b5-af4a-ac78feef71d3-StreamThread-1] Failed to commit stream task 1_0 due to the following error:
org.apache.kafka.streams.errors.ProcessorStateException: task [1_0] Failed to flush state store KSTREAM-AGGREGATE-STATE-STORE-0000000003
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:242)
    at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:198)
    at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:406)
    at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:380)
    at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:368)
    at org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:67)
    at org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:362)
    at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:352)
    at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:401)
    at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1035)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:845)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: java.lang.IllegalStateException: This should not happen as timestamp() should only be called while a record is processed
    at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.timestamp(AbstractProcessorContext.java:161)
    at org.apache.kafka.streams.state.internals.StoreChangeLogger.logChange(StoreChangeLogger.java:59)
    at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:66)
    at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:31)
    at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.put(InnerMeteredKeyValueStore.java:206)
    at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.put(MeteredKeyValueBytesStore.java:117)
    at com.mycompany.streamprocess.Transformer1.transform(Transformer1.scala:49) // Line with store.put(key, agg)

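I searched and found results saying that "transformers use a factory pattern", which is what is used here (since .transform takes the transformer and creates a TransformerSupplier under the hood). Because the error is pseudo-random (I can reproduce it only some of the time), I suspect a race condition on startup, but I have found nothing conclusive. Is it because I use the same state store in different transformers?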

1 answer:

Answer 0: (score: 1)

I think you are hitting https://issues.apache.org/jira/browse/KAFKA-7250

It is fixed in versions 2.0.1 and 2.1.0.
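
For an sbt build, picking up the fix could look roughly like the fragment below. This is only a sketch: it assumes the project uses sbt and depends on the `kafka-streams-scala` artifact directly, neither of which is stated in the question. The `transform` signature may also differ in the fixed release, so the call sites may need adjusting.

```scala
// build.sbt (assumed build tool; adapt to Maven or Gradle as needed)
// 2.0.1 is the first 2.0.x release that includes the KAFKA-7250 fix
libraryDependencies += "org.apache.kafka" %% "kafka-streams-scala" % "2.0.1"
```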

If you cannot upgrade, you need to pass the TransformerSupplier explicitly, because the Scala API constructs the supplier incorrectly in 2.0.0.

.transform(() => new Transformer1(config.storeName))
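
To illustrate why a supplier that hands out one shared instance is a problem, here is a plain-Scala model of the supplier contract. It uses no Kafka classes: `Worker`, `RecordingWorker`, and `SupplierDemo` are hypothetical stand-ins for a transformer, its `init(context)` call, and the per-task setup, and the task ids are only sample values. As far as I understand the ticket, the 2.0.0 Scala wrapper behaved like the shared case.

```scala
// Hypothetical stand-in for a Transformer: it remembers which task initialised it
trait Worker {
  def init(id: String): Unit
  def taskId: String
}

final class RecordingWorker extends Worker {
  private var boundTask: String = ""
  def init(id: String): Unit = { boundTask = id } // plays the role of Transformer.init(context)
  def taskId: String = boundTask
}

object SupplierDemo {
  // Each task asks the supplier for a worker and initialises it.
  // Afterwards we report which task every worker believes it belongs to.
  def assign(tasks: List[String], supplier: () => Worker): Map[String, String] = {
    val workers = tasks.map { t =>
      val w = supplier()
      w.init(t)
      t -> w
    }
    workers.map { case (t, w) => t -> w.taskId }.toMap
  }

  def main(args: Array[String]): Unit = {
    val tasks = List("1_0", "1_1")

    // Broken pattern: the supplier returns the same instance on every call
    val shared = new RecordingWorker
    println(assign(tasks, () => shared))

    // Correct pattern: the supplier builds a fresh instance per call
    println(assign(tasks, () => new RecordingWorker))
  }
}
```

In the shared case the second `init` overwrites the first, so the worker used by task `1_0` ends up bound to task `1_1`. With a fresh instance per call, every task keeps its own worker, which is what `() => new Transformer1(config.storeName)` achieves.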