超时异常Flink

时间:2019-06-08 13:37:29

标签: docker apache-flink flink-streaming

我对Flink有疑问。我正在具有1个TaskManager和4个Taskslot的本地群集中运行应用程序。

运行应用程序一段时间后,出现超时错误:

java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id feea6a6702a0cf960ae2847b5bd25665 timed out.

我看过一些有关该主题的帖子,但有任何答案。您能否帮助我查看根本原因或可能的故障排除方法?

我正在使用flink 1.5.3版

发生这种情况时,似乎已停止了taskmanagers和JobManager的docker容器。

让我从JobManager容器日志中添加错误跟踪:

2019-06-09 13:31:06,300 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) switched from state FAILING to FAILED.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
    at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
    at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,308 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Could not restart the job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) because the restart strategy prevented it.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
    at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
    at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,317 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Stopping checkpoint coordinator for job ef3a860de48d54544d973754c6170d8b.
2019-06-09 13:31:06,322 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore  - Shutting down
2019-06-09 13:31:06,331 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
2019-06-09 13:31:06,351 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Job ef3a860de48d54544d973754c6170d8b reached globally terminal state FAILED.
2019-06-09 13:31:06,434 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Stopping the JobMaster for job Socket Window NgsiEvent(ef3a860de48d54544d973754c6170d8b).
2019-06-09 13:31:06,447 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Suspending SlotPool.
2019-06-09 13:31:06,448 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Close ResourceManager connection 883e842633b0fd9a2e53ab45778581fe: JobManager is shutting down..
2019-06-09 13:31:06,449 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcActor                - The rpc endpoint org.apache.flink.runtime.jobmaster.slotpool.SlotPool has not been started yet. Discarding message org.apache.flink.runtime.rpc.messages.LocalRpcInvocation until processing is started.
2019-06-09 13:31:06,457 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/jobmanager_2 for job ef3a860de48d54544d973754c6170d8b from the resource manager.
2019-06-09 13:31:06,459 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Stopping SlotPool.
2019-06-09 13:31:06,460 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunner           - JobManagerRunner already shutdown.
2019-06-09 13:31:16,304 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:26,320 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:36,286 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]

谢谢!

0 个答案:

没有答案