AWS Data Pipeline raises a canceled-task exception after running for 5 days

Time: 2018-12-04 06:18:46

Tags: amazon-web-services pipeline cancellation days

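I have been trying to run an AWS Data Pipeline whose ShellCommandActivity invokes a bash process, which in turn calls several long-running Python and Java processes. Every time the shell command activity runs, a reportProgress error is raised in the Task Runner log exactly 5 days later and the task is canceled. The problem persists even when the attemptTimeout and lateAfterTimeout fields are set to more than 5 days. The Task Runner log messages and the Data Pipeline JSON definition are shown below: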

Screenshot of pipeline execution error

Task Runner log messages:

01 Dec 2018 18:55:05,693 https://forums.aws.amazon.com/ (HeartBeatService-df-01341812NWJEQ1FAYI1K-@ShellCommandActivityId_UdTMC_2018-11-26T18:54:03_Attempt=1) amazonaws.datapipeline.taskrunner.HeartBeatService: HeartBeatService DataPipeline reportProgress error thrown and workCancelleddf-01341812NWJEQ1FAYI1K-@ShellCommandActivityId_UdTMC_2018-11-26T18:54:03_Attempt=1 
amazonaws.datapipeline.taskrunner.CanceledTaskException: DataPipeline service requested this work be canceled.
at amazonaws.datapipeline.taskrunner.DataPipelineProgressReporter.reportProgress(DataPipelineProgressReporter.java:31)

...

01 Dec 2018 18:55:06,726 https://forums.aws.amazon.com/ (TaskRunnerService-wg-10000-2) amazonaws.datapipeline.taskrunner.TaskPoller: Work ShellCommandActivity took 7201:0 to complete
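The "exactly 5 days" observation can be checked against the log itself. The sketch below is only an illustration: it assumes the timestamp embedded in the attempt ID (2018-11-26T18:54:03) marks the start of the attempt, that both timestamps are in the same time zone, and that "7201:0" in the TaskPoller line means minutes:seconds. None of these is confirmed by the log.

```python
from datetime import datetime

# Timestamps copied from the log lines above.
# Assumption: both are expressed in the same time zone.
attempt_start = datetime.strptime("2018-11-26T18:54:03", "%Y-%m-%dT%H:%M:%S")
cancelled_at = datetime.strptime("01 Dec 2018 18:55:05,693", "%d %b %Y %H:%M:%S,%f")

# Time between the attempt timestamp and the cancellation message.
elapsed = cancelled_at - attempt_start
print(elapsed)

# Assumption: "7201:0" in the TaskPoller line is minutes:seconds.
reported_minutes = 7201
extra_minutes = reported_minutes - 5 * 24 * 60  # minutes beyond 5 full days
print(extra_minutes)
```

Under those assumptions, both figures land about one minute past the 5-day mark, which is consistent with a fixed 5-day limit rather than a failure inside the command itself.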

Pipeline JSON definition:

{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "s3://oobhuntoo1/",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "onLateAction": {
        "ref": "ActionId_V6bq0"
      },
      "lateAfterTimeout": "7 Days",
      "name": "DefaultShellCommandActivity1",
      "id": "ShellCommandActivityId_UdTMC",
      "workerGroup": "wg-10000",
      "type": "ShellCommandActivity",
      "command": "python ~/AWS_5day_Test/Python/Layer1.py"
    },
    {
      "name": "DefaultAction1",
      "id": "ActionId_V6bq0",
      "type": "Terminate"
    }
  ],
  "parameters": []
}
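To see which timeout-related fields the definition above actually carries, it can be loaded and scanned. This is a minimal sketch, not part of any AWS tooling: the helper name timeout_fields is hypothetical, and matching keys by the suffix "timeout" is only a heuristic.

```python
import json

# Copy of the pipeline definition posted above.
definition = json.loads("""
{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "s3://oobhuntoo1/",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "onLateAction": {"ref": "ActionId_V6bq0"},
      "lateAfterTimeout": "7 Days",
      "name": "DefaultShellCommandActivity1",
      "id": "ShellCommandActivityId_UdTMC",
      "workerGroup": "wg-10000",
      "type": "ShellCommandActivity",
      "command": "python ~/AWS_5day_Test/Python/Layer1.py"
    },
    {
      "name": "DefaultAction1",
      "id": "ActionId_V6bq0",
      "type": "Terminate"
    }
  ],
  "parameters": []
}
""")


def timeout_fields(obj):
    """Return the fields of a pipeline object whose key ends in 'timeout' (heuristic)."""
    return {key: value for key, value in obj.items() if key.lower().endswith("timeout")}


# List the timeout-related fields found on each pipeline object.
for obj in definition["objects"]:
    print(obj["id"], timeout_fields(obj))
```

For the definition as posted, the only timeout field this finds is lateAfterTimeout on the ShellCommandActivity object.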

0 answers:

No answers yet