Task with no status causes the DAG to fail

Asked: 2017-08-24 19:18:48

Tags: airflow apache-airflow

I have a DAG that extracts data from Elasticsearch and ingests it into a data lake. The first task, BeginIngestion, fans out into several tasks (one per resource), each of which fans out into further tasks (one per shard). Once the shards are extracted, the data is uploaded to S3 and the branches converge into the task EndIngestion, which is followed by the task AuditIngestion.
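For reference, here is a minimal sketch of that fan-out/fan-in structure (not the original code; the resource names, shard counts, and the upload callable are placeholders), written against the Airflow 1.x-era operator imports:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator

    # Hypothetical mapping of resource name -> shard count.
    RESOURCES = {"complain": 2, "interaction": 3}


    def extract_and_upload_shard(resource, shard, **kwargs):
        # Placeholder for "extract this shard from Elasticsearch and upload it to S3".
        pass


    dag = DAG("es_ingestion", start_date=datetime(2017, 8, 1),
              schedule_interval="@hourly")

    begin_ingestion = DummyOperator(task_id="begin_ingestion", dag=dag)
    end_ingestion = DummyOperator(task_id="end_ingestion", dag=dag)
    audit_ingestion = DummyOperator(task_id="audit_ingestion", dag=dag)

    for resource, shard_count in RESOURCES.items():
        # One fan-in task per resource, feeding the global end_ingestion task.
        finish_upload = DummyOperator(
            task_id="s3_finish_upload_ingestion_raichucrud_%s" % resource, dag=dag)
        for shard in range(shard_count):
            extract = PythonOperator(
                task_id="extract_%s_shard_%d" % (resource, shard),
                python_callable=extract_and_upload_shard,
                op_kwargs={"resource": resource, "shard": shard},
                dag=dag)
            begin_ingestion >> extract >> finish_upload
        finish_upload >> end_ingestion

    end_ingestion >> audit_ingestion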

It was running correctly, but now every task finishes successfully while the closing task EndIngestion stays with no status. When I refresh the webserver page, the DAG is marked as failed.

[Screenshot: all upstream tasks succeeded, the task end_ingestion has no status, and the DAG is marked as failed]

I also dug into the task instance details and found:

  • Dagrun Running: Task instance's dagrun was not in the 'running' state but in the state 'failed'.
  • Trigger Rule: Task's trigger rule 'all_success' requires all upstream tasks to have succeeded, but found 1 non-success(es). upstream_tasks_state={'failed': 0, 'upstream_failed': 0, 'skipped': 0, 'done': 49, 'successes': 49}, upstream_task_ids=['s3_finish_upload_ingestion_raichucrud_complain', 's3_finish_upload_ingestion_raichucrud_interaction', 's3_finish_upload_ingestion_raichucrud_company', 's3_finish_upload_ingestion_raichucrud_user', 's3_finish_upload_ingestion_raichucrud_privatecontactinteraction', 's3_finish_upload_ingestion_raichucrud_location', 's3_finish_upload_ingestion_raichucrud_companytoken', 's3_finish_upload_ingestion_raichucrud_indexevolution', 's3_finish_upload_ingestion_raichucrud_companyindex', 's3_finish_upload_ingestion_raichucrud_producttype', 's3_finish_upload_ingestion_raichucrud_categorycomplainsto', 's3_finish_upload_ingestion_raichucrud_companyresponsible', 's3_finish_upload_ingestion_raichucrud_category', 's3_finish_upload_ingestion_raichucrud_additionalfieldoption', 's3_finish_upload_ingestion_raichucrud_privatecontactconfiguration', 's3_finish_upload_ingestion_raichucrud_phone', 's3_finish_upload_ingestion_raichucrud_presence', 's3_finish_upload_ingestion_raichucrud_responsible', 's3_finish_upload_ingestion_raichucrud_store', 's3_finish_upload_ingestion_raichucrud_socialprofile', 's3_finish_upload_ingestion_raichucrud_product', 's3_finish_upload_ingestion_raichucrud_macrorankingpresenceto', 's3_finish_upload_ingestion_raichucrud_macroinfoto', 's3_finish_upload_ingestion_raichucrud_raphoneproblem', 's3_finish_upload_ingestion_raichucrud_macrocomplainsto', 's3_finish_upload_ingestion_raichucrud_testimony', 's3_finish_upload_ingestion_raichucrud_additionalfield', 's3_finish_upload_ingestion_raichucrud_companypageblockitem', 's3_finish_upload_ingestion_raichucrud_rachatconfiguration', 's3_finish_upload_ingestion_raichucrud_macrorankingitemto', 's3_finish_upload_ingestion_raichucrud_purchaseproduct', 's3_finish_upload_ingestion_raichucrud_rachatproblem', 's3_finish_upload_ingestion_raichucrud_role', 's3_finish_upload_ingestion_raichucrud_requestmoderation', 's3_finish_upload_ingestion_raichucrud_categoryproblemto', 's3_finish_upload_ingestion_raichucrud_companypageblock', 's3_finish_upload_ingestion_raichucrud_problemtype', 's3_finish_upload_ingestion_raichucrud_key', 's3_finish_upload_ingestion_raichucrud_macro', 's3_finish_upload_ingestion_raichucrud_url', 's3_finish_upload_ingestion_raichucrud_document', 's3_finish_upload_ingestion_raichucrud_transactionkey', 's3_finish_upload_ingestion_raichucrud_catprobitemcompany', 's3_finish_upload_ingestion_raichucrud_privatecontactinteraction', 's3_finish_upload_ingestion_raichucrud_categoryinfoto', 's3_finish_upload_ingestion_raichucrud_marketplace', 's3_finish_upload_ingestion_raichucrud_macroproblemto', 's3_finish_upload_ingestion_raichucrud_categoryrankingto', 's3_finish_upload_ingestion_raichucrud_macrorankingto', 's3_finish_upload_ingestion_raichucrud_categorypageto']

As you can see, the Trigger Rule field says one of the upstream tasks is in a "non-successful" state, yet at the same time the stats show that all upstreams are marked as successful.

If I reset the database the problem does not occur, but I can't reset it for every execution (it runs hourly), and I don't want to have to reset it anyway.

Can anyone shed some light on this?

PS: I'm running it on an EC2 instance (c4.xlarge) with the LocalExecutor.

[EDIT] I found in the scheduler log that the DAG is in a deadlock state:

[2017-08-25 19:25:25,821] {models.py:4076} DagFileProcessor157 INFO - Deadlock; marking run failed

I suspect this may be due to some issue with exception handling.

1 answer:

Answer 0 (score: 2):

I've run into this before, and in my case the cause was that my code was generating duplicate task IDs. It looks like there is a duplicate ID in your case as well: s3_finish_upload_ingestion_raichucrud_privatecontactinteraction
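One quick way to confirm this from the task instance details (a sketch added here for illustration, not part of the original answer): count how often each ID in upstream_task_ids occurs. A duplicated ID explains why 50 listed upstreams resolve to only 49 distinct task instances ('done': 49, 'successes': 49).

    from collections import Counter

    # Paste the full upstream_task_ids list from the task instance details here;
    # only the first entries, the duplicate, and the last entry are shown for brevity.
    upstream_task_ids = [
        "s3_finish_upload_ingestion_raichucrud_complain",
        "s3_finish_upload_ingestion_raichucrud_privatecontactinteraction",
        # ...
        "s3_finish_upload_ingestion_raichucrud_privatecontactinteraction",
        "s3_finish_upload_ingestion_raichucrud_categorypageto",
    ]

    for task_id, count in Counter(upstream_task_ids).items():
        if count > 1:
            print("duplicate task_id: %s (x%d)" % (task_id, count))

The same kind of uniqueness check can be applied in the DAG file to whatever list the task IDs are generated from (for example, asserting that the resource list has no repeats before building the operators), so a duplicate fails at parse time instead of surfacing as a deadlocked run.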

This is probably a year too late for you, but hopefully it saves someone else a lot of debugging time :)