Specify parallelism per task?

Time: 2017-04-07 11:09:31

Tags: airflow

I know I can set parallelism in the cfg, but is there a way to set it per task, or at least per DAG?

DAG1 =

task_id: 'download_sftp'
parallelism: 4 #I am fine with downloading multiple files at once


task_id: 'process_dimensions'
parallelism: 1 #I want to make sure the dimensions are processed one at a time to prevent conflicts with my 'serial' keys

task_id: 'process_facts'
parallelism: 4 #It is fine to have multiple tables processed at once since there will be no conflicts

dag2 (separate file) =

task_id: 'bcp_query'
parallelism: 6 #I can query separate BCP commands to download data quickly since it is very small amounts of data

2 answers:

Answer 0: (score: 0)

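You can create a task pool through the web GUI and limit execution parallelism by assigning specific tasks to that pool. The number of slots in the pool caps how many of those tasks run at the same time.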

See: https://airflow.apache.org/concepts.html#pools

Answer 1: (score: 0)

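The following parameter (found in the airflow.cfg configuration file) controls the number of active DAG runs, and it applies globally. It defaults to 16; changing it to 1 ensures that only one run of each DAG is active at a time, and the remaining runs are queued.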

# The maximum number of active DAG runs per DAG

max_active_runs_per_dag = 16

How to limit Airflow to run only 1 DAG run at a time? -> suggests how to control concurrency per DAG
