How to use BigQueryOperator with execution_date?

Asked: 2018-12-18 07:25:03

Tags: airflow

Here is my code:

EXEC_TIMESTAMP  = "{{  execution_date.strftime('%Y-%m-%d %H:%M')  }}"
query = """
        select ... where date_purchased between TIMESTAMP_TRUNC(cast ( {{ params.run_timestamp }} as TIMESTAMP), HOUR, 'UTC') ...
        """
generate_op = BigQueryOperator(
                    bql=query,
                    destination_dataset_table=table_name,
                    task_id='generate',
                    bigquery_conn_id=CONNECTION_ID,
                    use_legacy_sql=False,
                    write_disposition='WRITE_TRUNCATE',
                    create_disposition='CREATE_IF_NEEDED',
                    query_params={'run_timestamp': EXEC_TIMESTAMP},
                    dag=dag)

This should work, but it doesn't. The Rendered tab shows me:

between TIMESTAMP_TRUNC(cast (  as TIMESTAMP), HOUR, 'UTC')

The date is missing; it renders as empty.

How can I fix this? The operator has no provide_context=True, and I don't know what else to try.

2 Answers:

Answer 0 (score: 2)

Luis, query_params is not the params that you can reference in the templating context; its entries are not added to it. Since params is empty, your {{ params.run_timestamp }} renders as "" or None. If you changed it to params={'run_timestamp': …}, you would still have a problem, because the values inside params are not themselves templated. So when the templated field bql contains {{ params.run_timestamp }}, it gets filled in with exactly the string stored in params: {'run_timestamp': …str… }, with no recursive expansion of that value. You would end up with the literal text {{ execution_date.strftime('%Y-%m-%d %H:%M') }}.
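To make that one-pass behavior concrete, here is a tiny stand-alone sketch (a hand-rolled substitution, not Airflow's actual Jinja machinery) showing that a macro stored inside a params-style value survives rendering un-expanded:

```python
import re

def render_once(template: str, context: dict) -> str:
    """One-pass substitution, mimicking how Airflow renders a templated
    field: looked-up values are inserted verbatim, never re-rendered.
    Illustration only, not Airflow's real code."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

bql = "select ... where ts = {{ params.run_timestamp }}"
context = {"params": {"run_timestamp": "{{ execution_date }}"}}
print(render_once(bql, context))
# The macro stored inside the value comes out un-rendered:
# select ... where ts = {{ execution_date }}
```

This is why the working fix is to put the macro directly into the templated field itself, as in the rewrite below.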

Let me try to rewrite this for you (though I am not sure I got the cast right):

generate_op = BigQueryOperator(
                    sql="""
select ...
where date_purchased between
  TIMESTAMP_TRUNC(cast('{{ execution_date.strftime('%Y-%m-%d %H:%M') }}' as TIMESTAMP), HOUR, 'UTC')
...
                    """,
                    destination_dataset_table=table_name,
                    task_id='generate',
                    bigquery_conn_id=CONNECTION_ID,
                    use_legacy_sql=False,
                    write_disposition='WRITE_TRUNCATE',
                    create_disposition='CREATE_IF_NEEDED',
                    dag=dag,
)
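As a sanity check, this is what that templated sql expands to for a sample execution_date (a plain string replacement simulating the Jinja step; the date used here is illustrative):

```python
from datetime import datetime

# The macro as it appears in the templated sql field
macro = "{{ execution_date.strftime('%Y-%m-%d %H:%M') }}"
sql = ("select ... where date_purchased between "
       "TIMESTAMP_TRUNC(cast('" + macro + "' as TIMESTAMP), HOUR, 'UTC') ...")

execution_date = datetime(2018, 12, 18, 7, 25)  # sample run time

# Simulate the template expansion with a direct replacement
rendered = sql.replace(macro, execution_date.strftime("%Y-%m-%d %H:%M"))
print(rendered)
# ... TIMESTAMP_TRUNC(cast('2018-12-18 07:25' as TIMESTAMP), HOUR, 'UTC') ...
```

Note the single quotes around the macro: without them the rendered date would be bare text in the SQL, not a string literal the cast can accept.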

You can see that the bql and sql fields are templated. Note that in later versions of the code the bql field is deprecated and eventually removed.

Answer 1 (score: 2)

The problem is that you are using query_params, which is not a templated field, as @dlamblin mentioned.

Use the following code to use the execution_date directly in your bql:

from datetime import datetime, timedelta

from airflow.models import DAG, Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.contrib.operators.bigquery_operator import BigQueryOperator


CONNECTION_ID = Variable.get("Your_Connection")

args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 12, 27, 11, 15),
    'retries': 4,
    'retry_delay': timedelta(minutes=10)
}


dag = DAG(
    dag_id='My_Test_DAG',
    default_args=args,
    schedule_interval='15 * * * *',
    max_active_runs=1,
    catchup=False,
)

query = """select customers_email_address as email, 
   from mytable
   where 
    and date_purchased = TIMESTAMP_SUB(TIMESTAMP_TRUNC(cast ({{  execution_date.strftime('%Y-%m-%d %H:%M')  }} as TIMESTAMP), HOUR, 'UTC'), INTERVAL 1 HOUR) """

create_orders_temp_table_op = BigQueryOperator(
                    bql=query,
                    destination_dataset_table='some table',
                    task_id='create_orders_temp_table',
                    bigquery_conn_id=CONNECTION_ID,
                    use_legacy_sql=False,
                    write_disposition='WRITE_TRUNCATE',
                    create_disposition='CREATE_IF_NEEDED',
                    dag=dag)

start_task_op = DummyOperator(task_id='start_task', dag=dag)


start_task_op  >> create_orders_temp_table_op
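For intuition, the TIMESTAMP_SUB(TIMESTAMP_TRUNC(…, HOUR, 'UTC'), INTERVAL 1 HOUR) expression selects the start of the previous full hour. The equivalent arithmetic in Python (an illustration with a made-up run time, not anything Airflow executes):

```python
from datetime import datetime, timedelta

execution_date = datetime(2018, 12, 27, 11, 15)  # sample hourly run

# TIMESTAMP_TRUNC(ts, HOUR): zero out minutes and everything below
truncated = execution_date.replace(minute=0, second=0, microsecond=0)

# TIMESTAMP_SUB(..., INTERVAL 1 HOUR): step back one full hour
window = truncated - timedelta(hours=1)

print(window)  # 2018-12-27 10:00:00
```

So a run whose execution_date is 11:15 filters for rows purchased in the 10:00 hour.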