Unable to run Airflow tasks with a PrestoDB query on schedule

Time: 2019-03-05 06:53:13

Tags: pyspark airflow presto airflow-scheduler

I have defined an example Airflow DAG in which I want to run a PrestoDB query and then run a Spark job that performs a simple word count. This is the DAG I defined:

import logging
from datetime import timedelta

from operator import add

import airflow
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

from airflow.hooks.presto_hook import PrestoHook

default_args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(1),
    'depends_on_past': False,
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    }

dag = DAG(
    'presto_dag',
    default_args=default_args,
    description='A simple tutorial DAG with PrestoDB and Spark',
    # Run the DAG once per day
    schedule_interval='@daily',
)

def talk_to_presto(**kwargs):
    # provide_context=True makes PythonOperator pass the task context as
    # keyword arguments, so the callable has to accept them.
    ph = PrestoHook(host='presto.myhost.com', port=9988)

    # Query PrestoDB
    query = "show catalogs"

    # Fetch the result rows and log them
    data = ph.get_records(query)
    logging.info(data)
    return data

def submit_to_spark(**kwargs):
    # from pyspark import SparkConf, SparkContext
    # conf = SparkConf().setAppName("PySpark App").setMaster("http://sparkhost.com:18080/")
    # sc = SparkContext(conf=conf)
    # data = sc.parallelize(list("Hello World"))
    # counts = data.map(lambda x: (x, 1)).reduceByKey(add).sortBy(lambda x: x[1], ascending=False).collect()
    # for (word, count) in counts:
    #     print("{}: {}".format(word, count))
    # sc.stop()
    return "Hello"

presto_task = PythonOperator(
    task_id='talk_to_presto',
    provide_context=True,
    python_callable=talk_to_presto,
    dag=dag,
)

spark_task = PythonOperator(
    task_id='submit_to_spark',
    provide_context=True,
    python_callable=submit_to_spark,
    dag=dag,
)

presto_task >> spark_task
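Note that the commented-out PySpark code calls list("Hello World"), which splits the string into single characters, so it counts characters rather than words. A minimal plain-Python sketch of the same map, reduceByKey(add), sortBy pipeline is below; it is not Spark, just a way to check the expected counts without a cluster. The function name `char_count` is hypothetical.

```python
from operator import add


def char_count(text):
    """Mimic the commented-out Spark pipeline on a single machine.

    list(text) splits the string into single characters, so this counts
    characters (including the space), not words.
    """
    # map: emit a (character, 1) pair per element, like data.map(lambda x: (x, 1))
    pairs = [(ch, 1) for ch in list(text)]

    # reduceByKey(add): sum the values for each key
    totals = {}
    for key, value in pairs:
        totals[key] = add(totals.get(key, 0), value)

    # sortBy(lambda x: x[1], ascending=False): highest count first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for ch, count in char_count("Hello World"):
    print("{}: {}".format(ch, count))
```

Spark does not guarantee any particular order among keys with equal counts, so only the ordering by count should be expected to match.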

When I submit the task, about 20 DAG runs are in the running state: [screenshot: Airflow DAG]
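For comparison with the roughly 20 running instances, it may help to estimate how many scheduled runs this start_date and schedule_interval should produce. As I understand Airflow 1.x scheduling, a run for an interval is created only after that interval has ended, and days_ago(1) resolves to midnight of the previous day. The sketch below is a simplified model under those assumptions (UTC only, ignores manual triggers and the catchup setting); `closed_intervals` is a hypothetical helper, and "now" is taken from the question's timestamp.

```python
from datetime import datetime, timedelta


def closed_intervals(start, now, interval=timedelta(days=1)):
    """Return execution dates of intervals that have fully elapsed by `now`.

    Simplified model: a run labelled with execution_date `d` is only
    scheduled once `d + interval <= now`.
    """
    dates = []
    execution_date = start
    while execution_date + interval <= now:
        dates.append(execution_date)
        execution_date += interval
    return dates


# Assumed equivalent of days_ago(1): midnight of the day before `now`.
now = datetime(2019, 3, 5, 6, 53, 13)
start = now.replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=1)
print(closed_intervals(start, now))  # [datetime.datetime(2019, 3, 4, 0, 0)]
```

Under this model a daily DAG starting yesterday has exactly one closed interval, so many more running instances than that would come from something other than the daily schedule.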

However, they never complete, and no logs are produced, at least for the PrestoDB query. I can run the same PrestoDB query successfully from Airflow's Data Profiling > Ad-Hoc Query section.
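Since the Ad-Hoc Query page and the task may not run on the same machine, one quick check is whether the Presto host and port written in the DAG are reachable from the host where the Airflow worker runs. The sketch below only tests plain TCP connectivity using the standard library; it says nothing about Presto authentication or query execution, and `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        # (socket.timeout and socket.gaierror are subclasses of OSError).
        return False
```

For example, running `port_is_open("presto.myhost.com", 9988)` on the worker host would show whether the address used in talk_to_presto accepts connections at all.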

I deliberately commented out the PySpark code because it does not run and is not part of this question.

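I have two questions:

  1. Why do the tasks not complete, and why do they stay in the running state?
  2. Since the query is not running, what is PrestoHook doing?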
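To find out whether the Presto call itself is what blocks, one option is to bound it with a timeout so that a hang surfaces as an error instead of a task that stays in the running state. Airflow operators accept an `execution_timeout` argument (a timedelta) for this; the standard-library sketch below shows the same idea outside Airflow. `call_with_timeout` is a hypothetical helper, and the slow function is only a stand-in for a blocking call such as `ph.get_records(query)`.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread and wait at most `timeout` seconds.

    Raises concurrent.futures.TimeoutError if the call has not finished in time.
    The worker thread cannot be killed, so a truly hung call keeps running in
    the background; this only makes the hang visible to the caller.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(func, *args, **kwargs)
        return future.result(timeout=timeout)
    finally:
        # Do not block on a hung worker when shutting the pool down.
        executor.shutdown(wait=False)


def slow_call():
    # Stand-in for a blocking call.
    time.sleep(1)
    return "done"


try:
    call_with_timeout(slow_call, 0.2)
except concurrent.futures.TimeoutError:
    print("call did not finish within 0.2 s")
```

Because the thread keeps running after the timeout, this is a diagnostic aid rather than a way to cancel the query.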
