Airflow's Gunicorn is spamming error logs

Time: 2018-05-14 17:37:32

Tags: gunicorn airflow

I'm using Apache Airflow and noticed that gunicorn-error.log has grown to over 50 GB within 5 months. Most of the log messages are INFO-level entries like:

[2018-05-14 17:31:39 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:32:37 +0000] [2359] [INFO] Worker exiting (pid: 2359)
[2018-05-14 17:33:07 +0000] [29595] [INFO] Handling signal: ttin
[2018-05-14 17:33:07 +0000] [5758] [INFO] Booting worker with pid: 5758
[2018-05-14 17:33:10 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:33:41 +0000] [2994] [INFO] Worker exiting (pid: 2994)
[2018-05-14 17:34:11 +0000] [29595] [INFO] Handling signal: ttin
[2018-05-14 17:34:11 +0000] [6400] [INFO] Booting worker with pid: 6400
[2018-05-14 17:34:13 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:34:36 +0000] [3611] [INFO] Worker exiting (pid: 3611)

In the Airflow config file I can only set the log file path. Does anyone know how to change gunicorn's log level from within Airflow? I don't need such fine-grained logging, and it is filling up my hard drive.

2 answers:

Answer 0 (score: 0)

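Logging in Airflow seems a bit tricky. One reason is that the logging is split into several parts. For example, Airflow's own logging configuration is completely separate from that of the gunicorn web server (the "spam" logs you mention in your post come from gunicorn).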

To get rid of this spam, I modified Airflow's bin/cli.py slightly by adding a few lines to the webserver() function:

    if args.log_config:
        # Forward the logging config file to gunicorn's --log-config option
        run_args += ['--log-config', str(args.log_config)]

(For brevity, I haven't pasted the code that handles the argument.)
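The argument handling that the answer leaves out could look roughly like the sketch below. It is hypothetical and simplified, not the answerer's code: parse_webserver_args and build_gunicorn_args are illustrative names, and run_args starts as just ["gunicorn"] instead of the full command line Airflow builds. Only the --log-config forwarding follows the cli.py snippet.

```python
import argparse
from typing import List, Sequence


def parse_webserver_args(argv: Sequence[str]) -> argparse.Namespace:
    # Hypothetical parser: only a --log-config option is sketched here.
    parser = argparse.ArgumentParser(prog="airflow webserver")
    parser.add_argument(
        "--log-config",
        dest="log_config",
        default=None,
        help="Path to a logging config file that is passed on to gunicorn",
    )
    return parser.parse_args(argv)


def build_gunicorn_args(args: argparse.Namespace) -> List[str]:
    # Simplified stand-in for the gunicorn command line built in webserver().
    run_args = ["gunicorn"]
    if args.log_config:
        # Same logic as the cli.py change: hand the file to gunicorn's --log-config.
        run_args += ["--log-config", str(args.log_config)]
    return run_args
```

Without the option the command line is unchanged, so existing setups keep gunicorn's default logging.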

As for the logging config file, mine looks something like this:

[loggers]
keys=root, gunicorn.error, gunicorn.access

[handlers]
keys=console, error_file, access_file

[formatters]
keys=generic, access

[logger_root]
level=INFO
handlers=console

[logger_gunicorn.error]
level=INFO
handlers=error_file
propagate=0
qualname=gunicorn.error

[logger_gunicorn.access]
level=INFO
handlers=access_file
propagate=1
qualname=gunicorn.access

[handler_console]
class=StreamHandler
formatter=generic
args=(sys.stdout, )

[handler_error_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=generic
args=('/home/airflow/airflow/logs/webserver/gunicorn.error.log',)

[handler_access_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=access
args=('/home/airflow/airflow/logs/webserver/gunicorn.access.log',)

[formatter_generic]
format=[%(name)s] [%(module)s] [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s
#format=[%(levelname)s] %(asctime)s [%(process)d] [%(levelname)s] %(message)s
datefmt=%Y-%m-%d %H:%M:%S
class=logging.Formatter

[formatter_access]
format=%(message)s
class=logging.Formatter
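To check what propagate=0 on gunicorn.error does in a config like this, a trimmed-down copy can be loaded with logging.config.fileConfig. This is a minimal sketch under assumptions: plain FileHandlers in a temporary directory stand in for the console and the rotating file, and the file names and the demo.other logger are hypothetical.

```python
import io
import logging
import logging.config
import os
import tempfile

# Trimmed-down version of the config (hypothetical file names): root writes to
# console.log, a stand-in for stdout; gunicorn.error has its own file and propagate=0.
CONFIG_TEMPLATE = """
[loggers]
keys=root, gunicorn.error

[handlers]
keys=console, error_file

[formatters]
keys=generic

[logger_root]
level=INFO
handlers=console

[logger_gunicorn.error]
level=INFO
handlers=error_file
propagate=0
qualname=gunicorn.error

[handler_console]
class=FileHandler
formatter=generic
args=({console_path!r},)

[handler_error_file]
class=FileHandler
formatter=generic
args=({error_path!r},)

[formatter_generic]
format=%(levelname)s %(name)s %(message)s
"""


def _close_handlers(*loggers):
    # Close and detach handlers so output is flushed and the temp dir can be removed.
    for logger in loggers:
        for handler in list(logger.handlers):
            handler.close()
            logger.removeHandler(handler)


def run_demo():
    """Return (console_text, error_text) after logging two messages."""
    with tempfile.TemporaryDirectory() as tmp:
        console_path = os.path.join(tmp, "console.log")
        error_path = os.path.join(tmp, "gunicorn.error.log")
        config = CONFIG_TEMPLATE.format(console_path=console_path, error_path=error_path)
        logging.config.fileConfig(io.StringIO(config), disable_existing_loggers=False)

        gunicorn_log = logging.getLogger("gunicorn.error")
        gunicorn_log.info("Handling signal: ttin")  # stays out of console.log
        logging.getLogger("demo.other").info("other message")  # propagates to root

        _close_handlers(logging.getLogger(), gunicorn_log)
        with open(console_path) as f:
            console_text = f.read()
        with open(error_path) as f:
            error_text = f.read()
    return console_text, error_text
```

With propagate=0 the gunicorn record is written only by the error_file handler; the unrelated logger still reaches the root handler.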

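Note the "propagate=0" on gunicorn.error: it keeps these messages from also being written to standard output. You still get them, but at least they end up in /home/airflow/airflow/logs/webserver/gunicorn.error.log, which should be rotated (to be honest, I haven't fully tested the rotation part).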
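The two TimedRotatingFileHandler entries in the config pass only a filename, so Python's defaults apply: when='h' with interval=1 (rotate every hour) and backupCount=0, which never deletes rotated files. Rotation alone therefore does not cap disk usage. A minimal sketch, assuming daily rotation with seven kept files is acceptable (both values are my assumption, not from the answer); in the config file the equivalent would be args=('/home/airflow/airflow/logs/webserver/gunicorn.error.log', 'midnight', 1, 7).

```python
import logging.handlers
import os
import tempfile


def make_rotating_handler(path, keep=7):
    """Rotate at midnight and keep at most `keep` old files (keep=7 is an assumption)."""
    return logging.handlers.TimedRotatingFileHandler(
        path, when="midnight", interval=1, backupCount=keep
    )


def compare_handlers():
    """Return (when, backupCount) for a filename-only handler and for the sketch."""
    with tempfile.TemporaryDirectory() as tmp:
        # Filename only, as in the config: hourly rotation, no old files deleted.
        default = logging.handlers.TimedRotatingFileHandler(os.path.join(tmp, "default.log"))
        bounded = make_rotating_handler(os.path.join(tmp, "bounded.log"))
        result = ((default.when, default.backupCount), (bounded.when, bounded.backupCount))
        default.close()
        bounded.close()
    return result
```

compare_handlers() shows the difference: the filename-only handler reports backupCount 0, the sketch reports 7.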

If I find the time, I'll submit this change as a ticket in Airflow's Jira.

Answer 1 (score: 0)

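I managed to solve the problem by setting the environment variable

GUNICORN_CMD_ARGS="--log-level WARNING"

so that gunicorn only logs messages at WARNING level or above, which drops the INFO-level worker messages.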
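Gunicorn reads extra command-line options from GUNICORN_CMD_ARGS, so the variable has to be present in the environment of the process that starts gunicorn. A minimal sketch, assuming `airflow webserver` is launched from Python and that the gunicorn process it spawns inherits its environment; gunicorn_env and start_webserver are hypothetical helper names.

```python
import os
import subprocess
from typing import Mapping


def gunicorn_env(base_env: Mapping[str, str], level: str = "WARNING") -> dict:
    """Return a copy of base_env with --log-level appended to GUNICORN_CMD_ARGS."""
    env = dict(base_env)
    # Keep any gunicorn options already set; the new one is appended after them.
    existing = env.get("GUNICORN_CMD_ARGS", "").strip()
    env["GUNICORN_CMD_ARGS"] = f"{existing} --log-level {level}".strip()
    return env


def start_webserver() -> subprocess.Popen:
    """Start `airflow webserver` (assumed to be on PATH) with the adjusted environment."""
    return subprocess.Popen(["airflow", "webserver"], env=gunicorn_env(os.environ))
```

gunicorn_env copies the mapping instead of mutating os.environ, so only the launched webserver sees the changed log level.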