Remove Airflow Scheduler logs

Solution 1

Inspired by this reply, I added the airflow-log-cleanup.py DAG from here (with some changes to its parameters) to remove all old Airflow logs, including scheduler logs.

My changes are minor. I had 4 DAGs and a small EC2 disk (7.7G for /dev/xvda1), so the 30-day default for DEFAULT_MAX_LOG_AGE_IN_DAYS seemed too long and I changed it to 14 days. Feel free to adjust it for your environment:

DEFAULT_MAX_LOG_AGE_IN_DAYS = Variable.get("max_log_age_in_days", 30)

changed to

DEFAULT_MAX_LOG_AGE_IN_DAYS = Variable.get("max_log_age_in_days", 14)
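
To make the idea of age-based cleanup concrete, here is a minimal sketch in plain Python. It is not the code of airflow-log-cleanup.py: the function name delete_old_logs and its parameters are hypothetical, and it simply assumes that "old" means a file whose modification time is more than max_age_days in the past.

```python
import os
import time


def delete_old_logs(log_dir, max_age_days=14, now=None):
    """Delete files under log_dir whose mtime is older than max_age_days.

    Hypothetical helper for illustration; returns the list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age limit as a Unix timestamp
    removed = []
    # Walk bottom-up so that emptied subdirectories can be removed afterwards.
    for root, dirs, files in os.walk(log_dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
        # Drop the directory if it is now empty (never the top-level folder).
        if root != log_dir and not os.listdir(root):
            os.rmdir(root)
    return removed
```

Calling delete_old_logs("/usr/local/airflow/logs", 14) would mirror the 14-day setting above; the path is the one from the question and may differ in your setup.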

Solution 2

The following is one option for resolving this issue.

Log in to the Docker container with the command below. The container must be running for this to work.

#>docker exec -it <name-or-id-of-container> sh

Then use a cron job to run a scheduled rm command on those log files.
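
As one possible shape for the scheduled job, the sketch below prunes date-named scheduler log folders. It assumes the layout shown in the question (one YYYY-MM-DD folder per day under /usr/local/airflow/logs/scheduler); the function name prune_scheduler_logs and the keep_days parameter are hypothetical.

```python
import os
import shutil
from datetime import date, datetime, timedelta

# Path taken from the question; adjust it for your container.
SCHEDULER_LOG_DIR = "/usr/local/airflow/logs/scheduler"


def prune_scheduler_logs(log_dir=SCHEDULER_LOG_DIR, keep_days=14, today=None):
    """Remove YYYY-MM-DD subfolders of log_dir that are older than keep_days."""
    if not os.path.isdir(log_dir):
        return []  # nothing to do if the folder does not exist
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        try:
            folder_day = datetime.strptime(name, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip anything that is not a date-named entry
        if os.path.isdir(path) and folder_day < cutoff:
            shutil.rmtree(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    print(prune_scheduler_logs())
```

Saved as a script, this could be run from a crontab entry such as `0 3 * * * python /path/to/prune_scheduler_logs.py`, where the script path is a placeholder. Whether cron is installed in the container image is something to check first.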

Solution 3

This answer to "Removing Airflow Task logs" also fits your use case in Airflow 1.10.

Basically, you need to implement a custom log handler and configure Airflow logging to use that handler instead of the default one (see UPDATING.md in the Airflow source repo, not the README or the docs).

One word of caution: because of the way logging, multiprocessing, and Airflow's default handlers interact, it is safer to override handler methods than to extend them by calling super() in a derived handler class, since Airflow's default handlers don't use locks.
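
The following is a rough illustration of the handler idea using only the standard library, not Airflow's own handler classes. The class name PruningDailyFileHandler and its parameters are hypothetical. It implements emit itself rather than extending another handler's method, writes to one folder per day, and deletes expired day folders whenever it starts a new day. It does no cross-process locking, so the pruning step ignores errors caused by another process removing the same folder.

```python
import logging
import os
import shutil
from datetime import date, timedelta


class PruningDailyFileHandler(logging.Handler):
    """Write to <base_dir>/<YYYY-MM-DD>/<filename> and prune old day folders."""

    def __init__(self, base_dir, filename, max_age_days=14):
        super().__init__()
        self.base_dir = base_dir
        self.filename = filename
        self.max_age_days = max_age_days
        self._current_day = None
        self._stream = None

    def _start_day(self, today):
        # Close yesterday's file, open today's, then clean up expired folders.
        if self._stream is not None:
            self._stream.close()
        day_dir = os.path.join(self.base_dir, today.isoformat())
        os.makedirs(day_dir, exist_ok=True)
        self._stream = open(os.path.join(day_dir, self.filename), "a")
        self._current_day = today
        self._prune(today)

    def _prune(self, today):
        cutoff = today - timedelta(days=self.max_age_days)
        for name in os.listdir(self.base_dir):
            path = os.path.join(self.base_dir, name)
            try:
                folder_day = date.fromisoformat(name)
            except ValueError:
                continue  # not a date-named folder
            if os.path.isdir(path) and folder_day < cutoff:
                # ignore_errors tolerates a concurrent delete by another process
                shutil.rmtree(path, ignore_errors=True)

    def emit(self, record):
        try:
            today = date.today()
            if today != self._current_day:
                self._start_day(today)
            self._stream.write(self.format(record) + "\n")
            self._stream.flush()
        except Exception:
            self.handleError(record)

    def close(self):
        if self._stream is not None:
            self._stream.close()
            self._stream = None
        super().close()
```

Wiring a class like this into Airflow would still go through the logging configuration described in UPDATING.md; that part is not shown here.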

Author: Ryan Stack

Updated on June 06, 2022

Comments

  • Ryan Stack

    I am using the Docker image for Apache Airflow, version 1.9.0-2 (https://github.com/puckel/docker-airflow).

    The scheduler produces a significant amount of logs, and the filesystem quickly runs out of space, so I am trying to programmatically delete the scheduler logs created by Airflow, which are found in the scheduler container under /usr/local/airflow/logs/scheduler.

    I have all of these maintenance tasks set up: https://github.com/teamclairvoyant/airflow-maintenance-dags

    However, these tasks only delete logs on the worker, and the scheduler logs are in the scheduler container.

    I have also set up remote logging, sending logs to S3, but as mentioned in the SO post "Removing Airflow task logs", this setup does not stop Airflow from writing to the local machine.

    Additionally, I have tried creating a shared named volume between the worker and the scheduler, as outlined in "Docker Compose - Share named volume between multiple containers". However, I get the following error in the worker:

    ValueError: Unable to configure handler 'file.processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler'

    and the following error in the scheduler:

    ValueError: Unable to configure handler 'file.processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler/2018-04-11'

    So, how do people delete scheduler logs?