Apache Airflow DAG cannot import local module


Solution 1

Adding the DAG folder back to the sys path worked for me:

import os
import sys
# Re-add this file's directory so sibling modules resolve.
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
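
Note that this has to run before the import that fails (the from lib import print_double in the question below), since the path is only searched when the import statement executes.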

Solution 2

Are you using Airflow 1.9.0? This might be fixed there.

The issue is caused by the way Airflow loads DAGs: it doesn't just import them as normal Python modules, because it wants to be able to reload them without restarting any processes. As a result, the DAG file's own directory (.) isn't on the Python search path.
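
Roughly, the DAG processor loads each file by its path instead of importing it by name; a sketch of that loading style (not Airflow's actual code) shows why a sibling module like lib stops resolving:

import importlib.util

# Load a DAG file by path, the way Airflow's processor does.
# "tutorial" is just the module name we choose to register.
spec = importlib.util.spec_from_file_location(
    "tutorial", "/home/airflow/airflow/dags/tutorial.py"
)
module = importlib.util.module_from_spec(spec)

# Executing the file runs its imports; "from lib import print_double"
# fails with "No module named 'lib'" unless dags/ is on sys.path.
spec.loader.exec_module(module)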

If 1.9.0 doesn't fix this, the easiest change is to put export PYTHONPATH=/home/airflow/airflow/:$PYTHONPATH in the startup scripts. The exact format will depend on what you are using (systemd vs. init scripts, etc.).
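
To check that the variable actually reached the Airflow processes, one option (a small diagnostic sketch, nothing Airflow-specific) is to log the search path from a file in the dags folder and look for it in the scheduler logs:

import sys
import logging

# If the exported directory is missing from this list, the
# environment change didn't reach the scheduler or webserver.
logging.getLogger(__name__).info("sys.path: %s", sys.path)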

Solution 3

If you're working with git-sync in Kubernetes and did not use it as an initContainer (only as a container, or not at all), then it is possible that the modules were never loaded into the webserver or the scheduler.
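
Running git-sync as an initContainer guarantees the DAG repository is cloned before the scheduler and webserver processes start, so the modules are on disk by the time the DAGs are first parsed; a plain sidecar container starts concurrently and may not have synced yet.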

Updated on October 01, 2021

Comments

  • fildred13 over 2 years

    I do not seem to understand how to import modules into an Apache Airflow DAG definition file. I want to do this so that, for instance, I can create a library that makes declaring tasks with similar settings less verbose.
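
    For instance, the kind of helper I have in mind would look something like this (a sketch; make_bash_task is an invented name):

    from datetime import timedelta
    from airflow.operators.bash_operator import BashOperator

    def make_bash_task(dag, task_id, command):
        # Shared settings live here once instead of on every task.
        return BashOperator(
            task_id=task_id,
            bash_command=command,
            retries=2,
            retry_delay=timedelta(minutes=5),
            dag=dag,
        )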

    Here is the simplest example I can think of that replicates the issue: I modified the Airflow tutorial (https://airflow.apache.org/tutorial.html#recap) to simply import a module and call a function from that module, like so:

    Directory structure:

    - dags/
    -- __init__.py
    -- lib.py
    -- tutorial.py
    

    tutorial.py:

    """
    Code that goes along with the Airflow tutorial located at:
    http://airflow.readthedocs.org/en/latest/tutorial.html
    """
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta
    
    # Here is my added import
    from lib import print_double
    
    # And my usage of the imported def
    print_double(2)
    
    ## -- snip, because this is just the tutorial code,
    ## i.e., some standard DAG definition stuff --
    

    print_double is just a simple def which multiplies whatever input you give it by 2 and prints the result - but that hardly matters, since this is an import issue.
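
    For reference, lib.py (reconstructed from that description) is nothing more than:

    def print_double(x):
        # Multiply the input by 2 and print the result.
        print(x * 2)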

    I am able to run airflow test tutorial print_date 2015-06-01 successfully, as per the tutorial docs: the DAG runs, and moreover print_double succeeds. 4 is printed to the console, as expected. All appears well.

    Then I go to the web UI, and am greeted by Broken DAG: [/home/airflow/airflow/dags/tutorial.py] No module named 'lib'. Unpausing the DAG and attempting a manual run using the UI causes a "running" status, but it never succeeds or fails. It just sits on "running" forever. I can queue up as many runs as I'd like, but they'll all just sit on "running" status.

    I've checked the airflow logs, and don't see any useful debug information there.

    So what am I missing?