Celery Beat: Limit to single task instance at a time

15,225

Solution 1

The only way to do this is implementing a locking strategy yourself:

Read under the section here for the reference.

Like with cron, the tasks may overlap if the first task does not complete before the next. If that is a concern you should use a locking strategy to ensure only one instance can run at a time (see for example Ensuring a task is only executed one at a time).

Solution 2

from functools import wraps
from celery import shared_task


def skip_if_running(f):
    task_name = f'{f.__module__}.{f.__name__}'

    @wraps(f)
    def wrapped(self, *args, **kwargs):
        workers = self.app.control.inspect().active()

        for worker, tasks in workers.items():
            for task in tasks:
                if (task_name == task['name'] and
                        tuple(args) == tuple(task['args']) and
                        kwargs == task['kwargs'] and
                        self.request.id != task['id']):
                    print(f'task {task_name} ({args}, {kwargs}) is running on {worker}, skipping')

                    return None

        return f(self, *args, **kwargs)

    return wrapped


@shared_task(bind=True)
@skip_if_running
def test_single_task(self):
    pass


test_single_task.delay()

Solution 3

I solved the issue using celery-once which I extended to celery-one.

Both serve for your issue. It uses Redis to lock a running task. celery-one will also keep track of the task which is locking.

A very simple usage example for celery beat follows. In the code below, slow_task is scheduled every 1 second, but it's completion time is 5 seconds. Normal celery would schedule the task each second even if it is already running. celery-one would prevent this.

celery = Celery('test')
celery.conf.ONE_REDIS_URL = REDIS_URL
celery.conf.ONE_DEFAULT_TIMEOUT = 60 * 60
celery.conf.BROKER_URL = REDIS_URL
celery.conf.CELERY_RESULT_BACKEND = REDIS_URL

from datetime import timedelta

celery.conf.CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.slow_task',
        'schedule': timedelta(seconds=1),
        'args': (1,)
    },
}

celery.conf.CELERY_TIMEZONE = 'UTC'


@celery.task(base=QueueOne, one_options={'fail': False})
def slow_task(a):
    print("Running")
    sleep(5)
    return "Done " + str(a)

Solution 4

I took a crack at writing a decorator to use Postgres advisory locking similar to what erydo alluded to in his comment.

It's not very pretty, but seems to work correctly. This is with SQLAlchemy 0.9.7 under Python 2.7.

from functools import wraps
from sqlalchemy import select, func

from my_db_module import Session # SQLAlchemy ORM scoped_session

def pg_locked(key):
    def decorator(f):
        @wraps(f)
        def wrapped(*args, **kw):
            session = db.Session()
            try:
                acquired, = session.execute(select([func.pg_try_advisory_lock(key)])).fetchone()
                if acquired:
                    return f(*args, **kw)
            finally:
                if acquired:
                    session.execute(select([func.pg_advisory_unlock(key)]))
        return wrapped
    return decorator

@app.task
@pg_locked(0xdeadbeef)
def singleton_task():
    # only 1x this task can run at a time
    pass

(Would welcome any comments on ways to improve this!)

Solution 5

A distributed locking system is required, for those Celery beat instances are essentially different processes which might be across different hosts.

Central coordinate systems such as ZooKeeper and etcd is suitable for implementation of distributed locking system.

I recommend using etcd, which is lightweight and fast. There are several implementations of lock over etcd, such as:

python-etcd-lock

Share:
15,225
erydo
Author by

erydo

Updated on June 08, 2022

Comments

  • erydo
    erydo almost 2 years

    I have celery beat and celery (four workers) to do some processing steps in bulk. One of those tasks is roughly along the lines of, "for each X that hasn't had a Y created, create a Y."

    The task is run periodically at a semi-rapid rate (10sec). The task completes very quickly. There are other tasks going on as well.

    I've run into the issue multiple times in which the beat tasks apparently become backlogged, and so the same task (from different beat times) are executed simultaneously, causing incorrectly duplicated work. It also appears that the tasks are executed out-of-order.

    1. Is it possible to limit celery beat to ensure only one outstanding instance of a task at a time? Is setting something like rate_limit=5 on the task the "correct" way to of doing this?

    2. Is it possible to ensure that beat tasks are executed in-order, e.g. instead of dispatching a task, beat adds it to a task chain?

    3. What's the best way of handling this, short of making those tasks themselves execute atomically and are safe to be executed concurrently? That was not a restriction I would have expected of beat tasks…

    The task itself is defined naïvely:

    @periodic_task(run_every=timedelta(seconds=10))
    def add_y_to_xs():
        # Do things in a database
        return
    

    Here's an actual (cleaned) log:

    • [00:00.000] foocorp.tasks.add_y_to_xs sent. id->#1
    • [00:00.001] Received task: foocorp.tasks.add_y_to_xs[#1]
    • [00:10.009] foocorp.tasks.add_y_to_xs sent. id->#2
    • [00:20.024] foocorp.tasks.add_y_to_xs sent. id->#3
    • [00:26.747] Received task: foocorp.tasks.add_y_to_xs[#2]
    • [00:26.748] TaskPool: Apply #2
    • [00:26.752] Received task: foocorp.tasks.add_y_to_xs[#3]
    • [00:26.769] Task accepted: foocorp.tasks.add_y_to_xs[#2] pid:26528
    • [00:26.775] Task foocorp.tasks.add_y_to_xs[#2] succeeded in 0.0197986490093s: None
    • [00:26.806] TaskPool: Apply #1
    • [00:26.836] TaskPool: Apply #3
    • [01:30.020] Task accepted: foocorp.tasks.add_y_to_xs[#1] pid:26526
    • [01:30.053] Task accepted: foocorp.tasks.add_y_to_xs[#3] pid:26529
    • [01:30.055] foocorp.tasks.add_y_to_xs[#1]: Adding Y for X id #9725
    • [01:30.070] foocorp.tasks.add_y_to_xs[#3]: Adding Y for X id #9725
    • [01:30.074] Task foocorp.tasks.add_y_to_xs[#1] succeeded in 0.0594762689434s: None
    • [01:30.087] Task foocorp.tasks.add_y_to_xs[#3] succeeded in 0.0352867960464s: None

    We're currently using Celery 3.1.4 with RabbitMQ as the transport.

    EDIT Dan, here's what I came up with:

    Dan, here's what I ended up using:

    from sqlalchemy import func
    from sqlalchemy.exc import DBAPIError
    from contextlib import contextmanager
    
    
    def _psql_advisory_lock_blocking(conn, lock_id, shared, timeout):
        lock_fn = (func.pg_advisory_xact_lock_shared
                   if shared else
                   func.pg_advisory_xact_lock)
        if timeout:
            conn.execute(text('SET statement_timeout TO :timeout'),
                         timeout=timeout)
        try:
            conn.execute(select([lock_fn(lock_id)]))
        except DBAPIError:
            return False
        return True
    
    
    def _psql_advisory_lock_nonblocking(conn, lock_id, shared):
        lock_fn = (func.pg_try_advisory_xact_lock_shared
                   if shared else
                   func.pg_try_advisory_xact_lock)
        return conn.execute(select([lock_fn(lock_id)])).scalar()
    
    
    class DatabaseLockFailed(Exception):
        pass
    
    
    @contextmanager
    def db_lock(engine, name, shared=False, block=True, timeout=None):
        """
        Context manager which acquires a PSQL advisory transaction lock with a
        specified name.
        """
        lock_id = hash(name)
    
        with engine.begin() as conn, conn.begin():
            if block:
                locked = _psql_advisory_lock_blocking(conn, lock_id, shared,
                                                      timeout)
            else:
                locked = _psql_advisory_lock_nonblocking(conn, lock_id, shared)
            if not locked:
                raise DatabaseLockFailed()
            yield
    

    And the celery task decorator (used only for periodic tasks):

    from functools import wraps
    from preo.extensions import db
    
    
    def locked(name=None, block=True, timeout='1s'):
        """
        Using a PostgreSQL advisory transaction lock, only runs this task if the
        lock is available. Otherwise logs a message and returns `None`.
        """
        def with_task(fn):
            lock_id = name or 'celery:{}.{}'.format(fn.__module__, fn.__name__)
    
            @wraps(fn)
            def f(*args, **kwargs):
                try:
                    with db_lock(db.engine, name=lock_id, block=block,
                                 timeout=timeout):
                        return fn(*args, **kwargs)
                except DatabaseLockFailed:
                    logger.error('Failed to get lock.')
                    return None
            return f
        return with_task
    
  • erydo
    erydo over 10 years
    Thanks. I ended up creating a task decorator that guards the task with a PostgreSQL advisory lock. I went this route so that workers on separate machines can maintain a single synchronized lock point. (We're not currently using memcached as in the example).
  • Dan Lenski
    Dan Lenski almost 10 years
    @erydo, would you be willing to share that decorator? I'm trying to do nearly the same thing and running into some odd synchronization issues.
  • erydo
    erydo almost 10 years
    Dan, yep, that's essentially equivalent to what I came up with—I've added my solution as an edit. It includes support for non-blocking lock attempts and configurable timeouts, since I ended up using the locking code in another place as well.
  • ChrisF
    ChrisF over 8 years
    Please a) include the relevant parts of the answer here and b) declare your interest in the site. This was being flagged as spam so I deleted it for you.
  • BoltClock
    BoltClock over 8 years
    I've undeleted your answer and invalidated the spam flag. Thanks for following up.
  • erydo
    erydo over 8 years
    Distributed locking is a great suggestion, but not strictly required. Centralized locking is fine as long as everything is referring to the same center. In our case, these celery workers all interact with the same master Postgres database anyway.etcd, ZooKeeper, or consul are great suggestions here though.
  • daino3
    daino3 over 5 years
    what in the world is "clr_app" and how do I import it? Wow. I'm a dummy. It's just @app.task...
  • Devaroop
    Devaroop over 5 years
    For Django users with shared_task decorator: Use the above code with @shared_task(bind=true) and the function with self as first parameter def your_task(self, *args, **kwargs)
  • dtc
    dtc over 3 years
    i think you'll still have backlogged tasks if you say, had only one worker for this one task. Would need that extra worker to prevent the backlogging