Retry Celery tasks with exponential back off


Solution 1

The task's request.retries attribute contains the number of retries so far, so you can use it to implement exponential back-off:

from celery.task import task

# bind=True makes Celery pass the task instance as the first argument (self),
# which is what exposes self.request.retries and self.retry().
@task(bind=True, max_retries=3)
def update_status(self, auth, status):
    try:
        Twitter(auth).update_status(status)
    except Twitter.WhaleFail as exc:
        # Wait 2**0 = 1s before the first retry, then 2s, then 4s.
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
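
A follow-up that comes up in the comments below: with bind=True the call site does not change, because Celery injects self itself. A minimal sketch of calling the task above (the auth and status values are placeholders):

# `self` is supplied by Celery; callers pass only their own arguments.
update_status.delay(auth, status)
# Equivalent, with explicit scheduling options:
update_status.apply_async(args=(auth, status), countdown=10)  # optionally delay the first attempt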

To prevent a thundering herd problem, you may also consider adding random jitter to your exponential backoff:

import random

# Randomising the base between 2 and 4 spreads concurrent retries apart in time.
self.retry(exc=exc, countdown=int(random.uniform(2, 4) ** self.request.retries))

Solution 2

As of Celery 4.2 you can configure your tasks to use an exponential backoff automatically: http://docs.celeryproject.org/en/master/userguide/tasks.html#automatic-retry-for-known-exceptions

# retry_backoff=2: delays grow roughly as 2s, 4s, 8s, ... (jittered by default).
@app.task(autoretry_for=(Exception,), retry_backoff=2)
def add(x, y):
    ...

(This was already in the docs for Celery 4.1, but the feature wasn't actually released until 4.2; see the merge request.)
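
The automatic-retry decorator accepts a few related options that are often combined. A minimal sketch; the cap and jitter values shown are, to my knowledge, the library defaults as of Celery 4.2, so treat them as illustrative rather than required:

@app.task(
    autoretry_for=(Exception,),
    retry_backoff=2,          # base delay in seconds: 2s, 4s, 8s, ...
    retry_backoff_max=600,    # never wait longer than 10 minutes between retries
    retry_jitter=True,        # randomise each delay to avoid a thundering herd
    retry_kwargs={'max_retries': 5},
)
def add(x, y):
    ...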

Solution 3

FYI, Celery has a utility function to calculate exponential backoff time with jitter, so you don't need to write your own. (The original answer linked to it directly, but the link has since gone stale.)
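
In recent Celery versions the helper lives in celery.utils.time, though its exact location and signature may differ between releases, so treat this as a sketch. It assumes it runs inside the except block of a bound task like the one in Solution 1 (so self and exc are in scope):

from celery.utils.time import get_exponential_backoff_interval

# factor=2, capped at 10 minutes, with full jitter applied
countdown = get_exponential_backoff_interval(
    factor=2,
    retries=self.request.retries,
    maximum=600,
    full_jitter=True,
)
raise self.retry(exc=exc, countdown=countdown)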


Comments

  • Oleksandr, almost 2 years ago:

    For a task like this:

    from celery.decorators import task
    
    @task()
    def add(x, y):
        if not x or not y:
            raise Exception("test error")
        return self.wait_until_server_responds(
    

    if it throws an exception and I want to retry it from the daemon side, how can I apply an exponential back-off algorithm, i.e. retry after 2^2, 2^3, 2^4 etc. seconds?

    Also, is the retry maintained on the server side, such that if the worker happens to get killed, the next worker that spawns will take up the retry task?

  • Oleksandr, about 12 years ago:
    Do you know if this is a server-side retry, or is the client kept waiting? If the client is kept waiting, that's bad.
  • idanzalz, over 11 years ago:
    As far as I know, the countdown attribute sets an ETA for the task at the MQ backend (e.g. RabbitMQ), so the wait is not handled on the client side.
  • asksol, over 11 years ago:
    The client is not kept waiting unless you call result.get(), which is an explicit request to wait for the result to be ready; that call also takes a timeout argument. There is also a RETRY state, so you can check whether the task is being retried (and what the reason for the retry was); there is a short sketch of this after the comment thread below.
  • robbyt, over 10 years ago:
    For Celery 3.1, you should use @task(bind=True); Celery will then pass self into the function as the first argument, so you would change the signature to def update_status(self, auth, status):, which gives you access to self.retries.
  • tutuDajuju, almost 9 years ago:
    Thanks @robbyt! Just a small correction: retries is an attribute of request, so self.request.retries is the proper call.
  • Matt, over 6 years ago:
    Can you add an example of how to call the task with @task(bind=True)? Normally I would just do from tasks.py import update_status; update_status(auth, status), but what should I pass in for self?
  • kororo, about 6 years ago:
    Nice catch; I was scratching my head in 4.1.0 wondering why my retry_backoff parameter was not respected.
  • rdrey, almost 6 years ago:
    @kororo It doesn't seem to work with self.retry, only with other exception types.
  • qwertysmack, over 3 years ago:
    With this approach you also benefit from the built-in retry_jitter (which defaults to True), which avoids the thundering herd problem mentioned in asksol's answer.
  • jorf.brunning, almost 3 years ago:
    This is the correct answer, given that it is built in and does not require manually handling the countdown.
  • jorf.brunning, almost 3 years ago:
    Please see the other answer to this question for a built-in solution instead: stackoverflow.com/a/46467851/9190640
  • Sarang, over 2 years ago:
    Does this also work when retry() is called? It doesn't seem to work for non-automatic retries (on Celery 4.2.2 at least). Does anyone have any idea?
  • dKen, about 2 years ago:
    In the future, avoid link-only answers, as links tend to go stale over time. It's best to also include a code snippet and an explanation in your answer for maximum upvotes and value-add. Edit: case in point, this answer's link is already broken: stackoverflow.com/a/46467851/366529
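
Following up on asksol's comment above, here is a minimal sketch of how a caller can inspect the task state instead of blocking on the result. It assumes the update_status task and arguments from Solution 1, and a configured result backend (state checks require one):

result = update_status.delay(auth, status)

# Non-blocking: inspect the current state of the task.
if result.state == 'RETRY':
    # result.info holds the exception that caused the retry
    print('task is being retried because of:', result.info)

# Blocking, but bounded: wait at most 10 seconds for the result.
value = result.get(timeout=10)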