Retry Celery tasks with exponential back off


Solution 1

The task's request.retries attribute contains the number of retries so far, so you can use it to implement exponential back-off:

from celery.task import task

# bind=True makes Celery pass the task instance as the first argument (self),
# which is what exposes self.request.retries and self.retry().
@task(bind=True, max_retries=3)
def update_status(self, auth, status):
    try:
        Twitter(auth).update_status(status)
    except Twitter.WhaleFail as exc:
        # Wait 2**0 = 1s before the first retry, then 2s, then 4s.
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
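
A follow-up that comes up in the comments below: with bind=True the call site does not change, because Celery injects self itself. A minimal sketch of calling the task above (the auth and status values are placeholders):

# `self` is supplied by Celery; callers pass only their own arguments.
update_status.delay(auth, status)
# Equivalent, with explicit scheduling options:
update_status.apply_async(args=(auth, status), countdown=10)  # optionally delay the first attempt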

To prevent a thundering herd problem, you may also consider adding random jitter to your exponential backoff:

import random

# Randomising the base between 2 and 4 spreads concurrent retries apart in time.
self.retry(exc=exc, countdown=int(random.uniform(2, 4) ** self.request.retries))

Solution 2

As of Celery 4.2 you can configure your tasks to use an exponential backoff automatically: http://docs.celeryproject.org/en/master/userguide/tasks.html#automatic-retry-for-known-exceptions

# retry_backoff=2: delays grow roughly as 2s, 4s, 8s, ... (jittered by default).
@app.task(autoretry_for=(Exception,), retry_backoff=2)
def add(x, y):
    ...

(This was already in the docs for Celery 4.1, but the feature wasn't actually released until 4.2; see the merge request.)
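
The automatic-retry decorator accepts a few related options that are often combined. A minimal sketch; the cap and jitter values shown are, to my knowledge, the library defaults as of Celery 4.2, so treat them as illustrative rather than required:

@app.task(
    autoretry_for=(Exception,),
    retry_backoff=2,          # base delay in seconds: 2s, 4s, 8s, ...
    retry_backoff_max=600,    # never wait longer than 10 minutes between retries
    retry_jitter=True,        # randomise each delay to avoid a thundering herd
    retry_kwargs={'max_retries': 5},
)
def add(x, y):
    ...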

Solution 3

FYI, Celery has a utility function to calculate exponential backoff time with jitter, so you don't need to write your own. (The original answer linked to it directly, but the link has since gone stale.)
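
In recent Celery versions the helper lives in celery.utils.time, though its exact location and signature may differ between releases, so treat this as a sketch. It assumes it runs inside the except block of a bound task like the one in Solution 1 (so self and exc are in scope):

from celery.utils.time import get_exponential_backoff_interval

# factor=2, capped at 10 minutes, with full jitter applied
countdown = get_exponential_backoff_interval(
    factor=2,
    retries=self.request.retries,
    maximum=600,
    full_jitter=True,
)
raise self.retry(exc=exc, countdown=countdown)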


Comments

  • Oleksandr, almost 2 years ago:

    For a task like this:

    from celery.decorators import task
    
    @task()
    def add(x, y):
        if not x or not y:
            raise Exception("test error")
        return self.wait_until_server_responds(
    

    if it throws an exception and I want to retry it from the daemon side, how can I apply an exponential back-off algorithm, i.e. retry after 2^2, 2^3, 2^4 etc. seconds?

    Also, is the retry maintained on the server side, such that if the worker happens to get killed, the next worker that spawns will take up the retry task?

  • Oleksandr, about 12 years ago:
    Do you know if this is a server-side retry, or is the client kept waiting? If the client is kept waiting, that's bad.
  • idanzalz, over 11 years ago:
    As far as I know, the countdown attribute sets an ETA for the task at the MQ backend (e.g. RabbitMQ), so the wait is not handled on the client side.
  • asksol, over 11 years ago:
    The client is not kept waiting unless you call result.get(), which is an explicit request to wait for the result to be ready; that call also takes a timeout argument. There is also a RETRY state, so you can check whether the task is being retried (and what the reason for the retry was); there is a short sketch of this after the comment thread below.
  • robbyt, over 10 years ago:
    For Celery 3.1, you should use @task(bind=True); Celery will then pass self into the function as the first argument, so you would change the signature to def update_status(self, auth, status):, which gives you access to self.retries.
  • tutuDajuju, almost 9 years ago:
    Thanks @robbyt! Just a small correction: retries is an attribute of request, so self.request.retries is the proper call.
  • Matt, over 6 years ago:
    Can you add an example of how to call the task with @task(bind=True)? Normally I would just do from tasks.py import update_status; update_status(auth, status), but what should I pass in for self?
  • kororo, about 6 years ago:
    Nice catch; I was scratching my head in 4.1.0 wondering why my retry_backoff parameter was not respected.
  • rdrey, almost 6 years ago:
    @kororo It doesn't seem to work with self.retry, only with other exception types.
  • qwertysmack, over 3 years ago:
    With this approach you also benefit from the built-in retry_jitter (which defaults to True), which avoids the thundering herd problem mentioned in asksol's answer.
  • jorf.brunning, almost 3 years ago:
    This is the correct answer, given that it is built in and does not require manually handling the countdown.
  • jorf.brunning, almost 3 years ago:
    Please see the other answer to this question for a built-in solution instead: stackoverflow.com/a/46467851/9190640
  • Sarang, over 2 years ago:
    Does this also work when retry() is called? It doesn't seem to work for non-automatic retries (on Celery 4.2.2 at least). Does anyone have any idea?
  • dKen, about 2 years ago:
    In the future, avoid link-only answers, as links tend to go stale over time. It's best to also include a code snippet and an explanation in your answer for maximum upvotes and value-add. Edit: case in point, this answer's link is already broken: stackoverflow.com/a/46467851/366529
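
Following up on asksol's comment above, here is a minimal sketch of how a caller can inspect the task state instead of blocking on the result. It assumes the update_status task and arguments from Solution 1, and a configured result backend (state checks require one):

result = update_status.delay(auth, status)

# Non-blocking: inspect the current state of the task.
if result.state == 'RETRY':
    # result.info holds the exception that caused the retry
    print('task is being retried because of:', result.info)

# Blocking, but bounded: wait at most 10 seconds for the result.
value = result.get(timeout=10)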