Retry Lost or Failed Tasks (Celery, Django and RabbitMQ)

10,457

What you need is to set

CELERY_ACKS_LATE = True

Late ack means that the task messages will be acknowledged after the task has been executed, not just before, which is the default behavior. In this way if the worker crashes rabbit MQ will still have the message.

Obviously of a total crash (Rabbit + workers) at the same time there is no way of recovering the task, except if you implement a logging on task start and task end. Personally I write in a mongodb a line every time a task start and another one when the task finish (independently form the result), in this way I can know which task was interrupted by analyzing the mongo logs.

You can do it easily by overriding the methods __call__ and after_return of the celery base task class.

Following you see a piece of my code that uses a taskLogger class as context manager (with entry and exit point). The taskLogger class simply writes a line containing the task info in a mongodb instance.

def __call__(self, *args, **kwargs):
    """In celery task this function call the run method, here you can
    set some environment variable before the run of the task"""

    #Inizialize context managers    

    self.taskLogger = TaskLogger(args, kwargs)
    self.taskLogger.__enter__()

    return self.run(*args, **kwargs)

def after_return(self, status, retval, task_id, args, kwargs, einfo):
    #exit point for context managers
    self.taskLogger.__exit__(status, retval, task_id, args, kwargs, einfo)

I hope this could help

Share:
10,457
coding1223322
Author by

coding1223322

Updated on June 26, 2022

Comments

  • coding1223322
    coding1223322 about 2 years

    Is there a way to determine if any task is lost and retry it?


    I think that the reason for lost can be dispatcher bug or worker thread crash.

    I was planning to retry them but I'm not sure how to determine which tasks need to be retired?

    And how to make this process automatically? Can I use my own custom scheduler which will create new tasks?

    Edit: I found from the documentation that RabbitMQ never loose tasks, but what happens when worker thread crash in the middle of task execution?

  • y2k-shubham
    y2k-shubham almost 6 years
    2 doubts regarding CELERY_ACKS_LATE=True [1] how does (if at all) Celery ensure that same task is not picked up by multiple workers? [2] if Celery tasks should ideally be idempotent, then what's the problem with them running multiple times? (for the 2nd ques, actually here they say that its okay, but I'm looking for an explicit affirmative)
  • Mauro Rocco
    Mauro Rocco almost 6 years
    Even with ack_late the broker is aware message has been picked so it will never be picked from another worker.
  • dipesh
    dipesh over 4 years
    Where should I write the above mentioned code blocks in the project. Any idea?