Handling long running tasks in pika / RabbitMQ


Solution 1

For now, your best bet is to turn off heartbeats. This keeps RabbitMQ from closing the connection if you block for too long. I am experimenting with running pika's core connection management and IO loop in a background thread, but it's not stable enough to release.

In pika v1.1.0 this is ConnectionParameters(heartbeat=0)

Solution 2

Please don't disable heartbeats!

As of Pika 0.12.0, please use the technique described in this example code to run your long-running task on a separate thread and then acknowledge the message from that thread.
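
Since the linked example is not reproduced here, the following is a minimal sketch of that technique rather than the official example itself. It assumes pika 1.x (where basic_consume takes queue= and on_message_callback=), a broker on localhost, and the queue name task_queue from the question. The helper names (ack_message, do_work, on_message, run_consumer) and the long_running_task stand-in are hypothetical. The key point is that the worker thread never touches the channel directly; it asks the connection to run the ack via add_callback_threadsafe.

```python
import functools
import threading


def ack_message(channel, delivery_tag):
    """Runs on the connection's own thread; skips the ack if the channel closed."""
    if channel.is_open:
        channel.basic_ack(delivery_tag)


def do_work(connection, channel, delivery_tag, body, task):
    """Worker thread body: run the slow task, then hand the ack back to pika."""
    task(body)
    # pika connections are not thread safe. add_callback_threadsafe is the
    # call meant to be used from another thread to schedule work on the
    # connection's thread.
    connection.add_callback_threadsafe(
        functools.partial(ack_message, channel, delivery_tag))


def on_message(channel, method, properties, body, connection, task, threads):
    """Consumer callback: start a worker and return at once, so the
    connection stays free to service heartbeats."""
    worker = threading.Thread(
        target=do_work,
        args=(connection, channel, method.delivery_tag, body, task))
    worker.start()
    threads.append(worker)


def run_consumer():
    import pika  # imported here so the helpers above load without pika

    def long_running_task(body):  # hypothetical stand-in for the real work
        pass

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host='localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='task_queue', durable=True)
    # At most one unacknowledged message, so at most one active worker.
    channel.basic_qos(prefetch_count=1)

    threads = []
    channel.basic_consume(
        queue='task_queue',
        on_message_callback=functools.partial(
            on_message, connection=connection,
            task=long_running_task, threads=threads))
    try:
        channel.start_consuming()
    except KeyboardInterrupt:
        channel.stop_consuming()
    for worker in threads:
        worker.join()
    connection.close()
```

Calling run_consumer() starts the consumer. Because the callback returns immediately, start_consuming() keeps servicing heartbeats while the task runs, so heartbeats can stay enabled.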


NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

Solution 3

I encountered the same problem you had.
My solution is:

  1. Turn off the heartbeat on the server side.
  2. Estimate the maximum time the task can possibly take.
  3. Set the client heartbeat timeout to the time obtained in step 2.
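
Steps 2 and 3 can be sketched as follows. This is only an illustration: the helper names and the 60-second safety margin are assumptions, not part of the original answer, and the heartbeat keyword is the pika 1.x spelling (older releases, including 0.9.13, used heartbeat_interval). How the client value is negotiated against the broker's setting also depends on the pika version.

```python
def heartbeat_timeout_for(max_task_seconds, safety_margin_seconds=60):
    """Return a heartbeat timeout (seconds) that outlasts the longest task.

    The margin is an assumed cushion so that a task running slightly over
    the estimate does not trip the timeout.
    """
    return int(max_task_seconds) + int(safety_margin_seconds)


def connection_kwargs(host, max_task_seconds):
    """Keyword arguments intended for pika.ConnectionParameters(**kwargs)."""
    return {
        'host': host,
        'heartbeat': heartbeat_timeout_for(max_task_seconds),
    }
```

For a task that can take up to 20 minutes, pika.ConnectionParameters(**connection_kwargs('localhost', 20 * 60)) would request a 1260-second heartbeat timeout.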

Why this?

I tested the following cases:

case one
  1. server heartbeat turned on, 1800 s
  2. client heartbeat unset

I still get an error when the task runs for a very long time (more than 1800 s).

case two
  1. turn off server heartbeat
  2. turn off client heartbeat

There is no error on the client side, with one problem: when the client crashes (my OS restarts on some faults), the TCP connection still shows up in the RabbitMQ Management plugin, which is confusing.

case three
  1. turn off server heartbeat
  2. turn on client heartbeat, set to the foreseen maximum run time

In this case, I can dynamically change the heartbeat on each individual client. In fact, I set a heartbeat on the machines that crash frequently. Moreover, I can see offline machines through the RabbitMQ Management plugin.

Environment

OS: centos x86_64
pika: 0.9.13
rabbitmq: 3.3.1

Solution 4

  1. Periodically call connection.process_data_events() inside your long_running_task(connection). Each call lets pika send heartbeats to the server, which keeps the server from closing the client's connection.
  2. In your pika BlockingConnection, set the heartbeat value greater than the interval between calls to connection.process_data_events().
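
A minimal sketch of step 1, assuming the task can be split into chunks. The items, handle, and poll_every parameters are hypothetical; the only pika call used is BlockingConnection.process_data_events(), which the function expects on the connection object passed in.

```python
def long_running_task(connection, items, handle, poll_every=1):
    """Process items one at a time, servicing the connection in between.

    connection is expected to be a pika BlockingConnection. items and
    handle stand in for the real workload, and poll_every controls how
    many items are handled between calls.
    """
    for index, item in enumerate(items, start=1):
        handle(item)
        if index % poll_every == 0:
            # Gives pika a chance to process pending I/O, heartbeats
            # included, before the next chunk of work starts.
            connection.process_data_events()
```

For step 2, if the slowest stretch between two calls takes about 30 seconds, a heartbeat comfortably above that, for example ConnectionParameters(heartbeat=120) in pika 1.x, leaves room for variation.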

Solution 5

Don't disable heartbeat.
The best solution is to run the task in a separate thread and set prefetch_count to 1, so that the consumer holds only one unacknowledged message at a time, using something like channel.basic_qos(prefetch_count=1).

Author by

jmacdonagh

Updated on June 25, 2021

Comments

  • jmacdonagh
    jmacdonagh almost 3 years

    We're trying to set up a basic directed queue system where a producer will generate several tasks and one or more consumers will grab a task at a time, process it, and acknowledge the message.

    The problem is, the processing can take 10-20 minutes, and we're not responding to messages at that time, causing the server to disconnect us.

    Here's some pseudo code for our consumer:

    #!/usr/bin/env python
    import pika
    import time
    
    connection = pika.BlockingConnection(pika.ConnectionParameters(
            host='localhost'))
    channel = connection.channel()
    
    channel.queue_declare(queue='task_queue', durable=True)
    print(' [*] Waiting for messages. To exit press CTRL+C')
    
    def callback(ch, method, properties, body):
        long_running_task(connection)
        ch.basic_ack(delivery_tag = method.delivery_tag)
    
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(callback,
                          queue='task_queue')
    
    channel.start_consuming()
    

    After the first task completes, an exception is thrown somewhere deep inside of BlockingConnection, complaining that the socket was reset. In addition, the RabbitMQ logs show that the consumer was disconnected for not responding in time (why it resets the connection rather than sending a FIN is strange, but we won't worry about that).

    We searched around a lot because we believed this was the normal use case for RabbitMQ (having a lot of long running tasks that should be split up among many consumers), but it seems like nobody else really had this issue. Finally we stumbled upon a thread where it was recommended to use heartbeats and to spawn the long_running_task() in a separate thread.

    So the code has become:

    #!/usr/bin/env python
    import pika
    import time
    import threading
    
    connection = pika.BlockingConnection(pika.ConnectionParameters(
            host='localhost',
            heartbeat_interval=20))
    channel = connection.channel()
    
    channel.queue_declare(queue='task_queue', durable=True)
    print(' [*] Waiting for messages. To exit press CTRL+C')
    
    def thread_func(ch, method, body):
        long_running_task(connection)
        ch.basic_ack(delivery_tag = method.delivery_tag)
    
    def callback(ch, method, properties, body):
        threading.Thread(target=thread_func, args=(ch, method, body)).start()
    
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(callback,
                          queue='task_queue')
    
    channel.start_consuming()
    

    And this seems to work, but it's very messy. Are we sure that the ch object is thread safe? In addition, imagine that long_running_task() is using that connection parameter to add a task to a new queue (i.e. the first part of this long process is done, let's send the task on to the second part). So, the thread is using the connection object. Is that thread safe?

    More to the point, what's the preferred way of doing this? I feel like this is very messy and possibly not thread safe, so maybe we're not doing it right. Thanks!

  • Justin Thomas
    Justin Thomas over 8 years
    how do you turn on client heartbeat? can't find anything about how to do it.
  • daniel_of_service
    daniel_of_service over 8 years
    You could try something like this: params = pika.ConnectionParameters(host=self.__host, port=self.__port, credentials=credentials, heartbeat_interval=<your-interval-in-seconds>)
  • Darshan Patel
    Darshan Patel about 8 years
    As @Gavin mentioned, the best bet as of now is to turn off the heartbeat in pika while setting up the connection. connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost', virtual_host='TestVirtualHost', credentials=credentials, heartbeat_interval=0, port=5672))
  • CharlesC
    CharlesC about 6 years
    I should have tried your approach first, saved me tons of headache and hairs. thank you for your helpful insight.
  • Druta Ruslan
    Druta Ruslan about 6 years
    connection.process_data_events() helped me
  • Luke Bakken
    Luke Bakken over 5 years
    Pika 0.12.0 has a better solution, please see this answer
  • PLPeeters
    PLPeeters about 5 years
    Why is it a bad thing to disable heartbeats?
  • Luke Bakken
    Luke Bakken about 5 years
    Neither RabbitMQ nor your application will detect a lost TCP connection until the next operation is attempted on that connection.
  • PLPeeters
    PLPeeters about 5 years
    I get that, but depending on the use case, that's not necessarily a bad thing. In my case I'd rather have an error when trying to ack a message because the connection was lost, than having an error when trying to ack a message because the connection was closed due to my processing taking too long. That's why a general warning not to disable heartbeats seems unjustified IMO. It all depends on the use case, so I believe it would be more productive to say why you maybe should consider not disabling them instead of going "disabling = bad" without any further information.
  • DUDANF
    DUDANF almost 5 years
    @LukeBakken Is there any way to use this method with channel.basic_get? I need my consumer to consume one message, acknowledge it and then die/quit, I can get it to consume only one message with basic_get, but then I cannot get it to acknowledge the (long-running) message.
  • Prateek Kumar Dalbehera
    Prateek Kumar Dalbehera over 4 years
    Please refer to the solution provided by Luke Bakken. It's Thread Safe & refers to an official example from pika documentation.
  • Prateek Kumar Dalbehera
    Prateek Kumar Dalbehera over 4 years
    This is the best & correct solution. Thanks @LukeBakken.
  • varsh
    varsh almost 4 years
    If there are a large number of items queued up and the average time to process each one is very high, then spawning an individual thread for each item would lead to an explosion of active threads, which finally ends in an OOM error. I have experienced this.
  • xanjay
    xanjay over 3 years
    Thanks, it works. If it's still not working for you, note that the heartbeat parameter should be set on both peers (consumer and producer). That's what happened in my case.
  • Zachary Vance
    Zachary Vance almost 3 years
    Stackoverflow answers should be self-contained. In addition to linking to example code, include the example code inline.
  • Lane
    Lane over 2 years
    Setting ConnectionParameters(heartbeat=0) is safe, because when you kill the process, the connection is automatically closed immediately. You can go to rabbit_mq_server:15672/#/connections to verify it.
  • bbg
    bbg about 2 years
    My long-running tasks were largely waiting/sleeping, so this helped a ton. I'm rather surprised how buried and non-intuitive "please tell the server I'm still alive" functionality is in pika, but glad to have finally found it.