Locks and batch fetch messages with RabbitMQ


Solution 1

Have you had a look at this free online book on Enterprise Integration Patterns?

It sounds like you really need a workflow where you have a batcher component before the messages get to your workers. With RabbitMQ there are two ways to do that. Either use an exchange type (and message format) that can do the batching for you, or have one queue, and a worker that sorts out batches and places each batch on its own queue. The batcher should probably also send a "batch ready" message to a control queue so that a worker can discover the existence of the new batch queue. Once the batch is processed the worker could delete the batch queue.
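The batcher-plus-control-queue flow can be sketched in plain Python. This is only a toy model: deques stand in for real RabbitMQ queues, and the `Batcher` class and the `("batch-ready", batch_id)` announcement format are illustrative inventions, not RabbitMQ APIs.

```python
from collections import defaultdict, deque

class Batcher:
    """Toy batcher: sorts incoming messages into per-batch queues and
    announces each newly seen batch on a control queue so that workers
    can discover it. Deques stand in for real RabbitMQ queues."""

    def __init__(self):
        self.batch_queues = defaultdict(deque)  # one queue per batch id
        self.control_queue = deque()            # "batch ready" announcements

    def handle(self, batch_id, message):
        if batch_id not in self.batch_queues:
            # First message of a new batch: tell workers the queue exists.
            self.control_queue.append(("batch-ready", batch_id))
        self.batch_queues[batch_id].append(message)
```

In the real system, each `batch_queues` entry would be a declared queue, `control_queue` would be a shared queue workers subscribe to, and a worker would delete the batch queue after draining it.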

If you have control over the message format, you might be able to get RabbitMQ to do the batching implicitly in a couple of ways. With a topic exchange, you could make sure that the routing key on each message is of the format work.batchid.something and then a worker that learns of the existence of batch xxyzz would use a binding key like #.xxyzz.# to only consume those messages. No republishing needed.
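To make the binding-key semantics concrete, here is a small approximation of AMQP topic matching in plain Python: `*` matches exactly one dot-delimited word and `#` matches zero or more words. The `xxyzz` batch id is the example from above, and the regex translation is a simplified sketch, not the broker's actual algorithm.

```python
import re

def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Approximate AMQP topic matching: '*' = one word, '#' = zero or more words."""
    pattern = re.escape(binding_key)
    pattern = pattern.replace(r'\*', r'[^.]+')           # '*': exactly one word
    pattern = pattern.replace(r'\#\.', r'(?:[^.]+\.)*')  # leading/inner '#'
    pattern = pattern.replace(r'\.\#', r'(?:\.[^.]+)*')  # trailing '#'
    pattern = pattern.replace(r'\#', r'.*')              # bare '#'
    return re.fullmatch(pattern, routing_key) is not None
```

Against a real broker you would not implement this yourself: bind the batch queue to a topic exchange with `channel.queue_bind(queue, exchange, routing_key="#.xxyzz.#")` (pika) and the exchange does the matching for you.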

The other way is to include a batch id in a header and use the newer headers exchange type. Of course you can also implement your own custom exchange types if you are willing to write a small amount of Erlang code.
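The headers-exchange idea can be sketched the same way. A binding carries arguments including a special `x-match` of `all` (the default) or `any`; the `batch-id` header name below is an assumed example, not a RabbitMQ convention.

```python
def headers_match(binding_args: dict, message_headers: dict) -> bool:
    """Approximate headers-exchange matching. Binding arguments other than
    the special 'x-*' keys must be present with equal values in the message
    headers: all of them for x-match=all (the default), at least one for
    x-match=any."""
    x_match = binding_args.get("x-match", "all")
    required = [(k, v) for k, v in binding_args.items() if not k.startswith("x-")]
    hits = [message_headers.get(k) == v for k, v in required]
    return any(hits) if x_match == "any" else all(hits)
```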

I do recommend checking the book though, because it gives a better overview of messaging architecture than the typical worker queue concept that most people start with.

Solution 2

Have your consumers pull from just one queue. They will be guaranteed not to share messages (Rabbit will round-robin the messages among the currently-connected consumers) and it's heavily optimized for that exact usage pattern.

It's ready-to-use, out of the box. In the RabbitMQ docs it's called the Work Queue model. One queue, multiple consumers, with none of them sharing anything. It sounds like what you need.
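The round-robin behaviour can be illustrated with a toy model, with plain Python standing in for the broker. This deliberately ignores acknowledgements and prefetch, which skew real-world distribution.

```python
from itertools import cycle

def round_robin_dispatch(messages, consumers):
    """Deal messages from one queue to the connected consumers in turn,
    the way RabbitMQ distributes a work queue by default."""
    assignments = {c: [] for c in consumers}
    for message, consumer in zip(messages, cycle(consumers)):
        assignments[consumer].append(message)
    return assignments
```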

Author: Cristian Vrabie

Updated on July 19, 2022

Comments

  • Cristian Vrabie
    Cristian Vrabie almost 2 years

    I'm trying to use RabbitMQ in a somewhat unconventional way (though at this point I can pick any other message queue implementation if needed). Instead of letting RabbitMQ push messages to my consumers, a consumer connects to a queue and fetches a batch of N messages (during which it consumes some and possibly rejects some), after which it jumps to another queue, and so on. This is done for redundancy: if some consumers crash, all messages are guaranteed to be consumed by some other consumer.

    The problem is that I have multiple consumers and I don't want them to compete over the same queue. Is there a way to guarantee a lock on a queue? If not, can I at least make sure that if two consumers are connected to the same queue they don't read the same message? Transactions might help to some degree, but I've heard they may be removed from RabbitMQ.

    Other architectural suggestions are welcomed too.

    Thanks!

    EDIT: As pointed out in the comments, there's a particularity in how I need to process the messages. They only make sense taken in groups, and there's a high probability that related messages are clumped together in a queue. If, for example, I pull a batch of 100 messages, there's a high probability that I'll be able to do something with messages 1-3, 4-5, 6-10, etc. If I fail to find a group for some messages, I'll resubmit them to the queue. A work queue wouldn't work because it would spread messages from the same group across multiple workers that wouldn't know what to do with them.

  • Cristian Vrabie
    Cristian Vrabie over 12 years
    Unfortunately no, because of a particularity in how I need to process the messages. They only make sense taken in groups, and there's a high probability that related messages are clumped together in a queue. If, for example, I pull a batch of 100 messages, there's a high probability that I'll be able to do something with messages 1-3, 4-5, 6-10, etc. If I fail to find a group for some messages, I'll resubmit them to the queue. A work queue wouldn't work because it would spread messages from the same group across multiple workers that wouldn't know what to do with them.
  • Brian Kelly
    Brian Kelly over 12 years
    That's a pretty important requirement, and you should add it to the question too so that others see it above. Why don't you clump your messages together at the producer side, since it sounds like those groups are logically related and should be taken as an atomic unit rather than split up into lots of disparate messages? That would allow you to move easily between broker technologies, too, as I'm unaware of any broker that will give you what you need out of the box.
  • Cristian Vrabie
    Cristian Vrabie over 12 years
    You're right. I've added the requirement above. Regarding your question, I would love to be able to clump them at the producer side, but each message in a group might come in via different producers that live on different machines.
  • Cristian Vrabie
    Cristian Vrabie over 12 years
    In the end I do realize that this is better suited to a relational database, which would allow me to do the grouping and locking. The reason I was exploring queues is that the messages live for a very short time (usually less than 1s between a message arriving in the queue and a worker consuming it) and there's high message traffic. In an RDBMS this would work fine at first, until fragmentation took its toll and killed performance.
  • Cristian Vrabie
    Cristian Vrabie over 12 years
    This sounds just like what I want. Thanks!
  • java_geek
    java_geek over 9 years
    You should consider setting the prefetch count on the channel to pull messages in batches: rabbitmq.com/consumer-prefetch.html