Is it possible to ensure unique messages are in a rabbitmq queue?


Solution 1

The core problem seems to be this:

"...it's possible that a piece of data is broken down into a part that's duplicated in the queue and the consumers continue to process it and end up in an infinite loop."

You can focus on uniqueness of your queued items all you want, but the issue above is where you should focus your efforts, IMO. One way to prevent infinite looping might be to have a "visited" bit in your message payload that is set by consumers before they re-queue the broken-down item.
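As a sketch of that idea (assuming JSON payloads and a hypothetical publish callback standing in for the actual broker call), a consumer can tag each re-queued part with a depth counter and drop anything that has been split too many times. The MAX_DEPTH value and the one-part-per-element split are illustrative assumptions, not part of the original question:

```python
import json

MAX_DEPTH = 10  # threshold after which a part is dropped instead of re-queued


def split_and_requeue(message: bytes, publish):
    """Break a payload into parts, tagging each with a hop count so a
    duplicated part cannot circulate forever. `publish` stands in for
    whatever re-queues the part (e.g. channel.basic_publish)."""
    payload = json.loads(message)
    depth = payload.get("depth", 0)
    if depth >= MAX_DEPTH:
        return []  # looping/poison part: discard (or route to a dead-letter queue)
    parts = [
        {"data": chunk, "depth": depth + 1}
        for chunk in payload["data"]  # hypothetical split: one part per element
    ]
    for part in parts:
        publish(json.dumps(part).encode())
    return parts
```

A consumer that re-queues through this helper can never loop more than MAX_DEPTH times on the same piece of data, regardless of duplicates in the queue.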

Another option would be to have the consumers re-queue back to a special queue that is treated slightly differently to prevent infinite looping. Either way, you should attack the issue by dealing with it as a core part of your application's strategy rather than using a feature of a messaging system to step around it.

Solution 2

I think even if you could fix the issue of not sending duplicates to the queue, you will sooner or later hit this issue:

From RabbitMQ Documentation: "Recovery from failure: in the event that a client is disconnected from the broker owing to failure of the node to which the client was connected, if the client was a publishing client, it's possible for the broker to have accepted and passed on messages from the client without the client having received confirmation for them; and likewise on the consuming side it's possible for the client to have issued acknowledgements for messages and have no idea whether or not those acknowledgements made it to the broker and were processed before the failure occurred. In short, you still need to make sure your consuming clients can identify and deal with duplicate messages."

Basically, it works like this: you send a message to RabbitMQ, and RabbitMQ replies with an ack, but for one reason or another your consumer or producer does not receive that ack. RabbitMQ has no way of knowing the ack was lost, and your producer, having never received one, will end up re-sending the message.

It is a pain to handle duplicate messages, especially in apps where messaging is used as a kind of RPC, but it looks like this is unavoidable with this kind of messaging architecture.
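A common mitigation is to make the consumer idempotent: derive an ID from each message (here a content hash; a producer-set message_id works too) and skip anything already seen. The in-memory set below is purely for illustration; a real deployment would back it with Redis or a DB and expire old keys:

```python
import hashlib


class IdempotentConsumer:
    """Wraps a handler so redelivered duplicates are acknowledged but not
    reprocessed. The seen-set lives in memory here for simplicity."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()

    def handle(self, body: bytes) -> bool:
        msg_id = hashlib.sha256(body).hexdigest()  # stable key for this payload
        if msg_id in self.seen:
            return False  # duplicate: ack it, but do no work
        self.seen.add(msg_id)
        self.handler(body)
        return True
```

With this in place, a redelivery caused by a lost ack costs one hash lookup instead of a second round of processing.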

Solution 3

There's a plugin for RabbitMQ that lets you do this kind of control with some additional headers.

Enable the plugin and set an x-deduplication-header on the message, containing a hash or anything else that uniquely identifies it. When another message with the same header value reaches the exchange, it will not be routed to any queue.

See: https://github.com/noxdafox/rabbitmq-message-deduplication
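As a rough sketch of how this fits together (the exchange type x-message-deduplication and the x-cache-size / x-cache-ttl arguments follow the plugin's README; verify them against the version you install), the broker-independent part is deriving a stable deduplication key from the message content:

```python
import hashlib
import json

# Arguments for declaring a deduplicating exchange with the noxdafox plugin.
DEDUP_EXCHANGE_ARGS = {
    "x-cache-size": 10000,  # how many recent keys the exchange remembers
    "x-cache-ttl": 60000,   # forget a key after 60 s (milliseconds)
}


def dedup_headers(payload: dict) -> dict:
    """Derive the x-deduplication-header value from the message content,
    so identical payloads hash to the same key and get dropped."""
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    return {"x-deduplication-header": digest}


# With pika, assuming the plugin is enabled on the broker, publishing
# would look roughly like:
#   channel.exchange_declare(exchange="jobs",
#                            exchange_type="x-message-deduplication",
#                            arguments=DEDUP_EXCHANGE_ARGS)
#   channel.basic_publish(exchange="jobs", routing_key="work",
#                         body=json.dumps(payload),
#                         properties=pika.BasicProperties(
#                             headers=dedup_headers(payload)))
```

Sorting the keys before hashing matters: two dicts with the same content but different key order must produce the same deduplication header.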

Author: Lostsoul
Never stopped being a student.

Updated on July 18, 2022

Comments

  • Lostsoul
    Lostsoul almost 2 years

    Basically my consumers are producers as well. We get an initial dataset and it gets sent to the queue. A consumer takes an item and processes it, and from that point there are three possibilities:

    1. Data is good and gets put in a 'good' queue for storage
    2. Data is bad and discarded
    3. Data is not good (yet) or bad (yet), so it is broken down into smaller parts and sent back to the queue for further processing.

    My problem is with step 3. Because the queue grows very quickly at first, it's possible that a piece of data is broken down into a part that's duplicated in the queue, and the consumers continue to process it and end up in an infinite loop.

    I think the way to prevent this is to keep duplicates from going into the queue. I can't do this on the client side because over the course of an hour I may have many cores dealing with billions of data points (to have each client scan them before submitting would slow me down too much). I think this needs to be done on the server side but, like I mentioned, the data is quite large and I don't know how to efficiently ensure no duplicates.

    I might be asking the impossible but thought I'd give it a shot. Any ideas would be greatly appreciated.

  • Lostsoul
    Lostsoul about 12 years
    I am trying to do exactly that (I think). By ensuring there are no duplicates of past items, I'm ensuring that the same data is not processed more than once. I'm just not sure of the implementation in RabbitMQ: is there a way to simply send message IDs and have RabbitMQ discard duplicates, or do I need to set up a filter or something (and if so, how does that work with RabbitMQ)?
  • Brian Kelly
    Brian Kelly about 12 years
    There's no way to do that, AFAIK. Rabbit doesn't care about the contents of your messages or what's already in your queues, so it would be up to your application to take care of this.
  • Lostsoul
    Lostsoul about 12 years
    So, if my message IDs are unique (a hashcode of my actual data), I would need to store them in a DB or something and query against that (to see if a message ID has been sent before) before sending to Rabbit? I've been thinking of that, but it would require the client to do a few queries while my message server waits (I was trying to see if I could push this work to the message server itself).
  • Lostsoul
    Lostsoul about 12 years
    Another question, related to mine above: with RabbitMQ, can I trigger a process when something is sent to the queue so I can filter it, or do I need to send it to another program to filter before it's sent to the queue at all?
  • Brian Kelly
    Brian Kelly about 12 years
    You can do anything you want when a message arrives at the front of a queue - that's the very definition of a "message consumer" :) Also, I'd avoid adding a DB to the equation. Why not track the fact that the message has already been processed within the message itself? That avoids duplication, another system dependency and the creation of a single point of failure.
  • Lostsoul
    Lostsoul about 12 years
    The problem is that if I do this on the consumer side, then the consumers need to be aware of everything that's gone into the queue. That means for my job I can't use high-CPU instances but instead need to provision memory as well (and I'm also not sure how to sync between multiple servers sending jobs). My RabbitMQ server is dedicated to this task and has a ton of memory I could allocate to filtering. You're right, I want to avoid a DB, but I'm not sure how (maybe a Java server socket that consumers send to, which stores the ID and, if unique, passes it on to RabbitMQ? This is where I'm lost).
  • Brian Kelly
    Brian Kelly about 12 years
    Why not just have an always-incrementing number added to each message? Before it gets re-queued the consumer dealing with it could just read and then increment the number in the payload as it pushes it back in the queue. Then, any consumers reading a message could reject any of them that have a number higher than a specific threshold (say, 10). It solves the infinite loop issue, requires no tracking of uniqueness and doesn't need any extra DB.
  • ankush981
    ankush981 over 2 years
    If I'm interacting with an external API as part of the business logic, then this strategy isn't very appealing. I mean, suppose I have ten instances in a distributed setting, and each issues a series of messages in a cron-like fashion to check, say, the status of some e-commerce orders, then we'll get 10 GET messages in the queue per order, overloading the external API. I think this is where having the queue reject messages with duplicate content/id will help greatly.