Why do we need message brokers like RabbitMQ over a database like PostgreSQL?

49,921

Solution 1

Rabbit's queues reside in memory and will therefore be much faster than implementing this in a database. A (good)dedicated message queue should also provide essential queuing related features such as throttling/flow control, and the ability to choose different routing algorithms, to name a couple(rabbit provides these and more). Depending on the size of your project, you may also want the message passing component separate from your database, so that if one component experiences heavy load, it need not hinder the other's operation.

As for the problems you mentioned:

  • polling keeping the database busy and low performing: Using Rabbitmq, producers can push updates to consumers which is far more performant than polling. Data is simply sent to the consumer when it needs to be, eliminating the need for wasteful checks.

  • locking of the table -> again low performing: There is no table to lock :P

  • millions of rows of task -> again polling is low performing: As mentioned above, Rabbitmq will operate faster as it resides RAM, and provides flow control. If needed, it can also use the disk to temporarily store messages if it runs out of RAM. After 2.0, Rabbit has significantly improved on its RAM usage. Clustering options are also available.

In regards to AMQP, I would say a really cool feature is the "exchange", and the ability for it to route to other exchanges. This gives you more flexibility and enables you to create a wide array of elaborate routing typologies which can come in very handy when scaling. For a good example, see:


(source: springsource.com)

and: http://blog.springsource.org/2011/04/01/routing-topologies-for-performance-and-scalability-with-rabbitmq/

Finally, in regards to Redis, yes, it can be used as a message broker, and can do well. However, Rabbitmq has more message queuing features than Redis, as rabbitmq was built from the ground up to be a full-featured enterprise-level dedicated message queue. Redis on the other hand was primarily created to be an in-memory key-value store(though it does much more than that now; its even referred to as a swiss army knife). Still, I've read/heard many people achieving good results with Redis for smaller sized projects, but haven't heard much about it in larger applications.

Here is an example of Redis being used in a long-polling chat implementation: http://eflorenzano.com/blog/2011/02/16/technology-behind-convore/

Solution 2

PostgreSQL 9.5

PostgreSQL 9.5 incorporates SELECT ... FOR UPDATE ... SKIP LOCKED. This makes implementing working queuing systems a lot simpler and easier. You may no longer require an external queueing system since it's now simple to fetch 'n' rows that no other session has locked, and keep them locked until you commit confirmation that the work is done. It even works with two-phase transactions for when external co-ordination is required.

External queueing systems remain useful, providing canned functionality, proven performance, integration with other systems, options for horizontal scaling and federation, etc. Nonetheless, for simple cases you don't really need them anymore.

Older versions

You don't need such tools, but using one may make life easier. Doing queueing in the database looks easy, but you'll discover in practice that high performance, reliable concurrent queuing is really hard to do right in a relational database.

That's why tools like PGQ exist.

You can get rid of polling in PostgreSQL by using LISTEN and NOTIFY, but that won't solve the problem of reliably handing out entries off the top of the queue to exactly one consumer while preserving highly concurrent operation and not blocking inserts. All the simple and obvious solutions you think will solve that problem actually don't in the real world, and tend to degenerate into less efficient versions of single-worker queue fetching.

If you don't need highly concurrent multi-worker queue fetches then using a single queue table in PostgreSQL is entirely reasonable.

Share:
49,921

Related videos on Youtube

Yugal Jindle
Author by

Yugal Jindle

Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid. -- Anonymous Github : YugalJindle Twitter : @YugalJindle Google+ : +YugalJindle LinkedIn : http://www.linkedin.com/in/YugalJindle

Updated on July 08, 2022

Comments

  • Yugal Jindle
    Yugal Jindle almost 2 years

    I am new to message brokers like RabbitMQ which we can use to create tasks / message queues for a scheduling system like Celery.

    Now, here is the question:

    • I can create a table in PostgreSQL which can be appended with new tasks and consumed by the consumer program like Celery.

    • Why on earth would I want to setup a whole new tech for this like RabbitMQ?

    Now, I believe scaling cannot be the answer since our database like PostgreSQL can work in a distributed environment.

    I googled for what problems does the database poses for the particular problem, and I found:

    • polling keeps the database busy and low performing
    • locking of the table -> again low performing
    • millions of rows of tasks -> again, polling is low performing

    Now, how does RabbitMQ or any other message broker like that solves these problems?

    Also, I found out that AMQP protocol is what it follows. What's great in that?

    Can Redis also be used as a message broker? I find it more analogous to Memcached than RabbitMQ.

    Please shed some light on this!

    • CadentOrange
      CadentOrange over 10 years
      The impact of locking should be a lot less with PostgreSQL because it implements MVCC where readers are not blocked by writers and vice versa. Most of the articles I've found criticising the use of databases as message queues have MySQL in mind.
    • theMayer
      theMayer almost 8 years
      A message broker moves data between nodes, while a database keeps data in one place. The fact that you can access data in a database from multiple nodes does not, on its face, make it a good tool to transfer data quickly between nodes.
    • Mark K Cowan
      Mark K Cowan almost 8 years
      "scheduling system like celery" — I just learnt something which will be useful in my design, from the question. Now to read the answers...
    • giorgi dvalishvili
      giorgi dvalishvili about 6 years
      using message broker producer and consumer is decoupled.
    • Md. Sajedul Karim
      Md. Sajedul Karim almost 6 years
      You can view bellow link. It has a wide description: stackoverflow.com/a/51377756/3073945
  • Joachim Sauer
    Joachim Sauer over 11 years
    I've implemented a JMS implementation (i.e. a message passing system) on top of a database. I can tell you that it is possible, but it's not fun and it doesn't usually pay off to do it. Some of the problems you mention can be worked around, but it does increase the complexity quite a lot. All in all I agree: use a dedicated MQ system, if you need one. For low workloads, you can get away with having it in the DB, however.
  • Yugal Jindle
    Yugal Jindle over 11 years
    the line reliably handing out entries off the top of the queue to exactly one consumer while preserving highly concurrent operation and not blocking inserts. summarizes it - Right ?
  • Yugal Jindle
    Yugal Jindle over 11 years
    You simply covered all concerns / doubts. Awesome answer !
  • Mahn
    Mahn over 11 years
    That's interesting. What about consistency by the way? What if there are hundreds of jobs on a queue and the node holding them in ram crashes?
  • Jaigus
    Jaigus over 11 years
    @Mahn There are quite a few options available by way of redundancy, and saving some data to disk: rabbitmq.com/ha.html I feel the best approach is to use a combination of the features offered.
  • jkj
    jkj over 11 years
    Actually, with PostgreSQL there is no polling (see NOTIFY) nor are there table locks (see MVCC). Though PostgreSQL is still not designed for message queuing, it is not completely unsuitable.
  • Joe
    Joe over 10 years
    Like what @jkj said, there's NOTIFY and no tables locks. The only issue seems like the high bandwidth of messages. Couldn't you have a dedicated PostgreSQL instance instead of maintaining an entirely new system like Rabbit? You can 1) use a single PostgreSQL instance until you reach a bottleneck, then 2) use a dedicated Postgres, then finally 3) easily switch to Rabbit as your broker. Seems like starting with Rabbit is pre-optimizing.
  • Alexandre
    Alexandre over 7 years
    RabbitMQ support lot of protocol and it make it easy to interact between different kind of device: embedded system, server, mobile... (as you mention it provide lot of feature for messaging)
  • Ralph Bolton
    Ralph Bolton over 5 years
    Just to add, some message brokers also allow messages to be transformed in some way before being passed on. That is, you can augment the message with additional information before passing it on from the broker to a client. This can be helpful with legacy or tricky-to-change apps working alongside current apps, in so much as the broker can make messages from them look the same to consumers.
  • Tim Barrass
    Tim Barrass over 4 years
    "Rabbit's queues reside in memory and will therefore be much faster than implementing this in a database." <- is this generally true, if messages stay in cache in the DB?
  • duality_
    duality_ over 3 years
    Does Celery use this feature of Postgres? If it doesn't, then this doesn't help.
  • Craig Ringer
    Craig Ringer over 3 years
    @duality_ If it doesn't, write a patch :)
  • Luk Aron
    Luk Aron over 3 years
    Postgresql is atomic right? So there is a invisible lock for concurrent write right?
  • Luk Aron
    Luk Aron over 3 years