Difference between stream processing and message processing

Solution 1

In traditional message processing, you apply simple computations to the messages -- in most cases individually per message.

In stream processing, you apply complex operations on multiple input streams and multiple records (ie, messages) at the same time (like aggregations and joins).
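This contrast can be sketched in plain Python (a toy illustration, not tied to any particular framework): per-message processing looks at each record in isolation, while a stream operation such as a per-key count keeps state across many records.

```python
from collections import defaultdict

# Per-message processing: each record is handled in isolation.
def uppercase(message):
    return message.upper()

# Stream processing: state is accumulated across records --
# here a running count per key, a simple aggregation.
def count_by_key(stream):
    counts = defaultdict(int)
    for key, _value in stream:
        counts[key] += 1
        yield key, counts[key]
```

Joins work the same way conceptually: the operator must buffer records from multiple input streams and combine them by key, which is state a per-message handler never needs.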

Furthermore, traditional messaging systems cannot go "back in time" -- ie, they automatically delete messages after they have been delivered to all subscribed consumers. In contrast, Kafka keeps messages for a configurable amount of time and uses a pull-based model (ie, consumers pull data out of Kafka). This allows consumers to "rewind" and consume messages multiple times -- or, if you add a new consumer, it can read the complete history. This makes stream processing possible, because it allows for more complex applications. Furthermore, stream processing is not necessarily about real-time processing -- it's about processing infinite input streams (in contrast to batch processing, which is applied to finite inputs).
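A minimal sketch of this retention model, with an in-memory list standing in for Kafka's log (an illustration of the idea, not the actual implementation):

```python
class Log:
    """Append-only log: records survive consumption, so any consumer
    can (re-)read from any offset -- akin to Kafka's retention model."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def read(self, offset=0):
        # Pull-based: the consumer chooses where to start reading,
        # so "rewinding" is just reading from an earlier offset again.
        return self._records[offset:]
```

A newly added consumer simply calls `read(0)` and gets the complete history, while an existing consumer can keep pulling from its own offset.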

And Kafka offers Kafka Connect and the Streams API -- so it is a stream-processing platform and not just a messaging/pub-sub system (even though it uses messaging/pub-sub at its core).

Solution 2

If you like splitting hairs: messaging is communication between two or more processes or components, whereas streaming is the passing of events as they occur. Messages carry raw data, whereas events contain information about the occurrence of an activity, such as an order. So Kafka does both, messaging and streaming. A topic in Kafka can hold raw messages or an event log that is normally retained for hours or days. Events can further be aggregated into more complex events.

Solution 3

Basically, Kafka is a messaging framework similar to ActiveMQ or RabbitMQ. Some effort to take Kafka towards streaming has been made by Confluent.

https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/

Then why does Kafka come into the picture when talking about stream processing?

A stream processing framework differs in its input of data. In batch processing, you have files stored in a file system and you want to process them and store the results in some database. In stream processing, frameworks like Spark, Storm, etc. get continuous input from sources such as sensor devices or API feeds, and Kafka is used there to feed the streaming engine.

Solution 4

Although Rabbit supports streaming, it was actually not built for it (see Rabbit's web site). Rabbit is a message broker and Kafka is an event streaming platform.

Kafka can handle a huge number of 'messages' compared to Rabbit. Kafka is a log while Rabbit is a queue, which means that once consumed, Rabbit's messages are no longer there in case you need them.
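The queue-versus-log distinction can be sketched with Python's standard containers (a toy contrast, not the actual broker implementations):

```python
from collections import deque

# Rabbit-style queue: consuming a message removes it.
queue = deque(["m1", "m2"])
first = queue.popleft()   # "m1" is now gone from the queue;
                          # no later consumer can ever see it.

# Kafka-style log: reading removes nothing.
log = ["m1", "m2"]
first_again = log[0]      # the record stays in the retained log;
                          # each consumer just tracks its own offset.
```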

However, Rabbit supports message priorities while Kafka doesn't.

It depends on your needs.

Solution 5

Message processing implies operations on and/or using individual messages. Stream processing encompasses operations on and/or using individual messages as well as operations on collections of messages as they flow into the system. For example, say transactions are coming in for a payment instrument -- stream processing can be used to continuously compute the hourly average spend. In this case, a sliding window can be imposed on the stream which picks up messages within the hour and computes the average of the amounts. Such figures can then be used as inputs to fraud detection systems.
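Such a sliding-window average can be sketched in a few lines of Python (a hypothetical `SlidingAverage` helper with a one-hour window measured in seconds, not any particular framework's API):

```python
from collections import deque

class SlidingAverage:
    """Average spend over a sliding one-hour window,
    updated as each transaction event arrives."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.events = deque()  # (timestamp, amount) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, amount):
        self.events.append((timestamp, amount))
        self.total += amount
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _old_ts, old_amount = self.events.popleft()
            self.total -= old_amount
        return self.total / len(self.events)
```

Each incoming transaction both updates the aggregate and expires old events, so the "hourly average" is always current -- exactly the kind of multi-message state a plain per-message handler cannot maintain.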

TechEnthusiast

Updated on July 08, 2022

Comments

  • TechEnthusiast
    TechEnthusiast almost 2 years

What is the basic difference between stream processing and traditional message processing? People say that Kafka is a good choice for stream processing, but essentially Kafka is a messaging framework similar to ActiveMQ, RabbitMQ, etc.

    Why do we generally not say that ActiveMQ is good for stream processing as well.

Is it the speed at which messages are consumed by the consumer that determines if it is a stream?

  • miguno
    miguno over 7 years
    Apache Kafka describes itself as a "distributed streaming platform", see kafka.apache.org. Back in 2012 it started, roughly speaking, as a messaging system, but nowadays it's much more than that. Kafka particularly ships with its own stream processing engine, called Kafka's Streams API (or Kafka Streams in short). You are no longer required to use a separate stream processing technology such as Storm or Spark, and quite a few former users of Spark/Storm/... have begun migrating to Kafka's Streams API to simplify and improve their data architectures.
  • Davos
    Davos over 6 years
    Also, the input stream might be infinite, but the processing is more like a sliding window of finite input. In that sense there isn't really any difference between stream and batch processing. Batch processing is just a special case of stream processing where the windows are strongly defined.
  • Sheel Pancholi
    Sheel Pancholi about 5 years
    Very well put! Just one thought though, the word "streaming" these days is interchangeably (confused) used with "microbatching". The moment one talks about sliding windows one is already talking about microbatching. Streaming in the strictest sense is processing the record/event/fact as it comes. In that sense, the sliding window would be of size 1 in case of streaming.
  • Matthias J. Sax
    Matthias J. Sax about 5 years
Micro-batching limits how you can define window boundaries (ie, a hopping window that advances by at least one batch), while in stream processing you can advance a window at any granularity you like. Also, there is a difference between sliding and hopping windows (and many systems use the term "sliding window" to actually describe a hopping window, which can lead to confusion, too). Thus, I don't see why windowing implies micro-batching. Following your argument, you could not aggregate over a stream, which is certainly possible in stream processing.
  • Matthias J. Sax
    Matthias J. Sax about 5 years
    Micro-batching is more about when to execute a computation and also (as mentioned) leaks implementation details (ie, batch boundaries) into the operator semantics. Stream processing does not do this.