Mule batch processing vs foreach vs splitter-aggregator


Solution 1

When you write "quite many", I assume the records are too many to fit in main memory. This rules out splitter/aggregator, because the aggregator has to collect all records in order to return them as a list.

I assume you have your records in a stream or iterator; otherwise you probably already have a memory problem.
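To make the memory argument concrete, here is a minimal plain-Java sketch. It is an analogy only and does not use any Mule API; the class and method names (`MemorySketch`, `collectAll`, `processStreaming`) are made up for illustration, and doubling a number stands in for the real per-record work.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class MemorySketch {
    // Aggregator style: every processed record is kept until the end,
    // so memory use grows with the number of records.
    static List<Integer> collectAll(Iterator<Integer> records) {
        List<Integer> results = new ArrayList<>();
        while (records.hasNext()) {
            results.add(records.next() * 2); // stand-in for real processing
        }
        return results;
    }

    // Streaming style: each record is processed and then dropped,
    // so only one record is held at a time.
    static long processStreaming(Iterator<Integer> records) {
        long processed = 0;
        while (records.hasNext()) {
            records.next(); // stand-in for real processing; result is not retained
            processed++;
        }
        return processed;
    }
}
```

With a few million records, the first method needs room for all results at once, while the second only ever touches the current record.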

So when to use for-each and when to use batch?

For Each

The simplest solution, but it has some drawbacks:

  1. It is single-threaded (so it may be too slow for your use case)
  2. It is "fire and forget": you can't collect anything within the loop, e.g. a record count
  3. There is no support for handling "broken" records

Within the loop, you can have several steps (message processors) to process your records (e.g. for the mentioned database lookup).

This may be a drawback or an advantage: the loop is synchronous. (If you want to process asynchronously, wrap it in an async scope.)
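The behaviour described above can be pictured with a small plain-Java analogy. This is not how Mule implements the scope, and `ForEachSketch`, `forEach` and `forEachAsync` are hypothetical names; the sketch only shows a single-threaded, blocking loop and the same loop handed to another thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class ForEachSketch {
    // Synchronous loop: one thread, records in order, the caller blocks until done.
    static <T> void forEach(List<T> records, Consumer<T> step) {
        for (T record : records) {
            // In this sketch an exception here aborts all remaining records.
            step.accept(record);
        }
    }

    // Analogue of wrapping the loop in an async scope: the same loop runs on
    // another thread and the caller continues immediately.
    static <T> CompletableFuture<Void> forEachAsync(List<T> records, Consumer<T> step) {
        return CompletableFuture.runAsync(() -> forEach(records, step));
    }
}
```

Note that the async variant is still a single-threaded loop; it only frees the calling thread and does not process records in parallel.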

Batch

A little more to do and to understand, but more features:

  1. When called from a flow, always asynchronous (this may be a drawback).
  2. Can be standalone (e.g. with a poll inside for starting)
  3. When the data generated in the loading phase is too big, it is automatically offloaded to disk.
  4. Multithreading for free (number of threads configurable)
  5. Handling for "broken records": Batch steps may be executed for good/broken records only.
  6. You get statistics at the end (number of records, number of successful records etc.)
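As a rough plain-Java analogy for points 4 to 6 (not the Mule batch API), the sketch below processes records on a configurable thread pool, keeps going when a record fails, and reports counts at the end. `BatchSketch`, `Stats` and `run` are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class BatchSketch {
    // Hypothetical result type holding the end-of-job statistics.
    static final class Stats {
        final int total;
        final int successful;
        final int failed;

        Stats(int total, int successful, int failed) {
            this.total = total;
            this.successful = successful;
            this.failed = failed;
        }
    }

    static <T> Stats run(List<T> records, Consumer<T> step, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        for (T record : records) {
            pool.execute(() -> {
                try {
                    step.accept(record);
                    ok.incrementAndGet();
                } catch (RuntimeException e) {
                    // A broken record is counted but does not stop the others.
                    failed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            // Block until every submitted record has been handled.
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // keep waiting
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for records", e);
        }
        return new Stats(records.size(), ok.get(), failed.get());
    }
}
```

Unlike a real batch job, this sketch blocks the caller until all records are done and keeps everything in memory; it is only meant to show the per-record failure handling and the final counts.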

So it looks like you are better off using batch.

Solution 2

With Splitter and Aggregator, you are responsible for writing the splitting logic and for joining the results back together at the end of processing. It is useful when you want to process records asynchronously on different servers. Parallel processing is possible, but it is less reliable than the other options.

Foreach is more reliable, but it processes records iteratively on a single thread (synchronously), so parallel processing is not possible. By default, each record creates a single message.

Batch processing is designed to process millions of records quickly and reliably. By default, 16 threads process your records.

Please go through the links below for more details.

https://docs.mulesoft.com/mule-user-guide/v/3.8/splitter-flow-control-reference

https://docs.mulesoft.com/mule-user-guide/v/3.8/foreach

Author: mcvkr

Software engineer, researcher

Updated on June 04, 2022

Comments

  • mcvkr
    mcvkr about 2 years

    In Mule, I have quite many records to process, where processing includes some calculations, going back and forth to the database, etc. We can process collections of records with these options:

    1. Batch processing

    2. ForEach

    3. Splitter-Aggregator

    So what are the main differences between them? When should we prefer one over the others?

    The Mule batch processing option does not seem to support defining variables at batch job scope, for example. Or what if I want to use multithreading to speed up the overall task? Or which is better if I want to modify the payload during processing?

  • mcvkr
    mcvkr about 7 years
    Which is better if I want to modify the payload during processing?
  • Tushar Koley
    Tushar Koley about 7 years
    You can do that with every approach. If you have a huge number of records and want faster performance, then batch would be best.
  • PeterX
    PeterX almost 7 years
    Possibly worth pointing out that Batch Processing requires the Enterprise runtime.
  • Mesh
    Mesh over 6 years
    You can call another flow within a For Each and have multithreading