“Combiner" Class in a mapreduce job

11,098

Solution 1

Assume you have a list of numbers, 1 2 3 4 5 6.

Associative here means you can take your operation and apply it to any subgroup, then apply it to the result of those and get the same answer:

(1) + (2 + 3) + (4 + 5 + 6)
  ==
(1 + 2) + (3 + 4) + (5) + (6)
  ==
...

Think of the parenthesis here as the execution of a combiner.

Commutative means that the order doesn't matter, so:

1 + 2 + 3 + 4 + 5 + 6
  ==
2 + 4 + 6 + 1 + 2 + 3
  ==
...

For example, addition, fits this property, as seen before. "Maximum" fits this property above as well, because the max of maxs is the max. max(a,b) == max(b,a).

Median is an example that doesn't work: the median of medians is not the true median.


Don't forget another important property of a combiner: the input types for the key/value and the output types of the key/value need to be the same. For example, you can't take in a string:int and return a string:float.

Often times, the reducer might output some sort of string instead of numerical value, which may prevent you from just plugging in your reducer as the combiner.

Solution 2

For commutativity, let's say your reducer can be represented by a function (in the mathematical term) called f(). Then your reducer is commutative if f(a, b) = f(b, a) For instance:

  • sum(A, B) is the same as sum(B, A)
  • xor(A, B) is the same as xor(B, A)
  • concat(A, B) is not the same as concat(B, A)

For associativity, the property is that f(f(a, b), c) = f(a, f(b, c)). For example:

  • (A + B) + C is the same as A + (B + C)
  • (A - B) - C is not the same as A - (B - C)

So in the context of Map/Reduce, your reducer has to respect these 2 properties. For example, if your reducer is doing just a sum(), or a max(), it respects both properties, but something like mean() or median() does not, and thus you can not use it as a combiner.

I personally see combiners as mini-reducers that run in memory after the map phase as an optimization to reduce network traffic, and the commutativity/associativity actually makes sense if you see a Map/Reduce this way:

enter image description here

Share:
11,098

Related videos on Youtube

wayen wan
Author by

wayen wan

Updated on June 04, 2022

Comments

  • wayen wan
    wayen wan almost 2 years

    A Combiner runs after the Mapper and before the Reducer,it will receive as input all data emitted by the Mapper instances on a given node. then emits output to the Reducers.

    And also,If a reduce function is both commutative and associative, then it can be used as a Combiner.

    My Question is what does the phrase "commutative and associative" mean in this situation?

  • Donald Miner
    Donald Miner about 12 years
    Can anyone venture a guess on the reason for the down vote? I'd really like to know if my answer isn't good for some reason, as this is how I explain combiners to people all the time. Thanks!