Coordinating parallel execution in node.js


Solution 1

Nothing is truly parallel in node.js, since it is single-threaded. However, multiple events can be scheduled and run in a sequence you can't determine beforehand. And some things, like database access, are actually "parallel" in that the database queries themselves run in separate threads but are re-integrated into the event stream when completed.

So, how do you schedule a callback to run after multiple event handlers have finished? Well, this is one common technique used in animations in browser-side JavaScript: use a variable to track completion.

This sounds like a hack, and it is. It also sounds potentially messy, leaving a bunch of global variables around to do the tracking, and in a lesser language it would be. But in JavaScript we can use closures:

function fork (async_calls, shared_callback) {
  var counter = async_calls.length;
  var callback = function () {
    counter--;
    if (counter === 0) {
      shared_callback();
    }
  };

  for (var i = 0; i < async_calls.length; i++) {
    async_calls[i](callback);
  }
}

// usage:
fork([A,B,C],D);

In the example above we keep the code simple by assuming the async and callback functions require no arguments. You can of course modify the code to pass arguments to the async functions and have the callback accumulate results and pass them to the shared_callback function.


Additional answer:

Actually, even as is, that fork() function can already pass arguments to the async functions using a closure:

fork([
  function(callback){ A(1,2,callback) },
  function(callback){ B(1,callback) },
  function(callback){ C(1,2,callback) }
],D);

The only thing left to do is to accumulate the results from A, B and C and pass them on to D.


Even more additional answer:

I couldn't resist. Kept thinking about this during breakfast. Here's an implementation of fork() that accumulates the results (which are usually passed as arguments to each callback function):

function fork (async_calls, shared_callback) {
  var counter = async_calls.length;
  var all_results = [];

  function makeCallback (index) {
    return function () {
      counter--;
      var results = [];
      // We use the arguments object here because some callbacks
      // in Node pass in multiple arguments as the result.
      for (var i = 0; i < arguments.length; i++) {
        results.push(arguments[i]);
      }
      all_results[index] = results;
      if (counter === 0) {
        shared_callback(all_results);
      }
    };
  }

  for (var i = 0; i < async_calls.length; i++) {
    async_calls[i](makeCallback(i));
  }
}

That was easy enough. This makes fork() fairly general-purpose; it can be used to synchronize multiple non-homogeneous events.

Example usage in Node.js:

// Read 3 files in parallel and process them together:

var fs = require('fs');

function A (c){ fs.readFile('file1', c); }
function B (c){ fs.readFile('file2', c); }
function C (c){ fs.readFile('file3', c); }
function D (result) {
  // Each result[i] is [error, data], the arguments fs.readFile
  // passed to its callback; error handling omitted for brevity.
  var file1data = result[0][1];
  var file2data = result[1][1];
  var file3data = result[2][1];

  // process the files together here
}

fork([A,B,C],D);

Update

This code was written before libraries like async.js or the various promise-based libraries existed. I'd like to believe that async.js was inspired by this, but I don't have any proof of it. Anyway, if you're thinking of doing this today, take a look at async.js or promises. Just consider the answer above a good explanation/illustration of how things like async.parallel work.

For completeness' sake, the following is how you'd do it with async.parallel:

var async = require('async');

async.parallel([A,B,C],D);

Note that async.parallel works much the same as the fork function we implemented above. The main difference is that it passes an error as the first argument to D and the results as the second argument, as per the node.js convention.
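Under that convention, D takes the shape below. This is a sketch of the callback's signature only; it doesn't require the async library itself, and the invocations at the bottom just simulate how async.parallel would call it:

```javascript
// D under the node.js error-first convention, as async.parallel calls it
function D(err, results) {
  if (err) {
    // async.parallel stops at the first error and calls D with it
    console.error('a task failed:', err.message);
    return null;
  }
  // On success, results[i] holds the value task i passed to its callback
  return results;
}

// Simulated success case:
D(null, ['file1 data', 'file2 data', 'file3 data']);
// Simulated failure case:
D(new Error('ENOENT'));
```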

Using promises, we'd write it as follows:

// Assuming A, B & C return a promise instead of accepting a callback

Promise.all([A(), B(), C()]).then(D);

Solution 2

I believe the "async" module now provides this parallel functionality, and it works roughly the same as the fork function above.

Solution 3

The futures module has a submodule called join that I like to use:

Joins asynchronous calls together similar to how pthread_join works for threads.

The readme shows some good examples of using it freestyle, or using the future submodule with the Promise pattern. Example from the docs:

var Join = require('join')
  , join = Join()
  , callbackA = join.add()
  , callbackB = join.add()
  , callbackC = join.add();

function abcComplete(aArgs, bArgs, cArgs) {
  console.log(aArgs[1] + bArgs[1] + cArgs[1]);
}

setTimeout(function () {
  callbackA(null, 'Hello');
}, 300);

setTimeout(function () {
  callbackB(null, 'World');
}, 500);

setTimeout(function () {
  callbackC(null, '!');
}, 400);

// this must be called after all the callbacks have been added
join.when(abcComplete);

Solution 4

A simple solution might be possible here: http://howtonode.org/control-flow-part-ii (scroll to "Parallel actions"). Another way would be to have A, B, and C all share the same callback function, give that function a counter that lives outside it (a closure variable, or at worst a global), and run D once all three have called the callback. Of course, you will have to store the results of A, B, and C somewhere as well.

Solution 5

Another option could be the Step module for Node: https://github.com/creationix/step



Author: hansvb

Updated on July 05, 2022

Comments

  • hansvb
    hansvb almost 2 years

    The event-driven programming model of node.js makes it somewhat tricky to coordinate the program flow.

    Simple sequential execution gets turned into nested callbacks, which is easy enough (though a bit convoluted to write down).

    But how about parallel execution? Say you have three tasks A,B,C that can run in parallel and when they are done, you want to send their results to task D.

    With a fork/join model this would be

    • fork A
    • fork B
    • fork C
    • join A,B,C, run D

    How do I write that in node.js ? Are there any best practices or cookbooks? Do I have to hand-roll a solution every time, or is there some library with helpers for this?

  • hansvb
    hansvb over 13 years
    "Nothing is truly parallel in node.js since it is single threaded." Not true. Everything that does not use the CPU (such as waiting for network I/O) runs in parallel.
  • MooGoo
    MooGoo over 13 years
    It is true, for the most part. Waiting for IO in Node doesn't block other code from running, but when the code is run, it is one at a time. The only true parallel execution in Node is from spawning child processes, but then that could be said of nearly any environment.
  • slebetman
    slebetman over 13 years
    @Thilo: Usually we call code that does not use the CPU as not running. If you are not running you can't be "running" in parallel.
  • slebetman
    slebetman over 13 years
    @MooGoo: Spawning a child process or thread does have the potential of being run in true parallel though: on multi-core systems. Multi-core systems these days are the norm. In fact, even smartphones coming out later this year are starting to be multi-core. That's the difference between threads/processes and events. Threads may or may not run in parallel but events definitely don't run in parallel.
  • slebetman
    slebetman over 13 years
    @MooGoo: The implication of this is that with events, because we know they definitely cannot run in parallel, we don't have to worry about semaphores and mutexes while with threads we have to lock shared resources.
  • slebetman
    slebetman over 13 years
    @MooGoo @Thilo: Actually I take back that definition of "truly parallel". The real difference between running parallel threads and scheduling events is that with threads your code may be interrupted at any time (by I/O completion for example) while with events nothing may interrupt your code. You can illustrate that node is not running code in parallel by writing an infinite loop: while(1){} and see that all the other tasks won't be run until the loop completes, which is never. Threads behave differently.
  • hansvb
    hansvb over 13 years
    "Usually we call code that does not use the CPU as not running" Well, it could be using CPU on other systems. When I call a web service, real work gets done while I wait. With node.js I can call ten of those web services in parallel, which improves my throughput tenfold, even though only one thread runs on my local CPU. Of course, this reasoning only works for tasks that are not CPU-bound, but for those, node.js is probably not the right tool anyway. For those, you want pre-emptive multitasking and multi-core support, either using multiple threads or multiple processes.
  • slebetman
    slebetman over 13 years
    @Thilo: I already covered that in my original answer. You quoted Nothing is truly parallel in node.js since it is single threaded.. but did not quote And some things like database access are actually "parallel".. which says exactly what you're saying.
  • hansvb
    hansvb over 13 years
+1 I like that improved fork function. With a way to make the order in all_results the same as the order of async_calls (right now it is the nondeterministic completion order), that looks like an accepted answer ;-)
  • hansvb
    hansvb over 13 years
    As for the semantics of Nothing is truly parallel and some things are parallel, maybe we can agree on Everything is parallel in node.js except CPU use.
  • slebetman
    slebetman over 13 years
    Modified my answer to make the order of all_results the same as the order of async_calls.
  • MooGoo
    MooGoo over 13 years
    I was really just talking about parallelism at the OS level, whether or not it actually takes place in hardware. Nodejs has never been about code running in parallel, but about I/O code that does not block other code from executing while waiting for server/file system response. I often will spawn a child Node process to run a task (like database queries) that would normally block code execution. This relies on OS level preemptive multitasking to divvy up CPU time and not let one process take more than its fair share, and happens regardless of your CPU's capability for true parallel processing.
  • TK-421
    TK-421 over 13 years
    @slebetman: Could you edit your first example to show the complete usage?
  • slebetman
    slebetman over 13 years
    @luke-in-stormtrooper-armor: What kind of example do you want? In browser environment? In Node?
  • slebetman
    slebetman over 13 years
    @tk421: BTW, the usage in the first example is complete assuming you've defined functions A,B,C and D.
  • slebetman
    slebetman over 13 years
    @tk421: Ok, added a more complete Node example at the bottom. Kept in the style of A,B,C,D to blend with the tone of the rest of the answer. You can of course write it completely with anonymous functions.
  • Aaron Rustad
    Aaron Rustad over 13 years
    Am I correct in saying that these are not functions executing in parallel, but they are (at best) executing in an undetermined sequence with code not progressing until each 'async_func' returns?
  • slebetman
    slebetman over 13 years
    @BigCanOfTuna: I would say yes. That's the disagreement I have with the OP. My definition of "parallel" (and I think yours as well) excludes what we're doing here. BTW, the fork/join control structure has its origin in scatter/gather instructions on Cray hardware and is more popularly implemented as map->reduce these days. Personally I prefer to call the function scatter() or sync() because fork has a different meaning at the OS level.
  • bwindels
    bwindels about 11 years
    This is incorrect, async only helps you organize your code flow within a single process.
  • Evan Leis
    Evan Leis almost 11 years
    It doesn't look like step does real parallelism.
  • Dave Stibrany
    Dave Stibrany over 10 years
    async.parallel does indeed do roughly the same thing as the above fork function
  • Steve Jansen
    Steve Jansen over 10 years
    I think slebetman and @Thilo are really debating between concurrent processing vs. parallel processing. See stackoverflow.com/a/1898024/1995977 for an awesome diagram of the difference.
  • user1767586
    user1767586 about 9 years
"// Read 3 files in parallel and process them together:" How can it be in parallel if it's single-threaded? I don't see you spawning any new threads in your fork function.
  • slebetman
    slebetman about 9 years
    @user1767586: Yes, they're single threaded. If you want to know in detail how it works check out the select() function in C.
  • slebetman
    slebetman about 9 years
    @user1767586: Basically they're parallel because the actual processing is typically carried out on other machines, often in another country. While waiting for those machines to reply you can run other code. You can also wait in parallel - that is, initiate lots of requests and wait on all of them together. Of course, I'm exaggerating when I say the processing happens in another country but not by much. The processing can also happen on the same machine but in a different program (mysql server for example)
  • rab
    rab over 8 years
it's not true parallelism