Managing Multiple Child Processes in NodeJS

In a recent project, I needed to run some audio-processing functions on a serious number of audio files. As we were running on a modest server, this was a little too taxing for our entry-level RAM allotment. So I looked into outsourcing the heavy lifting to separate child processes and limiting the total number of processes running at any given time.

In this post, I'm going to go through the steps I came up with. First, how to create a child process in a separate file and run it using the spawn method. Next, how to promisify the process and pass arguments to it. Lastly, how to queue up a whole bunch of those processes while limiting the total number that run concurrently.

If you're just here for some sweet code, here's the repo with the example code.

Outsourcing a Child Process

The only npm dependency we'll need is bluebird:

npm init  
npm install --save bluebird  

Our driver file will be index.js and we'll put our child process logic in process.js.

├── index.js
├── node_modules
├── package.json
└── process.js

We'll use a trivial example for our child process logic. It will run a setTimeout function to simulate a delay, and it will send data to the parent process using process.stdout: once when it starts, and again after two seconds when it completes.

// process.js

    process.stdout.write('Process beginning.');

    setTimeout(function(){
        process.stdout.write('Process complete.');
    }, 2000);

Our index.js file will use the spawn function from Node's child_process module to call process.js. Spawn needs two parameters to run an external file. The first is the command to run the file, in this case node, and the second is an array of arguments, the first of which is the path of the file we would like to run, in this case './process.js'.

Our child process sends data through the process.stdout.write() function, but we need to set up listeners in the parent process to get access to that data and simply write it to the console. The data that comes back is in the form of a buffer, so we'll have to call the buffer's .toString method on it to convert it into something comprehensible.
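As a quick illustration of that conversion (using a Buffer built by hand here, rather than one received from a child process):

```javascript
// Illustration only: stdout chunks arrive as Buffer objects,
// so we call .toString() to turn them into readable text.
var chunk = Buffer.from('Process beginning.');

console.log(chunk.toString()); // Process beginning.
```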

We'll set up listeners for three events on the child: a data event from process.stdout, a data event from process.stderr in the case of an error, and finally, the exit event on the process itself.

// index.js
var spawn = require('child_process').spawn;  
var bbPromise = require('bluebird');

function loadProcess() {

  var process = spawn('node', ['./process.js']);

  process.stdout.on('data', function(data) {
    console.log(data.toString());
  });

  process.stderr.on('data', function(err) {
    console.error(err.toString());
  });

  process.on('exit', function() {
    console.log('Done!');
  });

}

loadProcess();  

Running:

node index.js  

Will get us:

Process beginning.  
Process complete.  
Done!  

Running Multiple Instances with Arguments

While our previous example is truly fascinating, let's pass some arguments to our child process and use promises so we can queue up a whole bunch.

First we'll modify our index.js file to pass arguments to our child processes. The second parameter to spawn is an array that accepts any number of elements and makes them available to the child process via the process.argv array. Our loadProcess function will take in a parameter, arg, and pass it to the child process as the second element of spawn's arguments array.

We'll also wrap the event listeners in a bluebird promise, and resolve the promise when the process is complete.

// index.js
function loadProcess(arg) {

  return new bbPromise(function(resolve, reject) {
    var process = spawn('node', ['./process.js', arg]);

    process.stdout.on('data', function(data) {
      console.log(data.toString());
    });

    process.stderr.on('data', function(err) {
      reject(err.toString());
    });

    process.on('exit', function() {
      resolve();
    });
  });
}

We'll make a slight modification to process.js so we can get access to the passed arg value. All arguments passed to spawn will be available in the child process via the process.argv property. This property is an array whose first two elements are the node command and the file path respectively, so our value will be happily residing at process.argv[2].

// process.js

    var value = process.argv[2];

    process.stdout.write('Process ' + value + ' beginning.');

    setTimeout(function(){
        process.stdout.write('Process ' + value + ' complete.');
    }, 2000);

Next, to make use of our promises, we'll queue up a whole bunch of child processes with different arguments. Then, using bluebird's map function, we'll execute each process and run a callback once they have all resolved.

// index.js
...
  var commands = [1, 2, 3, 4, 5].map(function(value) {
    return loadProcess.bind(null, value);
  });

  return bbPromise.map(commands, function(command) {
    return command();
  })
  .then(function() {
    console.log('Child Processes Completed');
  });

We create our commands array to be composed of function references to loadProcess bound to different argument values, in this case the numbers one through five.
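The bind pattern is worth pausing on. Here's a minimal standalone sketch, where loadStub is a hypothetical stand-in for loadProcess:

```javascript
// Illustration of the bind pattern: bind pre-fills the argument and
// returns a new function we can call later, without running it yet.
function loadStub(arg) {
  return 'loading ' + arg;
}

var command = loadStub.bind(null, 3);

console.log(command()); // loading 3
```

Binding rather than calling is the key: nothing runs until bluebird's map invokes each command, which is what makes limiting concurrency possible later on.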

Then, bluebird's map function will iterate over the commands and call each one (remember, each call returns a promise). Once they have all resolved, it will log that the child processes have completed.

Running:

node index.js  

Should get you something along the lines of

Process 4 beginning.  
Process 2 beginning.  
Process 5 beginning.  
Process 1 beginning.  
Process 3 beginning.  
Process 4 complete.  
Process 2 complete.  
Process 5 complete.  
Process 1 complete.  
Process 3 complete.  
Child Processes Completed  

Limiting the Number of Concurrent Processes

Up above, it's a bit of the wild west out there. Processes are flying in any order and all running at the same time. What if we needed to run them serially? Or what if we needed to limit the total number going at one time?

Bluebird's map function, handily enough, accepts an options object with a concurrency property, which limits the total number of unresolved promises it will have running at any given time.

//index.js 
...
  return bbPromise.map(commands, function(command) {
    return command();
  }, {
    concurrency: 1
  })
  .then(function() {
    console.log('Child Processes Completed');
  });

With a concurrency of one, running index.js will execute each process in order, each one starting only after the previous has resolved. Increasing the concurrency value will run that many processes at the same time. Try it out!
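To see why a concurrency of one amounts to serial execution, here's a plain-promise sketch of the same idea (illustrative only, not bluebird's actual implementation):

```javascript
// Plain-promise sketch of what concurrency: 1 amounts to - run each
// command only after the previous one's promise resolves.
function runSerially(commands) {
  return commands.reduce(function(chain, command) {
    return chain.then(function() { return command(); });
  }, Promise.resolve());
}

var order = [];
var commands = [1, 2, 3].map(function(value) {
  return function() {
    order.push(value);
    return new Promise(function(resolve) { setTimeout(resolve, 10); });
  };
});

runSerially(commands).then(function() {
  console.log(order.join(',')); // 1,2,3
});
```

Each command is chained onto the previous one with .then, so the start order is guaranteed, whereas with a higher concurrency it is not.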