Getting a Written Transcript for a YouTube Video Using IBM Watson

This post walks through a short script that downloads the audio from a YouTube video and sends it to the IBM Watson Speech to Text service to get a written transcript.

If you're the impatient type, here's the complete code: youtube-transcriber

Set Up

To illustrate, I'm going to create three files: youtube.js to handle downloading the audio file, watson.js to get the transcription from Watson, and index.js to tie them together. There are also a handful of command-line tools and npm modules to install first.

We're going to use youtube-dl to fetch the audio for us. If you have Homebrew, install youtube-dl by running:

brew install youtube-dl

We will have to do some audio processing as well. youtube-dl can make use of ffmpeg to do all its auditory dirty work, so go ahead and install that, too.

brew install ffmpeg

While we're on the module train, let's prep all the necessary npm packages.

npm install bluebird watson-developer-cloud fluent-ffmpeg

Downloading Audio from YouTube

To download the audio, we will spawn a Node child process to run youtube-dl, which does all the work for us. We pass the URL of the video we want, tell it to grab only the audio, set the output format to mp3, and set the file name. I have the file named simply file.mp3, but you can change it to whatever you need. Check the youtube-dl docs for more details.

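// youtube.js
var spawn = require('child_process').spawn;
var Promise = require('bluebird');
var ffmpeg = require('fluent-ffmpeg');
var path = require('path');

exports.getYouTubeAudio = function(videoId){
  return new Promise(function(resolve, reject){
    // Install youtube-dl locally: brew install youtube-dl
    // Run it in __dirname so file.mp3 lands where the conversion below looks for it.
    var youtube_dl = spawn('youtube-dl', ['--extract-audio', '--audio-format', 'mp3', '-o', 'file.%(ext)s', "http://www.youtube.com/watch?v=" + videoId], {cwd: __dirname});

    youtube_dl.stdout.on('data', function(data){
      console.log(data.toString());
    });

    youtube_dl.stderr.on('data', function(data){
      process.stderr.write(data);
    });

    // Fires if youtube-dl could not be started at all (for example, not installed).
    youtube_dl.on('error', reject);

    // brew install ffmpeg
    youtube_dl.on('exit', function(code){
      // A non-zero exit code means the download failed, so there is nothing to convert.
      if (code !== 0) {
        return reject(new Error('youtube-dl exited with code ' + code));
      }

      var mp3File = path.join(__dirname, 'file.mp3');
      var flacFile = path.join(__dirname, 'file.flac');
      // Convert the mp3 to FLAC and resolve once ffmpeg is finished.
      ffmpeg(mp3File)
        .output(flacFile)
        .on('end', function(){
          resolve();
        })
        .on('error', function(err){
          reject(err);
        })
        .run();
    });
  });
};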
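The argument list handed to spawn is the part most likely to change, so it can help to build it in one place. The following is a minimal sketch, not part of the original script: buildYoutubeDlArgs and its baseName parameter are hypothetical names introduced here, and the flags are the same ones used above.

```javascript
// Hypothetical helper (not in the original script): builds the argument
// array that youtube.js passes to spawn('youtube-dl', ...).
function buildYoutubeDlArgs(videoId, baseName) {
  if (typeof videoId !== 'string' || videoId.length === 0) {
    throw new Error('A video id is required');
  }
  var name = baseName || 'file'; // output becomes <name>.mp3 after extraction
  return [
    '--extract-audio',            // keep only the audio track
    '--audio-format', 'mp3',      // ask youtube-dl (via ffmpeg) for mp3
    '-o', name + '.%(ext)s',      // output template; %(ext)s is filled in by youtube-dl
    'http://www.youtube.com/watch?v=' + encodeURIComponent(videoId)
  ];
}

console.log(buildYoutubeDlArgs('I9VA-U69yaY').join(' '));
```

With a helper like this, the spawn call would become spawn('youtube-dl', buildYoutubeDlArgs(videoId)), and changing the file name only touches one argument.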

Watson accepts FLAC audio, so we'll have to convert our mp3 first. We can use the fluent-ffmpeg module to take care of this for us.

    // youtube.js
    var ffmpeg = require('fluent-ffmpeg');
...
    youtube_dl.on('exit', function(){
      var mp3File = path.join(__dirname, 'file.mp3');
      var flacFile = path.join(__dirname, 'file.flac');
      ffmpeg(mp3File)
        .output(flacFile)
        .on('end', function(){
          resolve();
        })
        .on('error', function(err){
          reject(err);
        })
        .run();
    });

I've also used Bluebird as a promise library to resolve once the ffmpeg conversion is complete, so that we can easily chain the download with our next script.
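To make the chaining idea concrete, here is a small sketch that uses stand-in functions instead of the real download and transcription steps. It is an illustration under assumptions: fakeDownload, fakeTranscribe, and transcribeVideo are hypothetical names, the built-in Promise is used so the snippet runs without installing anything, and, unlike the script in this post, the first step passes the file path along to the second.

```javascript
// Hypothetical stand-in for getYouTubeAudio: resolves once the "download" is done.
function fakeDownload(videoId) {
  return new Promise(function(resolve) {
    setTimeout(function() {
      resolve('file.flac'); // pass the audio path on to the next step
    }, 10);
  });
}

// Hypothetical stand-in for watsonSpeechToText: resolves with a pretend transcript.
function fakeTranscribe(audioFile) {
  return Promise.resolve('transcript for ' + audioFile);
}

// Each .then waits for the previous promise to resolve before it runs,
// so the transcription only starts after the download has finished.
function transcribeVideo(videoId) {
  return fakeDownload(videoId).then(fakeTranscribe);
}

transcribeVideo('some-video-id')
  .then(function(result) {
    console.log(result);
  })
  .catch(function(err) {
    console.error('Pipeline failed:', err);
  });
```

A rejection in either step skips the remaining .then handlers and lands in .catch, which is why one error handler at the end of the chain is enough.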

Getting a Transcript from Watson

In order to use Watson, you need to head on over to IBM Bluemix and sign up to get some creds.

To use Watson in Node, the good folks at IBM were kind enough to create an npm module for us. So to get connected, include this code.

// watson.js
var watson = require('watson-developer-cloud');

var speech_to_text = watson.speech_to_text({
  username: '<bluemix username>', // replace with your own service credentials
  password: '<bluemix password>',
  version: 'v1',
  url: 'https://stream.watsonplatform.net/speech-to-text/api'
});
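If you would rather not paste credentials into the source file, one option is to read them from environment variables. This is only a sketch: readWatsonCredentials and the variable names WATSON_USERNAME and WATSON_PASSWORD are assumptions made for this example, not something the Watson module requires.

```javascript
// Hypothetical helper: builds the options object for watson.speech_to_text
// from environment variables. The variable names are assumptions.
function readWatsonCredentials(env) {
  var username = env.WATSON_USERNAME;
  var password = env.WATSON_PASSWORD;
  if (!username || !password) {
    throw new Error('Set WATSON_USERNAME and WATSON_PASSWORD before running');
  }
  return {
    username: username,
    password: password,
    version: 'v1',
    url: 'https://stream.watsonplatform.net/speech-to-text/api'
  };
}

// Example with a literal object; in the real script you would pass process.env.
var options = readWatsonCredentials({
  WATSON_USERNAME: 'example-user',
  WATSON_PASSWORD: 'example-pass'
});
console.log(Object.keys(options).join(', '));
```

In watson.js the call would then read watson.speech_to_text(readWatsonCredentials(process.env)), and the script fails early with a clear message when the variables are missing.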

There are a few ways to send audio to Watson and get the transcript back. I went for streaming because it lets you send larger files. As the response is streamed back, we're going to build up an array of results and then write it to a file as JSON when the transcription stream is done.

This function takes the path to the audio file we want to transcribe, file.flac in our example, and writes the transcript to transcript.json. The params object lets you specify additional information you want Watson to include. Here, I asked for timestamps for each word and set continuous to true so that Watson keeps transcribing through pauses instead of stopping at the first silence. If you only want the written transcript, you can pipe the output directly into a write stream for a text file. I opted for JSON because I wanted to get at the timestamp objects.

// watson.js
var watson = require('watson-developer-cloud');
var fs = require('fs');
var path = require('path');
var Promise = require('bluebird');

var speech_to_text = watson.speech_to_text({
  username: '<bluemix username>', // replace with your own service credentials
  password: '<bluemix password>',
  version: 'v1',
  url: 'https://stream.watsonplatform.net/speech-to-text/api'
});

exports.watsonSpeechToText = function(audioFile) {

  return new Promise(function(resolve, reject) {

    var params = {
      content_type: 'audio/flac',
      timestamps: true,
      continuous: true
    };

    var results = [];

    // create the stream
    var recognizeStream = speech_to_text.createRecognizeStream(params);

    // pipe in some audio
    fs.createReadStream(audioFile).pipe(recognizeStream);

    // listen for 'data' events for just the final text
    // listen for 'results' events to get the raw JSON with interim results, timings, etc.

    recognizeStream.setEncoding('utf8'); // to get strings instead of Buffers from `data` events

    // keep only final results; interim ones are replaced by later messages
    recognizeStream.on('results', function(e) {
      if (e.results && e.results[0] && e.results[0].final) {
        results.push(e);
      }
    });

    // log every event, which is handy while debugging
    ['data', 'results', 'error', 'connection-close'].forEach(function(eventName) {
      recognizeStream.on(eventName, console.log.bind(console, eventName + ' event: '));
    });

    // a stream error means the transcription failed, so reject the promise
    recognizeStream.on('error', function(err) {
      reject(err);
    });

    recognizeStream.on('connection-close', function() {
      var transcriptFile = path.join(__dirname, 'transcript.json');

      fs.writeFile(transcriptFile, JSON.stringify(results), function(err) {
        if (err) {
          return reject(new Error('Error writing to transcript.json: ' + err.message));
        }
        resolve();
      });
    });
  });
};
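Once transcript.json exists, you will probably want to pull the words and their timestamps back out of it. The sketch below assumes each saved message has the shape Watson returns with timestamps enabled, where every alternative carries a transcript string and a timestamps array of [word, start, end] entries. The helper names extractWords and extractText and the sample data are made up for illustration.

```javascript
// Hypothetical helper: flattens the saved result messages into a list of
// { word, start, end } objects (times are in seconds).
function extractWords(messages) {
  var words = [];
  messages.forEach(function(message) {
    (message.results || []).forEach(function(result) {
      var best = result.alternatives && result.alternatives[0];
      if (!best || !best.timestamps) {
        return; // nothing usable in this result
      }
      best.timestamps.forEach(function(entry) {
        words.push({ word: entry[0], start: entry[1], end: entry[2] });
      });
    });
  });
  return words;
}

// Hypothetical helper: joins the best transcript of every result into one string.
function extractText(messages) {
  var parts = [];
  messages.forEach(function(message) {
    (message.results || []).forEach(function(result) {
      var best = result.alternatives && result.alternatives[0];
      if (best && best.transcript) {
        parts.push(best.transcript.trim());
      }
    });
  });
  return parts.join(' ');
}

// Illustrative sample only, shaped like the messages pushed into `results`.
var sample = [{
  results: [{
    final: true,
    alternatives: [{
      transcript: 'hello world ',
      timestamps: [['hello', 0.1, 0.5], ['world', 0.6, 1.0]]
    }]
  }]
}];

console.log(extractText(sample));
console.log(extractWords(sample));
```

To run it against real output, replace sample with JSON.parse(fs.readFileSync('transcript.json', 'utf8')).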

All you need to do is chain these functions together and you will have all the transcribed goodness you could want!

Running the Show

index.js will combine our two modules and allow you to run them using the command line. As it's set up here, run node index.js transcribe <some video id> and you're up and running!

Give this one a try: node index.js transcribe I9VA-U69yaY

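// index.js
var watson = require('./watson');
var youtube = require('./youtube');
var path = require('path');

// everything after `node index.js`
var flags = process.argv.slice(2);

if (flags[0] === 'transcribe') {
  youtube.getYouTubeAudio(flags[1])
    .then(function(){
      // the download step always writes file.flac next to this script
      return watson.watsonSpeechToText(path.join(__dirname, 'file.flac'));
    })
    .then(function(){
      console.log('Done transcribing video id: ' + flags[1]);
    })
    .catch(function(err){
      // surface download, conversion, or transcription failures
      console.error('Transcription failed: ' + err.message);
      process.exit(1);
    });
}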