Creating video thumbnails with AWS Lambda in your S3 Bucket

Derek Cameron

When developing a site that requires thumbnails for videos, developers may choose to build an application in PHP, Python, Node.js, etc. and generate thumbnails on the production server upon upload. Another method that is becoming more prevalent is using serverless services such as AWS Lambda. So I'm going to show you how to create a Lambda function in Node.js that generates thumbnails for files uploaded to your AWS S3 bucket.

First you are going to need to create a directory to store your Node.js package. Let's begin by creating a new directory called videothumbnailer. Next we need to get the binaries for FFmpeg and MediaInfo. Since Lambda runs on Amazon Linux, it is best to build the binaries on an EC2 instance running Amazon Linux. For FFmpeg you can get the nightly builds from here or a direct download from here. After you have downloaded FFmpeg, rename the binary to ffmpeg and place it in your 'videothumbnailer' folder.

For MediaInfo you will need to build the binary yourself, as you need the curl functionality. To do this, start an Amazon EC2 instance running Amazon Linux, connect via SSH, and run the following commands.

#Install development tools
sudo yum groupinstall 'Development Tools'
#Install libcurl for mediainfo
sudo yum install libcurl-devel
#Get version 0.7.84 of mediainfo
wget http://mediaarea.net/download/binary/mediainfo/0.7.84/MediaInfo_CLI_0.7.84_GNU_FromSource.tar.xz
#Untar mediainfo
tar xvf MediaInfo_CLI_0.7.84_GNU_FromSource.tar.xz
#Change into the source directory
cd MediaInfo_CLI_GNU_FromSource
#Compile mediainfo with curl support
./CLI_Compile.sh --with-libcurl

Once it has compiled, download the mediainfo binary via your preferred method and place it in your 'videothumbnailer' folder. On the EC2 instance the compiled binary is normally located in:

/home/ec2-user/MediaInfo_CLI_GNU_FromSource/MediaInfo/Project/GNU/CLI

Next we need to create a package.json file with the following information.

{
  "name": "video-thumbnailer",
  "version": "0.0.9",
  "description": "",
  "private":true,
  "main": "index.js",
  "scripts": {},
  "author": "YourNameHere",
  "license": "Unlicensed",
  "dependencies": {
    "async": "^2.5.0",
    "mediainfo-wrapper": "1.2.0"
  }
}

Creating the function

Now you're ready to write the actual function, so it's time to create the handler file, index.js.

First up, you want to set up your environment so that ffmpeg can run.

Setting up the environment and ffmpeg permissions

process.env.PATH = process.env.PATH + ":/var/task";
process.env["FFMPEG_PATH"] = process.env["LAMBDA_TASK_ROOT"] + "/ffmpeg";

These two lines set up the PATH variables so that ffmpeg can run correctly. Because AWS removed the ability to change file permissions from within a running Lambda function, the best method is to run chmod 755 on the contents of the videothumbnailer project directory before packaging. Alternatively you can set chmod 755 on just the ffmpeg and mediainfo binaries.
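If you are packaging on Linux or macOS, the permissions step is as simple as the following sketch, assuming you run it inside the videothumbnailer folder after copying in both binaries:

#Make the binaries executable for the Lambda runtime
chmod 755 ffmpeg mediainfo
#Or simply open up the whole project before zipping
chmod -R 755 .

The next step is setting up our variables and our required packages before the event handler.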

Setting up variables and includes

var child_process = require("child_process"),
    async = require("async"),
    AWS = require("aws-sdk"),
    fs = require("fs"),
    utils = {
        decodeKey: function(key) {
            return decodeURIComponent(key).replace(/\+/g, " ");
        }
    },
    mediainfo = require("mediainfo-wrapper");
var s3 = new AWS.S3();
var thumbKeyPrefix = "thumbnails/",
    thumbWidth = 180,
    thumbHeight = 180,
    allowedFileTypes = ["mov", "mpg", "mpeg", "mp4", "wmv", "avi"];
  

We require aws-sdk for the S3 upload/get and signed URLs, fs for creating our temporary screenshot file, async so we can run our steps in sequence on AWS Lambda, and mediainfo-wrapper for getting information about the file. The utils object gives us a place for any extra functions we may need; decodeKey is required because S3 URL-encodes the object key in the event it sends. Lastly, the final group of variables sets your thumbnail prefix, your thumbnail width and height, and which file types are allowed.
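Since S3 URL-encodes keys in event notifications, a key containing spaces arrives with plus signs. A quick example of what decodeKey does, using a hypothetical upload:

// "My Holiday Video.mp4" arrives as "My+Holiday+Video.mp4" in the event
utils.decodeKey("uploads/My+Holiday+Video.mp4"); // => "uploads/My Holiday Video.mp4"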

Beginning the event handler

exports.handler = function(event, context) {
  var tmpFile = fs.createWriteStream("/tmp/screenshot.jpg");
  var srcKey = utils.decodeKey(event.Records[0].s3.object.key),
      bucket = event.Records[0].s3.bucket.name,
      dstKey = thumbKeyPrefix + srcKey.replace(/\.\w+$/, ".jpg"),
      fileType = srcKey.match(/\.\w+$/),
      target = s3.getSignedUrl("getObject", {Bucket: bucket, Key: srcKey, Expires: 900});
  var metadata = {Width: 0, Height: 0};

  // Skip our own output so a new thumbnail doesn't trigger the function again
  if (srcKey.indexOf(thumbKeyPrefix) === 0) {
    context.succeed("Skipping thumbnail " + srcKey);
    return;
  }

  if (fileType === null) {
    context.fail("Invalid filetype found for key: " + srcKey);
    return;
  }

  fileType = fileType[0].substr(1);

  if (allowedFileTypes.indexOf(fileType) === -1) {
    context.fail("Filetype " + fileType + " not valid for thumbnail, exiting");
    return;
  }

After we enter exports.handler we create a write stream for the screenshot, then we get the srcKey and bucket from the event record. After that we set up the destination key (dstKey), extract the fileType and set up the metadata object. The target variable is an S3 signed URL that both mediainfo and ffmpeg will read from, so the video never needs to be downloaded to the Lambda filesystem. We also skip anything already under the thumbnails/ prefix so that our own output doesn't re-trigger the function. Once these variables are set up we check that the file type is allowed; if it isn't we call context.fail and exit the function. Otherwise we continue on to the async.waterfall!
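For reference, this is a trimmed-down sketch of the S3 put event the handler receives; the bucket and key names are placeholders:

{
  "Records": [
    {
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "my-video-bucket" },
        "object": { "key": "uploads/My+Holiday+Video.mp4", "size": 104857600 }
      }
    }
  ]
}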

Our first waterfall function

async.waterfall([

  function createMetaData(next) {
    mediainfo(target).then(function(data) {
      // There can be more than one source; take the first video track we find
      for (var i in data) {
        if (data[i].video && data[i].video[0]) {
          metadata.Width = data[i].video[0].width[0];   // Width in pixels
          metadata.Height = data[i].video[0].height[0]; // Height in pixels
          break;
        }
      }
      next(null);
    }).catch(function(e) {
      next(e); // pass the error on so the waterfall does not hang
    });
  },

In this function we get the information from mediainfo. Here we only require the Width and Height, but you can get the full output and export it to an XML file if you need more. We loop through the data (there can be more than one video source), take the first video track's width and height, break out of the loop, and jump into our second waterfall function.
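To make the indexing above less cryptic, here is roughly the shape of the result the code is reading for a single 1080p source. This is illustrative only, inferred from the properties accessed above; the real mediainfo-wrapper output carries many more fields per track:

var exampleData = [{
  video: [{
    width: [1920],  // metadata.Width reads width[0]
    height: [1080]  // metadata.Height reads height[0]
  }]
}];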

Creating the thumbnail

function createThumbnail(next) {
  var scalingFactor = Math.min(thumbWidth / metadata.Width, thumbHeight / metadata.Height),
      width = Math.round(scalingFactor * metadata.Width),
      height = Math.round(scalingFactor * metadata.Height);

  // Fall back to the defaults if mediainfo gave us nothing usable
  if (isNaN(width)) width = thumbWidth;
  if (isNaN(height)) height = thumbHeight;

  var ffmpeg = child_process.spawn("ffmpeg", [
    "-ss", "00:00:05", // time to take screenshot
    "-i", target,      // url to stream from
    "-vf", "thumbnail,scale=" + width + ":" + height,
    "-qscale:v", "2",
    "-frames:v", "1",  // grab a single frame
    "-f", "image2",
    "-c:v", "mjpeg",
    "pipe:1"           // write the JPEG to stdout
  ]);
  ffmpeg.on("error", function(err) {
    console.log(err);
  });
  ffmpeg.on("close", function(code) {
    if (code !== 0) {
      console.log("child process exited with code " + code);
    } else {
      console.log("Processing finished!");
    }
    tmpFile.end();
    next(code); // a non-zero exit code is passed to the waterfall as an error
  });
  tmpFile.on("error", function(err) {
    console.log("stream err: ", err);
  });
  ffmpeg.stdout.pipe(tmpFile)
    .on("error", function(err) {
      console.log("error while writing: ", err);
    });
},

Here we start by calculating a scaling factor so the thumbnail keeps the source's aspect ratio while fitting inside our thumbWidth/thumbHeight box. Then we use child_process.spawn to call ffmpeg with our options (feel free to change these to suit your needs): -ss seeks to the five-second mark, the thumbnail filter picks a representative frame, and pipe:1 sends the resulting JPEG to stdout. We set up error logging and pipe ffmpeg's stdout into our write stream. When ffmpeg closes we end the write stream and move on to our next function, which uploads the thumbnail to our S3 bucket.
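If you want to sanity-check those ffmpeg options before deploying, you can run essentially the same command locally; input.mp4 and out.jpg are placeholder names, and a file output stands in for the pipe to stdout:

ffmpeg -ss 00:00:05 -i input.mp4 -vf "thumbnail,scale=180:-1" -qscale:v 2 -frames:v 1 -f image2 -c:v mjpeg out.jpg

Here scale=180:-1 tells ffmpeg to derive the height from the source aspect ratio, standing in for the width/height we computed from mediainfo.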

Uploading the thumbnail

  function uploadThumbnail(next) {
    var tmpFile = fs.createReadStream("/tmp/screenshot.jpg");
    // Purely for debugging: list /tmp so we can check on the screenshot
    child_process.exec("ls -l -R /tmp",
      function(error, stdout, stderr) {
        console.log("stdout: " + stdout);
      });
    var params = {
      Bucket: bucket,
      Key: dstKey,
      Body: tmpFile,
      ContentType: "image/jpeg",
      ACL: "public-read",
      Metadata: {
        thumbnail: "TRUE"
      }
    };

    var uploadMe = s3.upload(params);
    uploadMe.send(
      function(err, data) {
        if (err != null) console.log("error: " + err);
        next(err);
      }
    );
  }
],

Now we create a read stream for the temporary file and upload it via s3.upload, which accepts read streams as the body. All that's left is closing out the async.waterfall with an error check in our final callback.

Closing the functions/error handling

    function(err) {
      if (err) {
        console.error(
          "Unable to generate thumbnail for '" + bucket + "/" + srcKey + "'" +
          " due to error: " + err
        );
        context.fail(err);
      } else {
        context.succeed("Created thumbnail for '" + bucket + "/" + srcKey + "'");
      }
    }
  );
};

This is the final callback for our async.waterfall: when we get to the end it logs any error and calls context.fail or context.succeed. It is important that you always reach a context.succeed or context.fail, as otherwise the Lambda function will keep running until it times out.

Build the package

Now that we have saved our index.js (the full listing is at the end of this post), it is time to build our package, so run npm install in the 'videothumbnailer' folder. After npm install finishes you will need to remove the extra bloat included with mediainfo-wrapper: go into 'node_modules', then into 'mediainfo-wrapper', then into its 'lib' folder and delete all folders except 'linux64'. Now move the mediainfo binary you built earlier from 'videothumbnailer' into the 'linux64' folder, overwriting the mediainfo binary that ships there.
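From inside the 'videothumbnailer' folder, the whole build step looks roughly like this; note that the exact platform folder names under lib may vary between mediainfo-wrapper versions:

#Install the dependencies from package.json
npm install
#Remove the binaries for platforms Lambda will never use
cd node_modules/mediainfo-wrapper/lib
rm -rf linux32 osx64 win32 win64
#Swap in the mediainfo binary we built on Amazon Linux
mv ../../../mediainfo linux64/mediainfo
cd ../../..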

Zip up the contents and Deploy

After doing all of that, the function is ready for Lambda; all you have to do is zip up the contents of the 'videothumbnailer' folder (that should be index.js, ffmpeg, node_modules and package.json). Then go into AWS Lambda, create a new function, and give it an S3 put-object trigger with a suffix filter of .avi, .mov, etc. You don't strictly need the filter, it just reduces the number of invocations. Apply the AWSLambdaExecute policy to the function's IAM role (this gives read and write access to S3), then set the memory limit to the maximum (note: I have never reached the maximum with this function, it's normally around 700 MB even for 4 GB .mov files) and set the timeout to 30 seconds (again, it normally takes around 6 seconds, but it's good to be safe).
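Make sure you zip the contents of the folder rather than the folder itself, otherwise Lambda won't find index.js at the root of the package. For example:

#Run from inside the videothumbnailer folder
zip -r ../videothumbnailer.zip index.js ffmpeg package.json node_modules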
Now you are ready to generate thumbnails from S3 uploads.


Index.js

process.env.PATH = process.env.PATH + ":/var/task";
process.env["FFMPEG_PATH"] = process.env["LAMBDA_TASK_ROOT"] + "/ffmpeg";

var child_process = require("child_process"),
    async = require("async"),
    AWS = require("aws-sdk"),
    fs = require("fs"),
    utils = {
        decodeKey: function(key) {
            return decodeURIComponent(key).replace(/\+/g, " ");
        }
    },
    mediainfo = require("mediainfo-wrapper");
var s3 = new AWS.S3();
var thumbKeyPrefix = "thumbnails/",
    thumbWidth = 180,
    thumbHeight = 180,
    allowedFileTypes = ["mov", "mpg", "mpeg", "mp4", "wmv", "avi"];

exports.handler = function(event, context) {
  var tmpFile = fs.createWriteStream("/tmp/screenshot.jpg");
  var srcKey = utils.decodeKey(event.Records[0].s3.object.key),
      bucket = event.Records[0].s3.bucket.name,
      dstKey = thumbKeyPrefix + srcKey.replace(/\.\w+$/, ".jpg"),
      fileType = srcKey.match(/\.\w+$/),
      target = s3.getSignedUrl("getObject", {Bucket: bucket, Key: srcKey, Expires: 900});
  var metadata = {Width: 0, Height: 0};

  // Skip our own output so a new thumbnail doesn't trigger the function again
  if (srcKey.indexOf(thumbKeyPrefix) === 0) {
    context.succeed("Skipping thumbnail " + srcKey);
    return;
  }

  if (fileType === null) {
    context.fail("Invalid filetype found for key: " + srcKey);
    return;
  }

  fileType = fileType[0].substr(1);

  if (allowedFileTypes.indexOf(fileType) === -1) {
    context.fail("Filetype " + fileType + " not valid for thumbnail, exiting");
    return;
  }

  async.waterfall([

    function createMetaData(next) {
      mediainfo(target).then(function(data) {
        // There can be more than one source; take the first video track we find
        for (var i in data) {
          if (data[i].video && data[i].video[0]) {
            metadata.Width = data[i].video[0].width[0];   // Width in pixels
            metadata.Height = data[i].video[0].height[0]; // Height in pixels
            break;
          }
        }
        next(null);
      }).catch(function(e) {
        next(e); // pass the error on so the waterfall does not hang
      });
    },

    function createThumbnail(next) {
      var scalingFactor = Math.min(thumbWidth / metadata.Width, thumbHeight / metadata.Height),
          width = Math.round(scalingFactor * metadata.Width),
          height = Math.round(scalingFactor * metadata.Height);

      if (isNaN(width)) width = thumbWidth;
      if (isNaN(height)) height = thumbHeight;

      var ffmpeg = child_process.spawn("ffmpeg", [
        "-ss", "00:00:05", // time to take screenshot
        "-i", target,      // url to stream from
        "-vf", "thumbnail,scale=" + width + ":" + height,
        "-qscale:v", "2",
        "-frames:v", "1",
        "-f", "image2",
        "-c:v", "mjpeg",
        "pipe:1"
      ]);
      ffmpeg.on("error", function(err) {
        console.log(err);
      });
      ffmpeg.on("close", function(code) {
        if (code !== 0) {
          console.log("child process exited with code " + code);
        } else {
          console.log("Processing finished!");
        }
        tmpFile.end();
        next(code);
      });
      tmpFile.on("error", function(err) {
        console.log("stream err: ", err);
      });
      ffmpeg.stdout.pipe(tmpFile)
        .on("error", function(err) {
          console.log("error while writing: ", err);
        });
    },

    function uploadThumbnail(next) {
      var tmpFile = fs.createReadStream("/tmp/screenshot.jpg");
      child_process.exec("ls -l -R /tmp",
        function(error, stdout, stderr) {
          console.log("stdout: " + stdout); // for checking on the screenshot
        });
      var params = {
        Bucket: bucket,
        Key: dstKey,
        Body: tmpFile,
        ContentType: "image/jpeg",
        ACL: "public-read",
        Metadata: {
          thumbnail: "TRUE"
        }
      };

      var uploadMe = s3.upload(params);
      uploadMe.send(
        function(err, data) {
          if (err != null) console.log("error: " + err);
          next(err);
        }
      );
    }
  ],
  function(err) {
    if (err) {
      console.error(
        "Unable to generate thumbnail for '" + bucket + "/" + srcKey + "'" +
        " due to error: " + err
      );
      context.fail(err);
    } else {
      context.succeed("Created thumbnail for '" + bucket + "/" + srcKey + "'");
    }
  });
};