In our design lab, each request to add a new piece of clipart or process a user upload touches our clipart service. Historically, this application has been hosted on EC2, which led to some problems for us: a combination of inconsistent load and random hangups on certain files made it an operational and financial burden. We chose to migrate the worst-offending code path to AWS Lambda. While not painless, the move addressed both pain points we had with the old setup.
The main issue with the old service was load. Traffic to the application is not consistent; even with autoscaling, we end up over-provisioned during off-peak hours and under-provisioned at critical times. This has cost us money, when we pay for more servers than we need, and caused problems for customers, both internal and external, when there aren't enough servers to handle all of their clipart requests inside the Custom Ink design lab. The application also suffers from random hangups caused by poorly formatted clipart files or extremely large user uploads. A request for a file in either category would cause an application server to stop responding to health checks, drop out of the load balancer, and then require manual intervention to become healthy again.
Lambda proved to be an excellent way to combat these issues. Amazon's Node.js environment already shipped with ImageMagick, a critical part of our existing service; it guaranteed a stronger level of isolation between requests than we could realistically provide ourselves; and, like all AWS-supported platforms, it had a well-documented client library for accessing all of the stored data we need to complete requests. As long as we could generate files similar to those from the current service, Lambda would reduce the burden on operations, keep a bad image from hampering site performance, and help us control costs by paying only for the processing time we use.
The process we needed to replicate performed a few integrated tasks. It was responsible for:

- retrieving the source file from S3
- resizing and scaling images
- removing colors and performing an optional one-color conversion
- tracking which clipart users selected
File retrieval from S3 was simple; we just called out to the Node.js SDK. Resizing and scaling are handled by ImageMagick, and color removal and conversion were implemented by hand. Tracking user selections was moved to an SQS queue: the Lambda function just publishes each request to the queue, and another application collects the data for reporting purposes.
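As a rough illustration, here is what those two AWS calls look like with the Node.js SDK. The bucket name, queue URL, and function names here are placeholders, not our production values:

var AWS = require('aws-sdk')
var s3 = new AWS.S3()
var sqs = new AWS.SQS()

// Fetch the source clipart file from S3, resolving to the raw bytes.
function fetchClipart(key) {
  return s3.getObject({ Bucket: 'example-clipart-bucket', Key: key }).promise()
    .then(function(res) { return res.Body })
}

// Publish a selection event to SQS; a separate application drains the
// queue and records the data for reporting.
function trackSelection(selection) {
  return sqs.sendMessage({
    QueueUrl: process.env.SELECTION_QUEUE_URL,
    MessageBody: JSON.stringify(selection)
  }).promise()
}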
Porting to Lambda was not difficult; the first version was completed in about two days of work. The process wasn't without issues, though: debugging took much longer, and we ran into a few hiccups along the way.
The issues we ran into fell into three groups: API Gateway issues, ImageMagick issues, and environment reuse issues.
API Gateway is an Amazon service which, among other things, allows you to invoke a Lambda function over HTTP. It is first and foremost designed for use with XML or JSON. This becomes a problem when trying to serve binary content, since API Gateway attempts to set the headers for XML, text, or JSON over it.
Getting it to serve an arbitrary binary back to a web browser requires two things. First, set isBase64Encoded to true in the JSON returned by the handler. Second, update your binary media types to include */*, then redeploy the stage.
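Put together, a proxy-integration handler that serves a PNG looks roughly like this; readPng is a hypothetical stand-in for the real rendering step:

var fs = require('fs')

// Hypothetical stand-in for the real rendering work; returns raw PNG bytes.
function readPng() {
  return fs.readFileSync('/tmp/clipart.png')
}

exports.handler = function(event, context, callback) {
  // With */* registered as a binary media type, API Gateway decodes the
  // base64 body back into raw bytes before returning it to the browser.
  callback(null, {
    statusCode: 200,
    headers: { 'Content-Type': 'image/png' },
    body: readPng().toString('base64'),
    isBase64Encoded: true
  })
}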
The second major issue we ran into was ImageMagick on Lambda itself. Long story short, we were unable to disable antialiasing when resizing. That forced us to render each clipart at its preferred EPS size, pull the image into memory, use the CIEDE 2000 formula to identify the colors we needed to remove, and then apply our optional one-color conversion pass. If you are looking at using ImageMagick on Lambda, I would strongly suggest testing anything related to color heavily before investing.
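To give a flavor of the color pass, here is a simplified sketch of flagging pixels close to a removal color. Note that the real service used CIEDE 2000; the plain Euclidean Lab distance (CIE76) below is a stand-in to keep the example short, and the threshold value is made up:

// Convert an sRGB channel (0-255) to linear light.
function srgbToLinear(c) {
  var v = c / 255
  return v <= 0.04045 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4)
}

// Convert an [r, g, b] pixel to CIE Lab (D65 white point).
function rgbToLab(rgb) {
  var lin = rgb.map(srgbToLinear)
  var x = (0.4124 * lin[0] + 0.3576 * lin[1] + 0.1805 * lin[2]) / 0.95047
  var y = (0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2]) / 1.0
  var z = (0.0193 * lin[0] + 0.1192 * lin[1] + 0.9505 * lin[2]) / 1.08883
  var f = function(t) { return t > 0.008856 ? Math.cbrt(t) : 7.787 * t + 16 / 116 }
  var fx = f(x), fy = f(y), fz = f(z)
  return [116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)]
}

// CIE76 delta E: straight-line distance in Lab space. CIEDE 2000 refines
// this by weighting lightness, chroma, and hue differences separately.
function deltaE76(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2])
}

// Treat a pixel as the removal color when it is within the threshold.
function shouldRemove(pixelRgb, targetRgb) {
  return deltaE76(rgbToLab(pixelRgb), rgbToLab(targetRgb)) < 10
}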
Finally, we had some issues with environment reuse. The temporary space AWS provides is kept when Lambda containers are reused. This means that if (or when) your function reaches the point where AWS starts to reuse environments, you can fill up the disk and cause the function to error out. The easiest way to deal with this is to delete any files your function created before finishing execution.
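A minimal sketch of that cleanup, assuming the function tracks its own scratch files:

var fs = require('fs')

// Scratch files created during this invocation.
var scratchFiles = []

// Allocate a path under /tmp and remember it for cleanup.
function tempFile(name) {
  var file = '/tmp/' + name
  scratchFiles.push(file)
  return file
}

// Delete everything we created so a reused container starts clean.
function cleanupScratch() {
  while (scratchFiles.length > 0) {
    var file = scratchFiles.pop()
    try { fs.unlinkSync(file) } catch (e) { /* already gone */ }
  }
}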
At Custom Ink we use Rollbar for all of our error reporting. When using Rollbar with Lambda, take advantage of their out-of-the-box integration.
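A minimal sketch of what that looks like, assuming the rollbar npm package's Lambda wrapper; check their current docs, as the exact API may differ:

var Rollbar = require('rollbar')
var rollbar = new Rollbar({ accessToken: process.env.ROLLBAR_ACCESS_TOKEN })

// Wrapping the handler reports uncaught errors to Rollbar and flushes
// them before the Lambda container is frozen.
exports.handler = rollbar.lambdaHandler(function(event, context, callback) {
  // ... real work goes here ...
  callback(null, { statusCode: 200, body: 'ok' })
})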
I would also strongly suggest setting up X-Ray. This service proved invaluable during development, allowing us to see slow paths in our code. If you use promises to structure your code, I've found the following snippet helpful:
var AWSXRay = require('aws-xray-sdk-core')

// Lambda sets _X_AMZN_TRACE_ID when tracing is enabled; when it is absent
// (for example, running locally) we skip instrumentation entirely.
const TRACE_ID = process.env._X_AMZN_TRACE_ID

// Wraps a promise in an X-Ray subsegment so its duration shows up in traces.
function xrayPromise(segment, promise) {
  if (TRACE_ID == undefined) {
    return promise
  }
  return new Promise(function(resolve, reject) {
    AWSXRay.captureAsyncFunc(segment, function(subsegment) {
      promise.then(function(output) {
        // Close the subsegment before settling so the trace records the
        // full duration of the wrapped work.
        subsegment.close()
        resolve(output)
      }).catch(function(err) {
        // Closing with the error marks the subsegment as faulted.
        subsegment.close(err)
        reject(err)
      })
    })
  })
}

module.exports = xrayPromise
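For example, wrapping a hypothetical promise-returning resize step so it shows up as its own subsegment in the trace:

xrayPromise('resize', resizeImage(input)).then(sendResponse)

Here resizeImage and sendResponse are placeholders for your own promise-returning work.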
Moving over to a Lambda-based process allowed us to cut down on the number and size of the servers needed for our clipart service. Overall, we feel Lambda was a good fit for the project: it gave us a clipart process that scales with demand and delivered significant cost savings.