We love AWS Lambda, but its concurrency handling with SQS is silly.

June 3, 2021 - How To, Systems & Infrastructure

We’re big fans of serverless here at Foxy.io, both in general, and the Serverless.com framework specifically. We use serverless apps internally for a number of things, like WAF automation (to block card testers), our webhooks system, our new Help Scout + Foxy ecommerce integration, and the custom shipping code functionality within Foxy.

We’ve also got some serverless starter packages for Netlify and Cloudflare (for a Cloudflare-based membership protected site, for instance).

Note: What follows gets a little technical, and is highly specific to Lambda (AWS’s serverless computing platform) and SQS (AWS’s queuing system). We’ll try to keep it approachable, but it’s gonna be a ride.

Queues and batch processing in serverless apps

One thing serverless apps can be great at is batch processing, where you might have thousands of events to process, but you only need to do this daily or weekly. Instead of maintaining servers that sit idle when they’re not needed, a serverless app can scale from zero to the moon, then scale back down to zero. This saves money, and effectively eliminates the need to set up autoscaling.

Because serverless functions are generally intended to be short-running, you’ll want to make your serverless function process just one event at a time. In the olden days, if you had 5,000 events to process every day, you might have a single long-running process that pulls all records from the database in a single query, then loops through the results and processes each in one big process. But with a serverless approach, you’d have one method that retrieves all your data from the database and stuffs each record into a queue; then your serverless app would pull items from the queue and process them individually.
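To make the fan-out concrete, here’s a minimal sketch of both halves. It assumes `sqs_client` is something like `boto3.client("sqs")` (we pass it in so the snippet runs without AWS), and `process_record` is a hypothetical stand-in for your real per-record work. None of these names come from our actual codebase.

```python
import json
from typing import Any, Iterable


def enqueue_records(sqs_client: Any, queue_url: str, records: Iterable[dict]) -> int:
    """Send each record to the queue as its own message; return how many were sent."""
    sent = 0
    for record in records:
        # One message per record, so each invocation only has one event to handle.
        sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(record))
        sent += 1
    return sent


def process_record(record: dict) -> None:
    """Hypothetical placeholder for the real work (API calls, payments, etc.)."""
    print(f"processing {record}")


def handler(event: dict, context: Any) -> int:
    """Lambda entry point. SQS-triggered events carry messages in event["Records"]."""
    processed = 0
    for message in event["Records"]:
        process_record(json.loads(message["body"]))
        processed += 1
    return processed
```

With a batch size of 1, `event["Records"]` holds a single message, but looping keeps the handler correct if you raise the batch size later.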

AWS Lambda can be configured to pull from an SQS queue, so it’s a beautiful and nearly bulletproof way to process lots of data.

Scaling! 🚀🔜🌖

If your Lambdas are self-contained and can scale to the moon, great! By default, Lambda will pull up to 5 batches of events from the queue at the same time. (A “batch” in this context is the number of items Lambda will pull at one time. This is configurable, but for our purposes let’s just assume our batch size is configured to 1, so Lambda will just pull 1 item from the queue at a time.) From the AWS docs about Lambda + SQS:

If messages are still available, Lambda increases the number of processes that are reading batches by up to 60 more instances per minute. The maximum number of batches that can be processed simultaneously by an event source mapping is 1000.

Great! Lambda will scale up to 1000 concurrent executions!
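To get a feel for how quickly that ramp happens, here’s a rough back-of-the-envelope sketch using only the figures quoted above (5 batches to start, up to 60 more per minute, capped at 1000). It assumes the best case where the queue never runs dry, and the numbers may not match what AWS does today.

```python
import math

INITIAL_BATCHES = 5      # batches pulled at once when polling starts
ADDED_PER_MINUTE = 60    # maximum extra concurrent batches added each minute
MAX_BATCHES = 1000       # ceiling for a single event source mapping


def concurrent_batches(minutes_elapsed: int) -> int:
    """Upper bound on concurrent batches after the given number of minutes."""
    return min(MAX_BATCHES, INITIAL_BATCHES + ADDED_PER_MINUTE * minutes_elapsed)


def minutes_to_reach(target: int) -> int:
    """Whole minutes needed before the upper bound reaches the target."""
    if target <= INITIAL_BATCHES:
        return 0
    capped = min(target, MAX_BATCHES)
    return math.ceil((capped - INITIAL_BATCHES) / ADDED_PER_MINUTE)


print(concurrent_batches(16))   # 965
print(minutes_to_reach(1000))   # 17
```

So with a deep enough queue, you could be at the 1000 ceiling in about 17 minutes.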

But…

Maybe not so great. What if your code makes a request to a 3rd party API that implements rate limiting? Lambda might scale to run your code 10,000 times in 5 minutes, but the 3rd party might automatically block or reject your API requests. What to do?

Concurrency limits to the rescue? No 🙁

Thankfully, Lambda offers a (reserved) concurrency setting. Set it to 2, for instance, and your Lambda will only run 2 executions at the same time. If your function takes 1s (let’s say there are numerous external http requests it’s making), that’ll slow things down so it only executes about 2 times a second.
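The arithmetic here is simple enough to sketch. The function names are ours, purely for illustration, and the model assumes every invocation takes the same amount of time.

```python
def max_throughput_per_second(concurrency: int, seconds_per_invocation: float) -> float:
    """Best-case invocations per second when every slot is always busy."""
    return concurrency / seconds_per_invocation


def drain_time_seconds(queue_depth: int, concurrency: int, seconds_per_invocation: float) -> float:
    """Best-case time to work through a queue of the given depth."""
    return queue_depth / max_throughput_per_second(concurrency, seconds_per_invocation)


print(max_throughput_per_second(2, 1.0))    # 2.0
print(drain_time_seconds(10_000, 2, 1.0))   # 5000.0
```

At a concurrency of 2 and 1s per item, 10,000 items would take 5,000 seconds (a bit under an hour and a half) to drain.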

So this should be easy! Set your concurrency lower, and you’ll naturally slow things down, and avoid hammering 3rd party services.

But… though setting the concurrency is easy, and it does indeed prevent your Lambda from executing more than you want, the Lambda concurrency setting doesn’t actually impact the SQS -> Lambda connection. In other words, even if your concurrency is 2, Lambda will still start out by pulling 5 batches from SQS at a time, and it will still scale up to 1000.

So if your Lambda is only allowed to run 2 at a time, but SQS is trying to push hundreds of events to Lambda, almost all of those events are going to be throttled and bounced back to SQS. And this is a problem because (typically) you’ll configure SQS to only attempt each item a few times before it gives up and sends that item to the Dead Letter Queue (DLQ).

If you have 10,000 items in the queue, and are only processing 2 at a time, it’s very likely you’ll have (many) items that never actually make it to the Lambda before they get sent to the DLQ. Decidedly not good.

(If this explanation isn’t clear, a few others have written about this, so their posts might help.)

What’s a programmer to do?

AWS’s documentation recommends a bit of a kludgy workaround, in our opinion. They basically suggest setting the SQS retry values high enough that items don’t go to the DLQ until they’ve been attempted more times than they could plausibly be bounced before reaching Lambda. This … sort of works, but it’s still not ideal.
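For reference, that workaround boils down to two attributes on the source queue. The sketch below builds them in the shape SQS expects (attribute values are strings, and the redrive policy is a JSON string). The defaults reflect AWS’s guidance as we understand it (a visibility timeout of at least 6x the function timeout, and a `maxReceiveCount` of at least 5), so treat them as assumptions and check the current docs. The function name and example ARN are made up.

```python
import json


def build_source_queue_attributes(
    function_timeout_s: int,
    dlq_arn: str,
    max_receive_count: int = 5,
    visibility_multiplier: int = 6,
) -> dict:
    """Build SQS queue attributes that give throttled messages more retries."""
    return {
        # How long a received message stays hidden before it can be retried.
        "VisibilityTimeout": str(function_timeout_s * visibility_multiplier),
        # How many receives are allowed before a message moves to the DLQ.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        ),
    }


attributes = build_source_queue_attributes(
    function_timeout_s=30,
    dlq_arn="arn:aws:sqs:us-east-1:123456789012:example-dlq",  # placeholder ARN
)
print(attributes["VisibilityTimeout"])  # 180
```

The resulting dict could be passed as `Attributes` to boto3’s `create_queue` or `set_queue_attributes`. Raising `max_receive_count` makes premature DLQ delivery less likely, but it doesn’t stop the throttling itself.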

But there’s a way to make this work a bit more properly, if you’re able to use SQS FIFO (First In First Out) queues. With a normal SQS queue, Lambda might pull items from the queue in any order. But with a FIFO queue, the order that items are added to the queue is the order in which they’ll be sent to Lambda. An item must be processed by Lambda before the next item gets sent.

This approach basically enforces a concurrency of 1, because a second Lambda can’t pull the 2nd item in the queue until the first item’s done. One at a time.

That gets us throttling down to a concurrency of 1, but what if you want something more than 1, but less than 1000? FIFO allows a “group ID” parameter, which is almost like a mini-queue within the main queue. Each group ID’s FIFO ordering is separate, so (in effect) each group ID increases the concurrency by 1.

The group ID can be anything, so you could make it a random integer between 1 and your Lambda’s configured maximum concurrency. Want Lambda to run with a max concurrency of 10? Set the concurrency to 10, and when you feed items to the queue, just assign each item a random group ID between 1 and 10.
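Here’s roughly what that looks like when sending to a FIFO queue. As before, `sqs_client` is assumed to be something like `boto3.client("sqs")`, and the function names are ours. FIFO queues also need a deduplication ID unless content-based deduplication is enabled on the queue. This sketch uses a random UUID, which effectively opts out of deduplication, so swap in something meaningful if duplicates matter to you.

```python
import json
import random
import uuid
from typing import Any


def pick_group_id(max_concurrency: int) -> str:
    """Random group ID from "1" to str(max_concurrency), inclusive."""
    return str(random.randint(1, max_concurrency))


def enqueue_fifo(sqs_client: Any, queue_url: str, record: dict, max_concurrency: int) -> str:
    """Send one record to a FIFO queue under a random group ID; return that ID."""
    group_id = pick_group_id(max_concurrency)
    sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(record),
        MessageGroupId=group_id,
        # Assumption: a random ID, so SQS never treats two messages as duplicates.
        MessageDeduplicationId=str(uuid.uuid4()),
    )
    return group_id
```

Because there are only ever `max_concurrency` distinct group IDs, there are at most that many messages eligible for processing at once.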

The key here is that the underlying SQS+Lambda polling system (which, remember, ignores the concurrency configuration) won’t attempt to push items from the queue to Lambda beyond what you have configured.

This is actually documented in an AWS blog post about FIFO queues, but it’s not clear from the documentation (and the blog’s a bit dense, and buries the lede a bit, at least as far as this particular concurrency solution is concerned).

One important thing to note, however, is that if you use group IDs for something else (i.e., not a random number but an account ID, for instance), and the number of distinct group IDs is greater than the concurrency limit, your SQS messages can still be throttled. Just something to keep in mind.
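If you do want group IDs tied to something like an account ID, one option (our suggestion, not something from the AWS post) is to hash the ID into a fixed number of buckets. The same account always lands in the same bucket, so its messages stay in order, but the total number of group IDs never exceeds your concurrency limit. A minimal sketch, with a made-up function name:

```python
import hashlib


def bounded_group_id(account_id: str, max_concurrency: int) -> str:
    """Map an account ID to a stable group ID from "1" to str(max_concurrency)."""
    # hashlib gives the same result in every process, unlike Python's built-in
    # hash(), which is salted per process for strings.
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % max_concurrency
    return str(bucket + 1)
```

The tradeoff is that unrelated accounts sharing a bucket will wait on each other.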

How we’re using this at Foxy

We ran into this while rebuilding our daily subscription processing system. Because processing a subscription can rely on many 3rd parties (like Avalara for taxes, Square or PayPal for the payment, FedEx for shipping, an anti-fraud integration, a custom pre-payment webhook, etc.), and because our subscription processing actually happens on a “serverful” system that scales more slowly, we can’t just throw a thousand requests per second at every one of those external systems. By setting group IDs on the SQS messages, we can limit Lambda’s concurrency without worrying about messages going to the DLQ prematurely.

To make things a bit tidier, you can set your max concurrency limit and the number of group IDs using a shared environment variable, so you won’t forget to update it in both places, if/when you need to change it.
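On the code side, that could be as small as the sketch below. `MAX_CONCURRENCY` is a hypothetical variable name. The idea is that your deployment config would read the same variable for the Lambda’s concurrency limit.

```python
import os
import random


def max_concurrency_from_env(default: int = 10) -> int:
    """Read the shared concurrency limit from the environment."""
    value = int(os.environ.get("MAX_CONCURRENCY", str(default)))
    if value < 1:
        raise ValueError("MAX_CONCURRENCY must be at least 1")
    return value


def random_group_id() -> str:
    """Random group ID bounded by the shared concurrency limit."""
    return str(random.randint(1, max_concurrency_from_env()))
```

Failing loudly on a value below 1 is deliberate: a misconfigured limit is better caught when a message is queued than discovered as stalled processing later.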

So there you have it! If you’re using Lambda triggered by SQS, and you need to limit Lambda’s concurrency, you’ll likely want to use a FIFO queue. If you’ve read this far, our hope is that our experience helps you get the Lambda + SQS concurrency you want 🙂