Architecting and Operating Resilient Serverless Solutions on AWS

Original post here

“If AWS Lambda scales automatically by design, what is left for us (architects) to do aside from sitting back and relaxing?” you may ask. It is true that AWS Lambda scales automatically, but you will easily miss your performance and operational targets unless you follow some guidelines. Let’s walk through some patterns and services you may want to consider when building a serverless solution.


Load Shedding

“You can parallelize a system up to the point where contention becomes the bottleneck” – a paraphrase of Amdahl’s Law and the Universal Scalability Law.
So as transactions per second (TPS) increase, at some point your system hits its limits: latency rises and your clients experience timeouts, yet that does not stop your servers (and downstream servers) from continuing to work hard for nothing. This waste gets worse when clients retry on errors. Here are a few recommendations:

Cheaply reject excess work
- Implement concurrency limits using AWS Lambda Function configuration.
- Implement API throttling using AWS API Gateway.
Do not waste work
- Implement server-side timeout using AWS Lambda Function configuration.
Do bounded work

Implement AWS Lambda Functions that consume a similar amount of resources per event (e.g., pagination is an example of a method that bounds the work). Are you doing orchestration within your AWS Lambda Function? You may want to consider using AWS Step Functions (or the newly introduced AWS Step Functions Express Workflows feature), which allows taking the orchestration part out of AWS Lambda Functions. The AWS Step Functions service is also pre-integrated with several AWS services (e.g., AWS SQS, SNS, Batch, etc.).


Example – Orchestration Within AWS Lambda Function

Do not take extra work
- By implementing AWS Lambda functions you already get an isolated execution environment with fixed resources per request (aka, unit of work).
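As a minimal sketch of the concurrency limit and server-side timeout recommended above: the helper below assumes the caller passes in a boto3 Lambda client, and the function name and numeric values in the usage note are hypothetical.

```python
def apply_load_shedding_limits(lambda_client, function_name,
                               max_concurrency, timeout_seconds):
    """Cap concurrency and set a server-side timeout for one function.

    `lambda_client` is assumed to behave like boto3.client("lambda").
    """
    # Reserved concurrency: invocations beyond this cap are throttled
    # cheaply instead of piling onto downstream dependencies.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrency,
    )
    # Timeout: stop working on a request the client has likely given up on.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )
```

For example, `apply_load_shedding_limits(boto3.client("lambda"), "orders-handler", 50, 10)` would cap a hypothetical `orders-handler` function at 50 concurrent executions with a 10-second timeout. In practice these settings usually live in your infrastructure-as-code templates rather than in a script.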

Dependency Isolation

Little’s Law: concurrency = arrival rate x latency
In other words, the lower the latency, the less concurrency each request consumes, and so the higher the arrival rate you can sustain within the predefined concurrency limits.
So if an API supports different modes requiring different compute resources and/or different processing time, that may lead to slowing down all transactions of all modes.
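To make Little’s Law concrete, here is a small worked example. The traffic figures are made up for illustration; only the formula comes from the text above.

```python
def required_concurrency(arrival_rate_per_sec, latency_sec):
    """Little's Law: average in-flight requests = arrival rate x latency."""
    return arrival_rate_per_sec * latency_sec


# One uniform mode: 100 requests/s at 200 ms each.
uniform = required_concurrency(100, 0.2)

# Two modes sharing one function: 90 requests/s at 100 ms,
# plus 10 requests/s of a slow mode at 5 s.
fast = required_concurrency(90, 0.1)
slow = required_concurrency(10, 5.0)

print(round(uniform), round(fast), round(slow))  # 20 9 50
```

In this made-up mix, the slow mode is only 10% of the traffic but accounts for 50 of the 59 concurrent executions, which is why isolating it behind its own concurrency limit protects the fast mode.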
In a different scenario, a service starts generating a higher volume than expected on your database, which badly degrades its performance. Here is how you can compartmentalize dependencies to isolate their concurrency capacity:

Ensure your API is designed to prevent one dependency from affecting unrelated functionality. You do not want your API to become overloaded when dependencies slow down.
Use throttling and concurrency limits in AWS API Gateway and Lambda Functions to protect your services.
Consider placing AWS API Gateway/Lambda Function in front of resources that do not already isolate their concurrency.
Prefer asynchronous invocations over synchronous ones unless you absolutely can’t. Not only does that help to isolate the concurrency of each of the services involved, but if downstream services fail or, worse yet, take too long to respond, your service also remains intact. So, even if another service within the application is crippled, there is no ripple effect throughout all of the services due to synchronous dependencies.
If all you need to know is that the request was successfully processed and that the payload was durably stored, you most probably can put AWS Lambda asynchronous invocation to work. But if, after all, you still need the API response, consider adhering to any of these Asynchronous Patterns.
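A minimal sketch of an asynchronous invocation, assuming the caller passes in a boto3 Lambda client; the function name and payload are up to you.

```python
import json


def invoke_async(lambda_client, function_name, payload):
    """Queue an event for asynchronous processing and return immediately.

    `lambda_client` is assumed to behave like boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: Lambda queues the event
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # 202 means the event was accepted for later processing,
    # not that the function has already run.
    return response["StatusCode"] == 202
```

The caller learns only that the event was accepted. Failures during processing have to be handled separately, for example through a dead-letter queue or a destination.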

Example – Asynchronous Invocation When API Response Is Not Needed

Implementation Tips

If the order of messages matters, use the recently announced support for Amazon SQS FIFO queues as an AWS Lambda event source.
If low-latency APIs at low cost are a high priority, consider implementing the recently announced HTTP APIs for AWS API Gateway. Basically, you can now choose between REST API, WebSockets, and HTTP API.

Example – Throttling, Timeout, and Exhausted Database Connections

AWS Lambda Function execution environments get reused, so to avoid resource starvation and application bottlenecks you must clean up resources that are no longer needed (e.g., file descriptors, sockets, etc.) before the Lambda Function terminates. Having said that, avoid expensive re-initialization of resources, especially within the ‘handler’ function of AWS Lambda. Usually, it is better to keep initialization code outside the function handler so it can be reused across invocations (e.g., static constructors, global/static variables, database connections, etc.). For use cases where you need to refresh these values periodically, use an in-memory cache with an expiry policy.
Note that AWS has just announced AWS RDS Proxy, which can handle persistent database connections (and authentication!) for you. Re-initialization of a database connection (including the TLS handshake) is an expensive operation that can now be reduced significantly.
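The advice about initializing outside the handler and using an in-memory cache with an expiry policy can be sketched as follows. `ExpiringValue`, `load_config`, and the 5-minute TTL are illustrative names and values, not part of any AWS SDK.

```python
import time


class ExpiringValue:
    """Caches one value and reloads it after `ttl_seconds`."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means "not loaded yet"

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


def load_config():
    # Hypothetical expensive call (a secret lookup, a config fetch, ...).
    return {"loaded_at": time.time()}


# Module scope: runs once per execution environment, not per invocation.
CONFIG = ExpiringValue(load_config, ttl_seconds=300)


def handler(event, context):
    config = CONFIG.get()  # reloaded at most once every 5 minutes
    return {"config_loaded_at": config["loaded_at"]}
```

Because `CONFIG` lives at module scope, warm invocations in the same execution environment reuse the cached value, and only the first call after expiry pays the reload cost.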

To handle concurrency correctly in the case of synchronous AWS Lambda Function invocations, it is important to understand how AWS Lambda Function Scaling works, especially this part of the official AWS documentation: “… For an initial burst of traffic, your function’s concurrency can reach an initial level of between 500 and 3000, which varies per Region… After the initial burst, your function’s concurrency can scale by an additional 500 instances each minute. This continues until there are enough instances to serve all requests, or a concurrency limit is reached.” Note that at re:Invent 2019, AWS announced AWS Lambda Function Provisioned Concurrency, which ensures that at any given time you have a predefined number of Lambda Function instances ready to be executed concurrently. Use it to avoid cold starts.

AWS services most often use well-defined API throttling limits, so use these APIs with care. For example, refrain from frequent re-initialization (e.g., avoid retrieving the same secret value from AWS Secrets Manager on every invocation). If your AWS Lambda Function is publishing custom metrics to AWS CloudWatch, consider using the recently announced AWS CloudWatch Embedded Metric Format, where your AWS Lambda Function can push these metrics to AWS CloudWatch by simply logging them.
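As a rough illustration of the Embedded Metric Format, the sketch below writes one metric as a structured log line. The namespace, dimension, and metric names are placeholders; in a real project you may prefer one of the client libraries AWS publishes for this format.

```python
import json
import time


def emit_metric(namespace, function_name, name, value, unit="Milliseconds"):
    """Print one CloudWatch Embedded Metric Format record to stdout."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["FunctionName"]],
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        # Dimension and metric values live at the top level of the record.
        "FunctionName": function_name,
        name: value,
    }
    line = json.dumps(record)
    print(line)  # in Lambda, stdout is shipped to CloudWatch Logs
    return line
```

Since the metric travels as a log line, no metrics API call is made per invocation, so the function does not consume that API’s throttling budget.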
You should also be aware that the APIs of several AWS services support batching (e.g., AWS SQS); use it (“write in batches”) to reduce your chances of being throttled. If your AWS Lambda Function is triggered by AWS SQS, you can configure the batch size to optimize the concurrent execution of the function (“read in batches”).
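A sketch of “write in batches” for AWS SQS, assuming the caller passes in a boto3 SQS client. The entry ID scheme is an arbitrary choice made for this example.

```python
SQS_MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def chunk(items, size=SQS_MAX_BATCH):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(sqs_client, queue_url, bodies):
    """Send message bodies in batches; return the IDs SQS reported as failed.

    `sqs_client` is assumed to behave like boto3.client("sqs").
    """
    failed = []
    for batch_no, batch in enumerate(chunk(bodies)):
        entries = [
            {"Id": f"{batch_no}-{i}", "MessageBody": body}
            for i, body in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(
            QueueUrl=queue_url, Entries=entries
        )
        # A batch call can partially fail, so inspect the per-entry results.
        failed.extend(item["Id"] for item in response.get("Failed", []))
    return failed
```

Sending 25 messages this way takes 3 API calls instead of 25. The returned IDs identify entries that should be retried or logged.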
Last but not least, missing or improper error handling may degrade your application’s resiliency, especially in cases of repeating errors. Embrace the Fail-Fast Principle in your application design: there is no point in continuing to generate a steady load on a failing service, as it will just make things worse. Consider implementing retry with backoff together with the Circuit Breaker design pattern to cope with a dependency’s failures.
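The following is one possible sketch of failing fast with a circuit breaker, combined with retries that use capped exponential backoff and jitter. The class names, thresholds, and delays are illustrative assumptions, not a reference implementation.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when calls are rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("failing fast; dependency is unhealthy")
            self._opened_at = None  # cool-down elapsed: allow a trial call
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result


def retry_with_backoff(func, attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Retry `func` with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # the breaker says stop; do not keep hammering
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

A typical combination would be `retry_with_backoff(lambda: breaker.call(query_database))`, where `query_database` is a hypothetical dependency call. Retries absorb transient errors, while the breaker stops all traffic once failures repeat.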

Avoiding Queue Backlogs

What happens if your queue fills up because messages are produced faster than they are consumed? The processing of the whole backlog slows down, which, depending on your service SLA, may not be acceptable.
Depending on your use case, here is how you can deal with the issue:

Design the consumer applications of those queues to scale automatically.
During a spike, work with a low-priority queue (e.g., for enqueuing aged messages) and a high-priority queue to handle the spike. If reading messages from a queue is much faster than processing them, read a message from the high-priority queue and, if it meets certain age criteria, defer its processing by enqueuing it in the low-priority queue. Or simply drop aged messages (TTL) if that makes sense application-wise (e.g., an IoT device’s point-in-time state).
Reduce the number of retry attempts and handle failed events by using either AWS Lambda Function Dead Letter Queues or the recently announced AWS Lambda Destinations for asynchronous invocations. Also, consider configuring the Maximum Event Age in your AWS Lambda Function to automatically discard aged events.
Implement backpressure (throttling) using AWS API Gateway to reject excess load or alternatively (and very similar to the priority queues), implement application logic to route excess load to a ‘surge queue’ and the rest of the traffic to a ‘warm queue’. Each queue is processed separately by a dedicated AWS Lambda Function isolated by its concurrency configuration.
Implement the Shuffle Sharding design pattern, where you introduce a group of queues behind a “smart” routing layer. Each customer gets assigned two or more queues and this assignment is permanent. This is all about limiting the blast radius.
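One way to sketch the shuffle-sharding assignment described above is to hash each customer onto one of the possible queue combinations. The queue names and shard size are hypothetical.

```python
import hashlib
from itertools import combinations


def assign_shard(customer_id, queues, shard_size=2):
    """Deterministically map a customer to `shard_size` of the given queues."""
    # Every possible combination of queues is a candidate shard.
    shards = list(combinations(sorted(queues), shard_size))
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

With 8 queues and 2 queues per customer there are 28 possible shards, so a misbehaving customer fully overlaps with only a small fraction of the others. Note that this hash-based mapping is stable only while the queue list stays the same; a truly permanent assignment, as the text requires, would need to be stored once and looked up afterwards.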

Operating

Your serverless solution works like a charm. Until it does not.
How quickly do you diagnose and mitigate issues? You need to be able to analyze the behavior of your distributed application by profiling your code and by monitoring your transactions, application & infrastructure.

Implement a request tracing solution to identify and troubleshoot the root cause of performance issues and errors. There are several open-source and commercial products that can be used (e.g., Epsagon, Dynatrace, Datadog, Zipkin, Jaeger, etc.).
AWS recommends using AWS X-Ray together with AWS CloudWatch ServiceLens, which “ties together CloudWatch metrics and logs, as well as traces from AWS X-Ray to give you a complete view of your applications and their dependencies” (providing overall system health in one place by combining application and transaction metrics).
Collect, search and analyze your log data by using AWS ElasticSearch service or alternatively by using AWS CloudWatch Logs & the recently announced AWS CloudWatch Logs Insights, which requires no setup and is quite useful and very fast. Also, AWS CloudWatch Contributor Insights, which was also recently announced, can be leveraged to analyze log data to provide a view of the top contributors e.g., user, service, resource, etc. influencing system performance.
Implement monitoring dashboards (e.g., by using AWS CloudWatch Dashboard) in a consistent manner by adhering to some pattern. For example, use a layered approach starting from the customer “front door” – AWS ALB then to your Lambda Functions (including breakdowns at API level) then to your cache service and finally to your database.
Consider monitoring your end-user experience by using the recently announced AWS CloudWatch Synthetics that continually generates traffic to verify your customers’ experiences.

Asynchronous Patterns

It is pretty straightforward to implement asynchronous API when your client is not expecting a response back. But what if it does expect a response back? In this case, you may want to consider implementing one or more of these patterns.


APIs In The Front, Async In The Back

Polling

The client submits a job to AWS Step Function via AWS API Gateway and in return, it gets an immediate response back with a request-id. Then, the client uses the request-id to poll for status. Once the job is completed and the result is stored in AWS S3 bucket, the client is ready to fetch the results via AWS API Gateway. Now, if the processing time is relatively short (e.g., less than 15 min) then the business logic orchestrated by your AWS Step Function probably would be implemented via AWS Lambda Function, otherwise, you would prefer your AWS Step Function to trigger AWS Batch job.
What if your throughput is relatively large (e.g., greater than 300 RPS)? In that case, you would prefer deploying AWS SQS & Lambda Function in front of your AWS Step Function. Alternatively, the newly introduced AWS Step Function Express Workflows capability may be a good fit for high-throughput requirements.
What if the object in AWS S3 bucket is relatively large (e.g. greater than 10MB)? Then you would probably prefer to directly download the object from AWS S3 bucket using a pre-signed URL.
On the upside, it requires minimal changes for clients and it may be used to wrap existing backends.
On the downside, first, the client receives the result late (by the gap between job completion and the next poll); second, a lot of compute is wasted on both ends, client and server.
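On the client side, the polling loop can be sketched as below. `get_status` stands in for whatever call your client makes to the status endpoint, and the `"RUNNING"` status string and the delay values are assumptions made for this example.

```python
import time


def poll_until_done(get_status, request_id, initial_delay=1.0, max_delay=30.0,
                    timeout=900.0, sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status(request_id)` until the job leaves the RUNNING state."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status(request_id)
        if status != "RUNNING":
            return status
        if clock() + delay > deadline:
            raise TimeoutError(
                f"job {request_id} still running after {timeout}s"
            )
        sleep(delay)
        delay = min(max_delay, delay * 2)  # back off to cut wasted requests
```

Doubling the delay reduces the number of wasted requests, at the price of a later notification. That trade-off is the downside of polling described above.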

Example – Asynchronous Patterns – Polling

WebHooks

To establish trust, your client is registered and verified. Then, the backing service does all the work asynchronously. Lastly, using AWS SNS, the backing service calls back to the client when the job is completed (consider attaching a dead-letter queue to the AWS SNS subscription to capture and handle undeliverable messages). As before, if the processing time is relatively short, then the business logic orchestrated by your AWS Step Function would probably be implemented via AWS Lambda Function; otherwise, you may be better off triggering an AWS Batch job. If the object in the AWS S3 bucket is larger than 256KB, which is the AWS SNS payload size limit, you would prefer to download the object directly from the AWS S3 bucket using a pre-signed URL.
On the upside, compared to polling, it is less resource intensive for both client and server. Also, AWS SNS is handling all the heavy lifting of the delivery & retries.
On the downside, the client needs to host a web endpoint (highly available or/and supporting the server’s retry policy), and the server must implement a mechanism for establishing trust.
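The 256KB rule above can be sketched as a small helper that decides between sending the result inline and sending a pre-signed URL. It assumes the caller passes in a boto3 S3 client and that the result is text; the message field names are made up for this example.

```python
import json

SNS_MAX_BYTES = 256 * 1024  # AWS SNS payload size limit cited above


def build_notification(s3_client, bucket, key, result_text, expires_in=3600):
    """Return a JSON callback message: inline result, or a pre-signed URL.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    message = json.dumps({"result": result_text})
    if len(message.encode("utf-8")) <= SNS_MAX_BYTES:
        return message
    # Too large for SNS: let the client fetch the object from S3 directly.
    url = s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
    return json.dumps({"result_url": url})
```

The same idea applies to the other asynchronous patterns: once the result exceeds what the transport allows, hand the client a short-lived URL instead of the payload.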

Example – Asynchronous Patterns – Webhooks

WebSockets

The client submits a job to AWS Step Function via AWS API Gateway and in return, it gets an immediate response back with the details that enable the AWS Step Function and the client to communicate securely via a WebSockets endpoint of AWS API Gateway. Then, the client opens a connection to the WebSockets endpoint via AWS API Gateway, and an AWS Lambda Function implements the necessary logic to notify the AWS Step Function, which also identifies the client as the one who submitted the job. Once the job is completed and the client connection is approved, AWS Step Function executes a callback step updating the client with the result.
What if your throughput is relatively large (e.g., greater than 300 RPS)? As before, you would rather deploy AWS SQS & Lambda Function in front of your AWS Step Function, or use the newly introduced AWS Step Function Express Workflows capability if it fits better.
What if the resulting object is relatively large? AWS API Gateway WebSockets limits the payload size to 128KB, while each WebSocket frame must not exceed 32KB. Once again, this limitation can be mitigated, for example, by downloading the object directly from the AWS S3 bucket using a pre-signed URL.
On the upside, less compute is wasted, and the bi-directional communication channel you established may benefit a wide range of your use cases. You can now use this ‘push’ (vs. poll) mechanism not only to notify clients when a job is done but also to proactively push events that update the client on any server-side state changes it is subscribed to. If you want to learn more about these kinds of architectures, this is a great post to start with: From Poll to Push: Transform APIs using Amazon API Gateway REST APIs and WebSockets.
On the downside, AWS API Gateway WebSockets has a limit of 500 new connections per second. Also, you and the information security team need to be well familiar with the WebSockets protocol, and you may be required to ensure portability across supported client devices and browsers.

Example – Asynchronous Patterns – WebSockets

Conclusion

We highlighted a few patterns and services you probably want to consider when building resilient serverless solutions on AWS. AWS, and Amazon.com for that matter, have both embraced the Eat Your Own Dog Food approach to building platforms. AWS services and quite a few of the techniques we covered are already in use by Amazon.com and AWS. At re:Invent 2019, AWS announced the Amazon Builders’ Library, “a collection of living articles that take readers under the hood of how Amazon architects, releases, and operates the software underpinning Amazon.com and AWS.” For more information, including how AWS Lambda internally utilizes some of these techniques, watch the recordings of these breakout sessions: “SVS407 Architecting and operating resilient serverless systems at scale” and “SVS335 Serverless at scale: Design patterns and optimizations.”


Yossi Cohen