How Coinbase is Scaling Serverless Applications


Serverless, specifically AWS Lambda, is awesome. It scales from 0 to near infinity, it costs next to nothing, and it integrates with almost everything. The trouble starts when going from one engineer deploying applications into one account, to lots of engineers deploying into many shared accounts. It’s hard to make sure applications follow the same good naming and security practices to stop everyone from stepping on each other’s toes.

Our goal is to provide a secure and pleasant experience for thousands of developers building and deploying hundreds of serverless applications to dozens of AWS accounts. To that end we developed and open sourced Fenrir, our AWS SAM deployer. This post is about how we use Fenrir to deploy serverless applications in a large organization.

What the Framework (SAM, serverless…) Doesn’t Do

Serverless frameworks typically include a CLI that can create/update AWS resources and deploy code. For example, both serverless deploy and sam deploy use AWS CloudFormation (CF) to release code. These deploy commands are useful when getting started, and can easily be put into a CI/CD pipeline to accelerate application release.

When more engineers start deploying serverless applications it is a good idea to ensure they:

  • Use consistent naming: good naming (and tagging) of resources, like Lambda and API Gateway, will keep accounts clean and make obvious which resources belong to which projects.
  • Follow recommended security practices: e.g. practice “least privilege” by giving Lambdas separate security groups and IAM roles.
  • Create a reliable workflow: cleanly handle failure in a way that shows developers what happened, why it happened, and how to remedy it.
  • Record what is deployed: quickly answering what is currently deployed allows engineers to debug and understand the current state of the world.

Our solution was to build a centralized deployer. This deployer provides clear boundaries to developers working in the same AWS account and blocks deployment unless common practices are followed. This removes the cognitive overhead of a lot of details and allows engineers to focus on their application code.

Fenrir Serverless Serverless Deployer

Fenrir is our AWS SAM deployer; at its core it is a reimplementation of the sam deploy command as an AWS Step Function, so it’s a serverless serverless (serverless²) deployer. sam deploy is an alias for a Python script with two steps: aws cloudformation create-change-set and aws cloudformation execute-change-set.

Fenrir’s state machine replicates these steps with explicit state transitions, retries, and error handling.

The input to this state machine is a SAM template with some additional data like ProjectName, ConfigName and the AWS account to deploy to. The Fenrir state machine then performs the following steps:

  • Validate: fills in defaults, then validates that the template is correct and that all referenced resources are allowed to be used.
  • Lock: creates a lock to make sure that only one deploy per project can go out at a time.
  • CreateChangeSet and wait to Execute: creates a change-set for a CF stack, then waits for the change-set to be validated and become available.
  • ExecuteChangeSet and wait for Success: executes the change-set, then waits for the execution to finish.

This state machine finishes in either a Success state, a FailureClean state where the release was unsuccessful but cleanup was successful, or a FailureDirty state that should never happen and will alert the team.

Fenrir (like our other open source deployer Odin) follows the Bifrost standard for building deployers at Coinbase. Bifrost adds multi-account support, security by default, visibility into deploys, and simple integration into our existing tools.

What Fenrir Doesn’t Do

Fenrir supports only a subset of AWS SAM. Limiting the template scope reduces the surface area for possible naming conflicts and security risks.

The supported resources are AWS::Serverless::Function, AWS::Serverless::Api, AWS::Serverless::LayerVersion, and AWS::Serverless::SimpleTable. Each of these has limitations; for example, the AWS::Serverless::Function resource’s limitations are:

  • FunctionName is generated and cannot be defined.
  • Role and VpcConfig.SecurityGroupIds, if defined, must refer to resources that have correct tags*.
  • VpcConfig.SubnetIds must refer to subnets that have the DeployWithFenrir tag equal to true.
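
As a rough sketch of what two of these checks could look like, the Go snippet below rejects a user-defined FunctionName and requires every subnet to carry DeployWithFenrir=true. The Function struct, ValidateFunction, and the subnetTags lookup map are hypothetical names made up for this example; a real deployer would fetch subnet tags from AWS rather than from an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

// Function is a hypothetical, trimmed-down view of an
// AWS::Serverless::Function resource; only the fields checked here are kept.
type Function struct {
	FunctionName string
	SubnetIDs    []string
}

// ValidateFunction applies two of the limitations listed above.
// subnetTags maps subnet ID to its tags and stands in for an AWS lookup.
func ValidateFunction(fn Function, subnetTags map[string]map[string]string) error {
	if fn.FunctionName != "" {
		return errors.New("FunctionName is generated and cannot be defined")
	}
	for _, id := range fn.SubnetIDs {
		// A missing subnet or missing tag yields "" and is rejected.
		if subnetTags[id]["DeployWithFenrir"] != "true" {
			return fmt.Errorf("subnet %s must have tag DeployWithFenrir=true", id)
		}
	}
	return nil
}

func main() {
	tags := map[string]map[string]string{
		"subnet-a": {"DeployWithFenrir": "true"},
		"subnet-b": {},
	}
	fmt.Println(ValidateFunction(Function{SubnetIDs: []string{"subnet-a"}}, tags))
	fmt.Println(ValidateFunction(Function{SubnetIDs: []string{"subnet-b"}}, tags))
	fmt.Println(ValidateFunction(Function{FunctionName: "custom"}, tags))
}
```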

The supported Events types are:

  • Api: It must have RestApiId that is a reference to a local API resource
  • S3: Bucket must have correct tags*
  • Kinesis: Stream must have correct tags*
  • DynamoDB: Stream must have correct tags*
  • SQS: Queue must have correct tags*
  • Schedule
  • CloudWatchEvent

*: correct tags means ProjectName, ConfigName tags are correct.
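
The tag rule and the event allow-list can be expressed in a few lines of Go. This is a minimal sketch under assumptions: HasCorrectTags and CheckEvent are hypothetical helpers, the source’s tags are passed in as a plain map, and the Api check for a local RestApiId reference is left out.

```go
package main

import "fmt"

// Event types whose source (bucket, stream, queue) must carry correct tags.
var taggedSources = map[string]bool{
	"S3": true, "Kinesis": true, "DynamoDB": true, "SQS": true,
}

// Supported event types with no tagged source to check in this sketch.
var untaggedEvents = map[string]bool{
	"Api": true, "Schedule": true, "CloudWatchEvent": true,
}

// HasCorrectTags reports whether a resource's ProjectName and ConfigName
// tags match the project being deployed.
func HasCorrectTags(tags map[string]string, projectName, configName string) bool {
	return tags["ProjectName"] == projectName && tags["ConfigName"] == configName
}

// CheckEvent rejects unsupported event types and wrongly tagged sources.
func CheckEvent(eventType string, sourceTags map[string]string, projectName, configName string) error {
	switch {
	case taggedSources[eventType]:
		if !HasCorrectTags(sourceTags, projectName, configName) {
			return fmt.Errorf("%s source is not tagged for %s/%s", eventType, projectName, configName)
		}
		return nil
	case untaggedEvents[eventType]:
		return nil
	default:
		return fmt.Errorf("event type %s is not supported", eventType)
	}
}

func main() {
	queueTags := map[string]string{
		"ProjectName": "coinbase/deploy-test",
		"ConfigName":  "development",
	}
	fmt.Println(CheckEvent("SQS", queueTags, "coinbase/deploy-test", "development"))
	fmt.Println(CheckEvent("SQS", queueTags, "coinbase/other", "development"))
	fmt.Println(CheckEvent("SNS", nil, "coinbase/deploy-test", "development"))
}
```

Because the check relies only on tags, any event source that cannot be tagged has no way to pass it, which is why such sources fall through to the unsupported case.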

SNS is not on the list of supported events. As of writing, SNS does not support tags, making it difficult to validate that a Lambda is allowed to listen to an SNS topic. Finding ways to support such events and resources securely is a future goal of Fenrir.

Hello Fenrir

A simple SAM template that works with Fenrir includes ProjectName and ConfigName, e.g. template.yml would look like:

ProjectName: "coinbase/deploy-test"
ConfigName: "development"
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Resources:
  helloAPI:
    Type: AWS::Serverless::Api
    Properties:
      StageName: dev
      EndpointConfiguration: REGIONAL
  hello:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Role: lambda-role
      Handler: hello.lambda
      Runtime: go1.x
      Events:
        hi:
          Type: Api
          Properties:
            RestApiId: !Ref helloAPI
            Path: /hello
            Method: GET

The hello lambda code:

package main

import "github.com/aws/aws-lambda-go/lambda"

func main() {
	// Register a handler that ignores its input and returns a fixed body.
	lambda.Start(func(_ interface{}) (interface{}, error) {
		return map[string]string{"body": "Hello"}, nil
	})
}

Fenrir uses Docker to build and bundle code sent to AWS. The hello function requires /hello.zip to exist in the built docker container, e.g. the Dockerfile:

FROM golang
WORKDIR /
RUN apt-get update && apt-get install -y zip
COPY . .
RUN go get github.com/aws/aws-lambda-go/lambda
RUN GOOS=linux GOARCH=amd64 go build -o hello.lambda .
RUN zip hello.zip hello.lambda

To package and deploy the template using the Step Function you run fenrir package && fenrir deploy:

  1. package builds the Docker image then extracts the zip files
  2. deploy uploads the zip files and sends the template as input to the Fenrir Step Function

Implementation

Fenrir is implemented primarily using:

  • aws-sdk-go to interact with CloudFormation and other AWS resources
  • step as the framework to build, test and deploy AWS Step Functions (Why Coinbase uses Step Functions)
  • goformation to encode/decode CloudFormation and SAM resources as Go structs and validate them using JSON schema.

goformation uses the AWS CloudFormation Resource Specification and SAM specification to generate code and JSON schema. Fenrir then uses these to encode, decode, modify and validate templates. This code generation makes it very easy for Fenrir to keep up to date with changes in SAM and release features quickly.

Future

It’s hard to build tools that are scalable, secure, and easy to use. Fenrir gives our developers cutting edge tools with clear boundaries on how to use them. This is a huge win, but there is still lots of room for improvement by supporting more SAM resources, events and properties.

SAM/Fenrir can’t deploy static websites to S3 behind CloudFront, as CloudFormation doesn’t support uploading S3 Objects. A future Fenrir feature is to provide a custom CloudFormation resource that can upload files to S3 for static website hosting. This would make Fenrir a full-stack serverless² deployer.

Finally, Fenrir is still in beta and we welcome any contributions or feature requests over on our GitHub repository.
