AWS Step Functions, State Machines, Bifrost, and Building Deployers


AWS Step Functions are hosted state-machines defined in the Amazon States Language. To execute a Step function, you send it JSON data, which is given to an initial state to process; that state passes its output to the next state, and so on. States are processed until a success or failure state is reached.

How a state processes its input and selects the next state depends on its Type. For example, a Task state can use a Lambda function to process the input, and a Choice state can select which state to go to next based on its input.
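
To make this execution model concrete, here is a toy interpreter in Go. It is a sketch, not how AWS or Step actually implement it. State, run and exampleMachine are hypothetical names, only Task, Choice, Succeed and Fail states are modeled, and Go functions stand in for Lambdas and Choice rules. The example machine loops through a Task and a Choice until a counter reaches 3.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical, heavily simplified model of one state.
// Task states transform the data and name the next state, Choice states
// only pick the next state, and Succeed/Fail end the execution.
type State struct {
	Type   string
	Task   func(map[string]any) (map[string]any, error)
	Choose func(map[string]any) string
	Next   string
}

// run starts at start, feeds each state's output to the next state, and
// returns the final data, the path of visited states, and any error.
func run(states map[string]State, start string, input map[string]any) (map[string]any, []string, error) {
	var path []string
	name, data := start, input
	for {
		st, ok := states[name]
		if !ok {
			return nil, path, fmt.Errorf("unknown state %q", name)
		}
		path = append(path, name)
		switch st.Type {
		case "Task":
			out, err := st.Task(data)
			if err != nil {
				return nil, path, err
			}
			data, name = out, st.Next
		case "Choice":
			name = st.Choose(data)
		case "Succeed":
			return data, path, nil
		case "Fail":
			return data, path, errors.New("execution reached a Fail state")
		default:
			return nil, path, fmt.Errorf("unsupported state type %q", st.Type)
		}
	}
}

// exampleMachine increments Count until the Choice state sees it reach 3.
func exampleMachine() map[string]State {
	return map[string]State{
		"Increment": {Type: "Task", Next: "Done?", Task: func(in map[string]any) (map[string]any, error) {
			n, _ := in["Count"].(int)
			return map[string]any{"Count": n + 1}, nil
		}},
		"Done?": {Type: "Choice", Choose: func(in map[string]any) string {
			if n, _ := in["Count"].(int); n >= 3 {
				return "Success"
			}
			return "Increment"
		}},
		"Success": {Type: "Succeed"},
	}
}

func main() {
	out, path, err := run(exampleMachine(), "Increment", map[string]any{})
	fmt.Println(out, path, err)
}
```

The returned path records every visited state (here Increment and Done? three times each, then Success), which makes the branch a machine took easy to assert on.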

Step functions are awesome because they:

  1. Explicitly define the order of execution, including all conditional paths, in a simple-to-understand model.
  2. Perform common tasks, like calling Lambda functions, which removes a ton of boilerplate code.
  3. Handle errors and retry on failure, increasing reliability without sacrificing understandability.

Here is a small example where a state-machine calls out to a Lambda function and makes a choice based on its output:

{
  "StartAt": "CallLambda",
  "States": {
    "CallLambda": {
      "Type": "Task",
      "Resource": "<lambda_arn>",
      "Next": "Worked?",
      "Retry": [{ "ErrorEquals": ["KnownError"] }],
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "Failure"
      }]
    },
    "Worked?": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.Worked",
          "BooleanEquals": true,
          "Next": "Success"
        }
      ],
      "Default": "Failure"
    },
    "Success": {
      "Type": "Succeed"
    },
    "Failure": {
      "Type": "Fail"
    }
  }
}

This state-machine looks like (generated with step dot --states <state_machine>):

StartAt defines the initial state, CallLambda, which executes the Lambda at <lambda_arn>. The Lambda’s output is then sent to Worked?, which goes to Success if its $.Worked attribute is true and to Failure otherwise. If CallLambda returns a KnownError, it will Retry. Any other error, or a KnownError that still fails after its retries, goes to Failure, because States.ALL is a catch-all for any error.

Lambda code and Step functions are separated from one another in AWS and can be developed independently. This can make them difficult to test and validate, as a change in one can cause a bug in the other. To make it easier to develop and test Step functions and Lambda functions together, we built the Step framework.
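
One cheap way to catch some of that drift before deploying is to lint the definition itself. This sketch uses only encoding/json to check that StartAt and every Next, Default, Choice and Catch target name an existing state, and that every Task or TaskFn state has a States.ALL catcher. validate and its structs are hypothetical helpers, not part of Step, and they ignore every field they do not need.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Only the fields needed for the checks; everything else is ignored.
type catcher struct {
	ErrorEquals []string
	Next        string
}

type choiceRule struct {
	Next string
}

type state struct {
	Type    string
	Next    string
	Default string
	Choices []choiceRule
	Catch   []catcher
}

type definition struct {
	StartAt string
	States  map[string]state
}

// validate lists dangling transitions and Task/TaskFn states without a
// States.ALL catcher, sorted so the output is stable.
func validate(raw []byte) ([]string, error) {
	var def definition
	if err := json.Unmarshal(raw, &def); err != nil {
		return nil, err
	}
	var problems []string
	ref := func(from, to string) {
		if _, ok := def.States[to]; to != "" && !ok {
			problems = append(problems, fmt.Sprintf("%s -> unknown state %q", from, to))
		}
	}
	if _, ok := def.States[def.StartAt]; !ok {
		problems = append(problems, fmt.Sprintf("StartAt -> unknown state %q", def.StartAt))
	}
	for name, s := range def.States {
		ref(name, s.Next)
		ref(name, s.Default)
		for _, c := range s.Choices {
			ref(name, c.Next)
		}
		catchAll := false
		for _, c := range s.Catch {
			ref(name, c.Next)
			for _, e := range c.ErrorEquals {
				catchAll = catchAll || e == "States.ALL"
			}
		}
		if (s.Type == "Task" || s.Type == "TaskFn") && !catchAll {
			problems = append(problems, name+": no States.ALL catcher")
		}
	}
	sort.Strings(problems)
	return problems, nil
}

func main() {
	// "Worked" is a typo for "Worked?", and CallLambda has no Catch.
	broken := []byte(`{
		"StartAt": "CallLambda",
		"States": {
			"CallLambda": {"Type": "TaskFn", "Next": "Worked"},
			"Worked?": {"Type": "Choice", "Default": "Failure"},
			"Failure": {"Type": "Fail"}
		}
	}`)
	problems, err := validate(broken)
	fmt.Println(problems, err)
}
```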

Here is an example of a state-machine using the Step framework:

import "github.com/coinbase/step/machine"

func StateMachine() *machine.StateMachine {
	stateMachine, err := machine.FromJSON([]byte(`{
		"StartAt": "CallLambda",
		"States": {
			"CallLambda": {
				"Type": "TaskFn",
				"Next": "Worked?",
				"Retry": [{ "ErrorEquals": ["KnownError"] }],
				"Catch": [{
					"ErrorEquals": ["States.ALL"],
					"Next": "Failure"
				}]
			},
			"Worked?": {
				"Type": "Choice",
				"Choices": [
					{
						"Variable": "$.Worked",
						"BooleanEquals": true,
						"Next": "Success"
					}
				],
				"Default": "Failure"
			},
			"Success": {
				"Type": "Succeed"
			},
			"Failure": {
				"Type": "Fail"
			}
		}
	}`))
	if err != nil {
		// The definition is a constant, so failing to parse it is a bug.
		panic(err)
	}

	stateMachine.SetResourceFunction("CallLambda", LambdaHandler)

	return stateMachine
}

TaskFn is not part of the Amazon States Language; it is a Step extension that tells the Lambda which Task state is calling it, so the Lambda can route the call to the correct handler.

LambdaHandler is the function that is called when the Task state CallLambda is reached:

import "context"

type Input struct{}

type Result struct {
	Worked bool
}

func LambdaHandler(_ context.Context, _ *Input) (Result, error) {
	return Result{Worked: true}, nil
}

Handlers contain the logic; the path is controlled by the state-machine. State-machines can change the path based on a handler's output, but a handler cannot decide which state to jump to.

Testing

With Step, a state-machine can be executed by calling StateMachine().Execute("{}"). This sends {} as the input to the machine and returns:

  1. The final output.
  2. The “path” of the states that were visited.
  3. Errors encountered by the process.

This is used by tests:

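import (
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func Test_Machine(t *testing.T) {
	exec, err := StateMachine().Execute("{}")
	// Stop here on error so exec is never dereferenced when nil.
	require.NoError(t, err)

	// JSONEq compares the JSON semantically, so spacing and key order don't matter.
	assert.JSONEq(t, `{"Worked": true}`, exec.OutputJSON)
	assert.Equal(t, []string{
		"CallLambda",
		"Worked?",
		"Success",
	}, exec.Path())
}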

Fuzz tests are also very useful for building reliable state-machines. The gofuzz library randomly generates input, and the test below checks that none of it triggers an unhandled error (a panic):

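import (
	"testing"

	fuzz "github.com/google/gofuzz"
	"github.com/stretchr/testify/assert"
)

func Test_With_Fuzz(t *testing.T) {
	f := fuzz.New()
	for i := 0; i < 50; i++ {
		var input Input
		f.Fuzz(&input)

		_, err := StateMachine().Execute(input)
		if err != nil {
			// Errors for bad input are fine; a panic means an unhandled bug.
			assert.NotRegexp(t, "Panic", err.Error())
		}
		// Other assertions, like checking the final state
	}
}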

Deploy

The ultimate goal is to deploy the Step function and Lambda to AWS. For this we need an executable binary; let’s call it hello. When run without any arguments, hello must start a Lambda with run.Lambda(StateMachine()). When run as hello json, it should print the state-machine with run.JSON(StateMachine()).
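
The argument handling hello needs is small. In this sketch, mode is a hypothetical helper, and the comments mark where the real binary would call run.Lambda(StateMachine()) and run.JSON(StateMachine()); those calls are left out so the example compiles without the Step packages.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// mode decides what hello does based on its command-line arguments.
func mode(args []string) (string, error) {
	switch {
	case len(args) == 0:
		return "lambda", nil
	case len(args) == 1 && args[0] == "json":
		return "json", nil
	default:
		return "", errors.New("usage: hello [json]")
	}
}

func main() {
	m, err := mode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	switch m {
	case "lambda":
		// Real binary: run.Lambda(StateMachine())
		fmt.Println("would start the Lambda handler")
	case "json":
		// Real binary: run.JSON(StateMachine())
		fmt.Println("would print the state-machine JSON")
	}
}
```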

The step binary can bootstrap (directly upload) hello to AWS. To install step:

go get github.com/coinbase/step
cd $GOPATH/src/github.com/coinbase/step
go build && go install

Then build and bootstrap hello:

# Build your code for the Lambda's Linux environment
GOOS=linux go build -o lambda
zip lambda.zip lambda

# export AWS creds using https://github.com/coinbase/assume-role
assume-role account user

# Use step to upload your code and state-machine to AWS
step bootstrap \
  -lambda "hello-lambda" \
  -step "hello-step-function" \
  -states "$(hello json)"

Step does not create the Lambda, IAM, or Step function resources; these must be created first with a tool like Terraform or GeoEngineer.

Practices

Here are a few good practices to follow using Step:

  1. Handle All Errors: Every TaskFn should have a Catch for States.ALL errors. This ensures the state-machine ends in a proper state even when a handler fails unexpectedly.
  2. Fail Quickly: The faster a state-machine fails the less cleanup is needed. Fail if unknown JSON parameters are sent, if referenced resources don’t exist, or if other pre-conditions are not met.
  3. Fuzz Input: As described above, using gofuzz can save you a lot of time, as it highlights errors caused by invalid input.
  4. Comment: use the Comment attribute on states. The ultimate goal is to be able to fully understand the state-machine without looking at the code.
  5. Design defensively: Step functions should behave predictably, especially when failing. Alert if a Step function execution finishes in an unexpected state.
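
For Fail Quickly, Go's encoding/json can reject unknown JSON parameters through Decoder.DisallowUnknownFields. The sketch below decodes a hypothetical Input that way and also enforces a required field, so a typo like Nmae fails at the first state instead of being silently ignored. decodeStrict is an illustrative helper, not part of Step.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Input is a hypothetical handler input.
type Input struct {
	Name string
}

// decodeStrict rejects unknown fields and missing required values up front.
func decodeStrict(raw []byte) (Input, error) {
	var in Input
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&in); err != nil {
		return Input{}, err
	}
	if in.Name == "" {
		return Input{}, errors.New("Name is required")
	}
	return in, nil
}

func main() {
	for _, raw := range []string{`{"Name": "hello"}`, `{"Name": "hello", "Nmae": "typo"}`, `{}`} {
		in, err := decodeStrict([]byte(raw))
		fmt.Printf("%s -> %+v, %v\n", raw, in, err)
	}
}
```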


