Refactoring Python-based AWS Lambda functions for testing

August 6, 2022 #aws #lambda #python #codequality

I’ve been looking at serverless, event-driven architectures using AWS Lambda and how to test individual components. In this post, I take an example AWS Lambda function using the Python runtime which was not designed for testability. I then incrementally introduce changes that result in an updated version.

Example Scenario

A developer is using their local development environment to code and test an AWS Lambda function. The Python-based Lambda function implements a simple hit counter, using Amazon DynamoDB to persist the counts.

The Python code below represents the first iteration which was not designed for testability. The lambda_handler receives an event object with a string key and optional integer between 1 and 9 (inclusive). If the validation logics passes, it creates or updates a DynamoDB table item, adding the increment value to the hits field, and returns a response with the updated count value.

# "before" version of hitcounter - not designed for testability
import os
import boto3

# run once when the execution environment is created
DYNAMODB_TABLE_NAME = os.environ["DYNAMODB_TABLE_NAME"]
dynamodb_resource = boto3.resource('dynamodb')
dynamodb_table = dynamodb_resource.Table(DYNAMODB_TABLE_NAME)

def lambda_handler(event, context):
    increment = event.get('increment', 1)
    if (increment < 1 or increment > 9):
        return {"count": -1}

    response = dynamodb_table.update_item(
        Key={"PK": event.get('key')},
        UpdateExpression="ADD hits :num",
        ExpressionAttributeValues={
            ":num": increment
        },
        ReturnValues="UPDATED_NEW"
    )

    return {"count": response['Attributes']['hits']}

This code follows the best practices for working with AWS Lambda functions recommendations of using environment variables to pass operational parameters (DYNAMODB_TABLE_NAME) and initializing the DynamoDB connection outside of the function handler for reuse by subsequent invocations processed by the same instance. However, it did not separate the Lambda handler and core logic to make a more unit-testable function. The developer wants to test the full functionality of this Lambda function prior to deployment in AWS, including its interactions with the DynamoDB table.

Defining an Event Schema

An event is a JSON-formatted document that contains data for a Lambda function to process. The Python runtime converts this event into a dictionary object and passes it to the handler. When a Lambda function is invoked synchronously, the function’s response is sent back to the invoker. From inspecting the Python code above, you can infer the event and response structure have the following shape.

// sample event
{
  "key": "SampleEvent",
  "increment": 2
}

// sample response
{
  "count": 2
}

Type annotations are a more formal way to define the event and response structures. The Python typing module provides runtime support for type hints with TypeDict being a special construct which declares a dictionary type. The AWS Lambda Powertools for Python provides a static typing class for LambdaContext and a Parser utility for data validation.

The code below creates Python classes for HitCounterEvent and HitCounterResponse, then annotates the lambda_handler function with them. This is not enforced by the Python runtime, but enables type checking and autocompletion inside IDEs. The HitCounterEvent class inherits from BaseModel which supports decorating of methods to provide validation. The logic that defaults the increment value to 1 and performs a range check between 1 and 9 (inclusive) has moved out of lambda_handler and into HitCounterEvent.

from typing import TypedDict
from aws_lambda_powertools.utilities.parser import BaseModel, parse, validator
from aws_lambda_powertools.utilities.typing import LambdaContext

# new class definitions for Event and Response
class HitCounterEvent(BaseModel):
    key: str
    increment: int = 1

    @validator("increment")
    def set_increment(cls, v):
        if (v < 1 or v > 9):
            raise ValueError("increment must be between 1-9 (inclusive)")
        return v

class HitCounterResponse(TypedDict):
    count: int

def lambda_handler(event: HitCounterEvent, context: LambdaContext) -> HitCounterResponse:
    # function body from above returning result

First Unit Tests

By using well-defined event and response objects, writing unit tests is simplified by not searching for dictionary key string values. The @validator decorated method is executed when constructing HitCounterEvent objects. The tests below use the unittest unit testing framework which look similar to other testing frameworks from other languages.

from unittest import TestCase
from src.hitcounter2.app import (HitCounterEvent, HitCounterResponse, lambda_handler)

class TestHitCounterEventValidation(TestCase):
    def test_get_increment_default(self):
        event = HitCounterEvent(key="test")
        self.assertEqual(event.increment, 1)

    def test_get_increment_2(self):
        event = HitCounterEvent(key="test", increment=2)
        self.assertEqual(event.increment, 2)

    def test_get_increment_0(self):
        with self.assertRaises(ValueError):
            event = HitCounterEvent(key="test", increment=0)

Next Unit Test

The Lambda runtime doesn’t know about or construct HitCounterEvent, but as mentioned earlier, deserializes JSON into a dictionary object. The lambda_handler needs to explicitly call parse with the event object and specify the HitCounterEvent model. The Lambda handler is still responsible for handling ValueError and returning a HitCounterResponse.

def lambda_handler(event: HitCounterEvent, context: LambdaContext) -> HitCounterResponse:
    try:
        event = parse(event=event, model=HitCounterEvent)
    except ValueError:
        return HitCounterResponse(count=-1)
        
    # remaining function body from above returning result

The test below creates a dict with the increment value of 0 and calls the lambda_handler . Since this is outside of the allowed range, the test expects validation failure and asserts the HitCounterResponse has a count of -1.

    def test_lambda_handler_increment_0(self):
        test_event_dict = {"key": "HandlerTest", "increment": 0}
        test_response: HitCounterResponse = lambda_handler(
            event=test_event_dict, context=None)
        self.assertEqual(test_response["count"], -1)

Preparing for Mocking

The original code unconditionally creates DynamoDB resources, causing challenges for mocking and patching inside unit tests. To continue the best practice to create once and reuse, I declare a global variable named _SERVICES and initially set to None. I also create a Services class to hold the set of AWS resource, using resource names as incoming parameters to remove the dependency on specific environment variable names. Finding one of the defined runtime environment variables indicates this code is running inside a Lambda runtime, and a Services object should be constructed.

# define a Services class for AWS service resources
class Services:
    def __init__(self, dynamodb_table_name: str):
        self.dynamodb_resource = boto3.resource('dynamodb')
        self.dynamodb_table = self.dynamodb_resource.Table(
            dynamodb_table_name)

_SERVICES: Services = None
if os.getenv("AWS_LAMBDA_FUNCTION_NAME"):
    # only initialize _SERVICES under Lambda runtime
    _SERVICES = Services(dynamodb_table_name=os.getenv("DYNAMODB_TABLE_NAME"))

The code which updates DynamoDB items is extracted into an update_hits function, taking a Services object parameter rather than directly accessing the global variable. This refactoring enables both mocking for unit tests and independence for future integration tests.

def update_hits(services: Services, key: str, increment: int) -> int:
    response = services.dynamodb_table.update_item(
        Key={"PK": key},
        UpdateExpression="ADD hits :num",
        ExpressionAttributeValues={
            ":num": increment
        },
        ReturnValues="UPDATED_NEW")

    return response["Attributes"]["hits"]

The updated lambda_handler now calls parse and explicitly passes _SERVICES lazy into update_hits.

def lambda_handler(event: HitCounterEvent, context: LambdaContext) -> HitCounterResponse:

    try:
        event = parse(event=event, model=HitCounterEvent)
    except ValueError:
        return HitCounterResponse(count=-1)

    # explicitly pass _SERVICES into function
    hits = update_hits(
        services=_SERVICES, key=event.key, increment=event.increment)

    return HitCounterResponse(count=hits)

Unit Testing with Mock Amazon DynamoDB

During normal execution, the Lambda function code accesses AWS resources such as a DynamoDB table. When running tests, I want to test interactions with these services without accidentally changing live data in DynamoDB. I may also want to run tests in an environment without a connection to AWS.

This solution uses the Moto third-party library to mock out AWS services, replacing them with simulated versions running locally. Instead of using the live AWS service, DynamoDB is mocked and simulated for the entire test class using the moto.mock_dynamodb decorator. This both replaces the boto3 dynamodb resource with a mock object storing data in memory and enables assertions about how that object has been used. In the TestCase below, the setUp and tearDown methods construct and remove a mocked DynamoDB table and manipulate sys.modules to set the _SERVICES global to an instances of the Services class. Also note there is no dependency on setting specific environment variables.

import moto

@moto.mock_dynamodb
class TestHitCounterLambdaDynamoDB(TestCase):
    # Test Set up
    def setUp(self) -> None:
        # Set up the services: construct a (mocked!) DynamoDB table
        self.DYNAMODB_TABLE_NAME = "unit_test_ddb"
        dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
        dynamodb.create_table(
            TableName=self.DYNAMODB_TABLE_NAME,
            KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
            AttributeDefinitions=[
                {"AttributeName": "PK", "AttributeType": "S"}],
            BillingMode="PAY_PER_REQUEST")

        # set the _SERVICES global
        sys.modules["src.hitcounter2.app"]._SERVICES = Services(
            dynamodb_table_name=self.DYNAMODB_TABLE_NAME)
            
    # tests go here

    def tearDown(self):
        # [12] Remove (mocked!) DynamoDB Table
        dynamodb = boto3.client("dynamodb", region_name="us-east-1")
        dynamodb.delete_table(TableName=self.DYNAMODB_TABLE_NAME)

        sys.modules["src.hitcounter2.app"]._SERVICES = None

The following code tests the entire Lambda function at once, including the lambda_handler and all the extracted functions. It passes because moto.mock_dynamodb implements both the DynamoDB resource creation and the update item logic. This is more course-grained than an ideal unit test and also relies on the moto emulation of DynamoDB.

For the test event data, a good practice is saving them with as separate JSON files in your project rather than hard coding them inline. The test code also introduces a load_test_event helper method which allows test_event to be loaded from the test data location. The tests/events/SampleEvent.json file used in the test below contains the same sample event from earlier in this blog. Externalizing test data info files also scales out test coverage without repeating similar test methods.

    def load_test_event(self, test_event_file_name: str) -> HitCounterEvent:
        return HitCounterEvent.parse_file(f"tests/events/{test_event_file_name}.json")

    def test_lambda_handler(self):
        # invoke lambda_handler
        test_event: HitCounterEvent = self.load_test_event("SampleEvent")
        test_response: HitCounterResponse = lambda_handler(
            event=test_event, context=None)

        self.assertEqual(test_response["count"], 2)

The update_hits method can be tested directly. This realizes the value of passing in the Services object instead of taking a dependency on a Global. This test doesn’t depend on lambda_handler logic, but still uses mock_dynamodb.

    def test_lambda_handler_update_hits(self):
        # call update_hits with explicit Services object
        services = Services(dynamodb_table_name=self.DYNAMODB_TABLE_NAME)
        hits = update_hits(
            services=services, key="update_hits", increment=3)
        self.assertEqual(hits, 3)

With update_hits separately tested, this enables internal testing of lambda_handler logic independent of update_hits behaviors. The code below uses patch decorators and a MagicMock object from unittest.mock. This sets a return_value for the function call and use assert_called_once_with to inspect the incoming parameter values. The test still calls lambda_handler, but two implicit dependencies are removed: 1) Mocking _SERVICES with a sentinel removes all dependencies on dynamodb resources ; and 2) Mocking update_hits still provides a return value even when the moto.mock_dynamodb implementation logic emulating DynamoDB update expressions was removed.

    # Patch _SERVICES attribute and update_hits function call
    @patch("src.hitcounter2.app._SERVICES", sentinel.attribute)
    @patch("src.hitcounter2.app.update_hits")
    def test_lambda_handler_mocks(self,
                                  mock_update_hits: MagicMock):
        # Test setup - Return mocked data
        mock_update_hits.return_value = 2

        # invoke lambda_handler
        test_event: HitCounterEvent = self.load_test_event("SampleEvent")
        test_response: HitCounterResponse = lambda_handler(
            event=test_event, context=None)

        # Validate function called with parameter values
        mock_update_hits.assert_called_once_with(services=sentinel.attribute,
                                                 key=test_event.key,
                                                 increment=test_event.increment)

        self.assertEqual(test_response["count"], 2)

Running the Unit Tests

Many IDEs and other tools support running Python tests based on unitest, including the coverage tool for measuring code coverage. An example of running this command is below and will provide a OK or FAIL status, indicating if all the tests pass.

coverage run -m unittest discover

The status can be used to determine if the code should then be deployed to AWS. Tests should be run both in your local development environment and as part of your deployment pipeline. For example, using AWS CodePipeline, you could run these tests in an AWS CodeBuild stage, which would abort any subsequent deployment steps if the tests did not pass.

Local Integration Testing using Test Events

The AWS SAM CLI makes it easy to create and manage serverless applications. You can run an integration test with the Lambda function code running in local docker container using the sam local invoke command and integrate with a deployed DynamoDB table. In the example below, env.json is an environment variable file containing the value for DYNAMODB_TABLE_NAME and tests/events/SampleEvent.json is the same JSON file as the unit test.

sam local invoke \
  --env-vars env.json \
  --event tests/events/SampleEvent.json \
  HitCounter2Function

The AWS Command Line Interface (AWS CLI) has a command to invoke a Lambda function. In the example below, the deployed Lambda function name is specified, the payload uses the same tests/events/SampleEvent.json from earlier, and the response is saved to a SampleResponse.json file.

aws lambda invoke \
  --cli-binary-format raw-in-base64-out \
  --function-name stackname-HitCounter2Function-XXXXXXXX \
  --payload file://tests/events/SampleEvent.json \
  SampleResponse.json

Testing Lambda functions from the AWS console is possible using either private test events or sharable test events. Lambda saves shareable test events as schemas in an Amazon EventBridge (CloudWatch Events) schema registry named lambda-testevent-schemas. Sharable test events can be created either in the console or via a AWS::EventSchemas::Schema CloudFormation resource.