Name	Name	Last commit message	Last commit date
parent directory ..
app	app
.gitignore	.gitignore
README.md	README.md
__init__.py	__init__.py
template.yaml	template.yaml

FastAPI Response Streaming with Lambda Managed Instances

This example shows how to use Lambda Web Adapter to run a FastAPI application with response streaming on Lambda Managed Instances (LMI).

Lambda Managed Instances allows a single Lambda execution environment to handle multiple concurrent requests, improving throughput and reducing costs for I/O-bound workloads like streaming responses.

Prerequisites

Lambda Managed Instances requires a VPC with:

At least one subnet (two subnets recommended)
A security group that allows outbound traffic

If you don't have a VPC configured, you can use the default VPC or create one.

How does it work?

This example combines three Lambda features:

Lambda Web Adapter - Runs your FastAPI app on Lambda without code changes
Response Streaming - Streams responses back to clients as they're generated
Lambda Managed Instances - Handles multiple concurrent requests per execution environment

Key Configuration

LMICapacityProvider:
  Type: AWS::Serverless::CapacityProvider
  Properties:
    CapacityProviderName: !Sub "${AWS::StackName}-capacity-provider"
    VpcConfig:
      SubnetIds: !Ref SubnetIds
      SecurityGroupIds: !Ref SecurityGroupIds
    ScalingConfig:
      MaxVCpuCount: 20
      AverageCPUUtilization: 70.0

FastAPIFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: app/
    Handler: run.sh
    Runtime: python3.13
    MemorySize: 2048
    Environment:
      Variables:
        AWS_LAMBDA_EXEC_WRAPPER: /opt/bootstrap
        AWS_LWA_INVOKE_MODE: response_stream
        PORT: 8000
    Layers:
      - !Sub arn:aws:lambda:${AWS::Region}:753240598075:layer:LambdaAdapterLayerX86:26
    CapacityProviderConfig:
      Arn: !GetAtt LMICapacityProvider.Arn
      PerExecutionEnvironmentMaxConcurrency: 64
    FunctionUrlConfig:
      AuthType: NONE
      InvokeMode: RESPONSE_STREAM

AWS::Serverless::CapacityProvider - Creates the LMI capacity provider with VPC configuration
CapacityProviderConfig.Arn - References the capacity provider
CapacityProviderConfig.PerExecutionEnvironmentMaxConcurrency: 64 - Up to 64 concurrent requests per instance
AWS_LWA_INVOKE_MODE: response_stream - Configures Lambda Web Adapter for streaming
FunctionUrlConfig.InvokeMode: RESPONSE_STREAM - Enables streaming on the Function URL

Build and Deploy

First, get your VPC subnet and security group IDs:

# List subnets in your default VPC
aws ec2 describe-subnets --filters "Name=default-for-az,Values=true" \
  --query 'Subnets[*].[SubnetId,AvailabilityZone]' --output table

# List security groups
aws ec2 describe-security-groups --filters "Name=group-name,Values=default" \
  --query 'SecurityGroups[*].[GroupId,GroupName]' --output table

Build and deploy:

sam build --use-container
sam deploy --guided

During guided deployment, you'll be prompted for:

SubnetIds - Comma-separated list of subnet IDs (e.g., subnet-abc123,subnet-def456)
SecurityGroupIds - Comma-separated list of security group IDs (e.g., sg-abc123)

Verify it works

Open the Function URL in a browser. You should see a message stream back character by character, with a unique request ID prefix like [a1b2c3d4] This is streaming from Lambda Managed Instances!.

Test concurrent requests

To verify LMI is working, send multiple concurrent requests:

# Get your function URL
URL=$(aws cloudformation describe-stacks --stack-name fastapi-response-streaming-lmi \
  --query 'Stacks[0].Outputs[?OutputKey==`FastAPIFunctionUrl`].OutputValue' --output text)

# Send 10 concurrent requests
for i in {1..10}; do curl -s "$URL" & done; wait

Each response will have a different request ID, but they may share the same execution environment (visible in CloudWatch logs).

Considerations

When using LMI with streaming:

VPC: LMI requires VPC configuration. Ensure your subnets have internet access (via NAT Gateway) if your function needs to call external services
Shared state: FastAPI/Uvicorn handles concurrency natively, but avoid mutable global state
Memory: With 64 concurrent requests, ensure sufficient memory (2048MB in this example)
Timeouts: Streaming responses can run up to 15 minutes with Function URLs
Scaling: MaxVCpuCount controls the maximum vCPUs the capacity provider can provision across all instances

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

FastAPI Response Streaming with Lambda Managed Instances

Prerequisites

How does it work?

Key Configuration

Build and Deploy

Verify it works

Test concurrent requests

Considerations

FilesExpand file tree

fastapi-response-streaming-lmi

Directory actions

More options

Directory actions

More options

Latest commit

History

fastapi-response-streaming-lmi

Folders and files

parent directory

README.md

FastAPI Response Streaming with Lambda Managed Instances

Prerequisites

How does it work?

Key Configuration

Build and Deploy

Verify it works

Test concurrent requests

Considerations