A Lambda function that provides an OpenAI-compatible API for SageMaker endpoints, enabling seamless integration with the OpenAI SDK while using your own deployed models.
```
Client (OpenAI SDK) → Lambda Function URL → SageMaker Endpoint
                      (OpenAI Adapter)      (vLLM Service)
```
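The adapter's core job in the flow above is to wrap tokens coming back from the SageMaker endpoint into OpenAI-style `chat.completion.chunk` server-sent events. The sketch below is illustrative only — function names and the exact payload shape in `src/main.py` may differ:

```python
import json


def to_sse_chunk(delta_text, model="gemma-2-2b-it"):
    """Wrap one piece of backend output as an OpenAI-style SSE chunk.

    Field names follow the chat.completion.chunk schema the OpenAI SDK
    expects when streaming; this is a sketch, not the actual adapter code.
    """
    chunk = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [
            {"index": 0, "delta": {"content": delta_text}, "finish_reason": None}
        ],
    }
    return f"data: {json.dumps(chunk)}\n\n"


# In the Lambda, each piece of the SageMaker response stream would be
# wrapped like this and flushed to the client, with the stream closed by:
DONE_EVENT = "data: [DONE]\n\n"

print(to_sse_chunk("Hello"))
```

Because each chunk is flushed as soon as the backend produces it, the client sees real token-by-token streaming rather than a buffered response.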
```
aws-lambda-response-streaming/
├── src/                      # Source code
│   ├── main.py               # Lambda function main code
│   ├── data_type.py          # Data type definitions
│   └── requirements.txt      # Python dependencies
├── tests/                    # Test files
│   └── test_comprehensive.py
├── docs/                     # Documentation
│   ├── USAGE.md              # Usage guide
│   ├── prompt.md             # Requirements document
│   └── imgs/                 # Images
├── Dockerfile                # Container build file
├── template.yml              # SAM template
├── samconfig.toml            # SAM configuration
├── .gitignore
├── README.md
└── LICENSE
```
Build and deploy with AWS SAM:

```
sam build --use-container
sam deploy --guided
```

Then run the test suite:

```
python tests/test_comprehensive.py
```

Point the OpenAI SDK at your deployed Function URL:

```python
from openai import OpenAI

client = OpenAI(
    api_key="dummy-key",  # placeholder; a real OpenAI key is not required
    base_url="https://your-lambda-url.lambda-url.us-west-2.on.aws/v1",
)

# Streaming
stream = client.chat.completions.create(
    model="gemma-2-2b-it",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

# Non-streaming
response = client.chat.completions.create(
    model="gemma-2-2b-it",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

- Usage Guide - Detailed usage instructions
- Requirements - Original requirements document
- ✅ Full OpenAI API compatibility
- ✅ Real streaming responses
- ✅ SageMaker endpoint integration
- ✅ Lambda Web Adapter for performance
- ✅ Comprehensive error handling
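Comprehensive error handling here presumably means failures are reported in the error shape OpenAI clients expect, so the SDK can raise its usual exceptions. A hedged sketch of that idea (the helper name and exact fields are illustrative, not the actual code in `src/main.py`):

```python
import json


def openai_error(message, status=500, err_type="internal_server_error"):
    """Build an OpenAI-compatible error response body.

    OpenAI clients look for a top-level "error" object; the concrete
    fields the adapter emits may differ from this sketch.
    """
    body = {
        "error": {
            "message": message,
            "type": err_type,
            "code": status,
        }
    }
    return status, json.dumps(body)


# Example: surfacing a backend timeout to the client
status, body = openai_error("SageMaker endpoint timed out", 504, "timeout")
```

Returning errors in this shape lets the OpenAI SDK surface them as typed exceptions instead of opaque HTTP failures.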
The project includes comprehensive tests that verify:
- OpenAI SDK compatibility
- Streaming and non-streaming responses
- Performance benchmarks
- Parameter handling
Run tests with: `python tests/test_comprehensive.py`