Ollama Running On AWS Lambda


🚀 Working Example

Note

  • AWS Lambda runs on CPUs, so generate / chat requests are a little slow.
  • The first deployment takes ~5m while the container is built and models are cached; subsequent deployments take ~1m.
  • The first request takes ~20s while the model loads; subsequent requests take ~5-20s.
  • While this is not production grade, it is a cost-effective way to serve models.
curl https://wm4s6cxkwua4ncx3skpdtdx27a0qzbnd.lambda-url.us-east-1.on.aws/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt":"Why is the sky blue?"
}'
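
The Note above mentions chat as well; the same Function URL serves Ollama's chat API. A minimal sketch against the demo endpoint (the message is illustrative):

curl https://wm4s6cxkwua4ncx3skpdtdx27a0qzbnd.lambda-url.us-east-1.on.aws/api/chat -d '{
  "model": "llama3.2:1b",
  "messages": [{ "role": "user", "content": "Why is the sky blue?" }]
}'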
  • 🙏 Please, please, please don't abuse this endpoint! Scaffoldly is Open Source (a.k.a. cash strapped 🤣), and we're hosting it for demonstration purposes only!
  • Please consider donating if you like what Scaffoldly is doing!
  • Check out our other examples
  • Give our Tooling and Examples repositories a ⭐️ if you like what you see!

✨ Host Your Own!

Tip

To use a different model than llama3.2:1b, update scaffoldly.json with the desired model(s).

  1. Run the following command to create your own copy of this application:

npx scaffoldly create app --template ollama

  2. Create an EFS Filesystem in AWS and give it a Name of .cache (to match scaffoldly.json); a CLI sketch follows the next step.

  3. Finally, deploy:

cd my-app
npx scaffoldly deploy
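
For step 2, the EFS Filesystem can also be created with the AWS CLI instead of the console. A minimal sketch, assuming the AWS CLI is configured (the creation token is illustrative; mount targets in the function's VPC may also be needed):

aws efs create-file-system \
  --creation-token ollama-cache \
  --tags Key=Name,Value=.cache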

Running npx scaffoldly deploy will produce output that looks like:

🟠 App framework not detected. Using `scaffoldly.json` for configuration.

✅ Updated Identity: arn:aws:sts::123456789012:assumed-role/aws-examples@scaffold.ly/cnuss
✅ Updated ECR Repository: 123456789012.dkr.ecr.us-east-1.amazonaws.com/ollama
✅ Updated Local Image Digest: sha256:f7ee27705d66c64a250982d6ee8282d5338a4989ae95c5ac4453a15c264efc97
✅ Updated Secret: arn:aws:secretsmanager:us-east-1:123456789012:secret:ollama@ollama-yaVNCp
✅ Updated EFS Access Point: arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0b0e5506324efd541
✅ Updated IAM Role: ollama-0447aaae
✅ Updated IAM Role Policy: ollama
✅ Updated Lambda Function: ollama
✅ Updated Function URL: https://wm4s6cxkwua4ncx3skpdtdx27a0qzbnd.lambda-url.us-east-1.on.aws
✅ Updated Schedule Group: ollama-0447aaae
✅ Updated Local Image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/ollama:0.0.0-0-0447aaae
✅ Updated Local Image Digest: sha256:320447c49d08d109c4fc1702acc24768657a9a09e4e0eb90f8b32051500664ba
✅ Updated Secret: arn:aws:secretsmanager:us-east-1:123456789012:secret:ollama@ollama-yaVNCp
✅ Updated Lambda Function: ollama
✅ Updated Function Code: ollama@sha256:320447c49d08d109c4fc1702acc24768657a9a09e4e0eb90f8b32051500664ba
✅ Updated Function Alias: ollama (version: 4)
✅ Updated Function Policies: InvokeFunctionUrl
✅ Updated Function URL: https://wm4s6cxkwua4ncx3skpdtdx27a0qzbnd.lambda-url.us-east-1.on.aws
✅ Updated Network Interface: eni-0dc0e11444fa19715
✅ Created Invocation of `( HOME=$XDG_CACHE_HOME OLLAMA_HOST=$URL ollama pull llama3.2:1b )`:
pulling manifest
   ==> pulling 74701a8c35f6... 100% ▕████████████████▏ 1.3 GB
   ==> pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB
   ==> pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB
   ==> pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB
   ==> pulling 4f659a1e86d7... 100% ▕████████████████▏  485 B
   ==> verifying sha256 digest
   ==> writing manifest
   ==> success
✅ Updated HTTP GET on https://wm4s6cx...s-east-1.on.aws: 200 OK

🚀 Deployment Complete!
   🆔 App Identity: arn:aws:iam::123456789012:role/ollama-0447aaae
   📄 Env Files: .env.ollama, .env.main, .env
   📦 Image Size: 4.81 GB
   🌎 URL: https://wm4s6cxkwua4ncx3skpdtdx27a0qzbnd.lambda-url.us-east-1.on.aws

🤨 How It Works

Tip

This repository also comes with a GitHub Action so that deployments can run from GitHub instead of being executed manually!

Multi-Stage Docker Build

After the project has been created, run npx scaffoldly show dockerfile to see the resultant Dockerfile:

FROM ollama/ollama:0.4.7 AS install-base
WORKDIR /var/task

FROM install-base AS build-base
WORKDIR /var/task
ENV PATH="/var/task:$PATH"
COPY . /var/task/

FROM install-base AS package-base
WORKDIR /var/task
ENV PATH="/var/task:$PATH"

FROM install-base AS runtime
WORKDIR /var/task
ENV PATH="/var/task:$PATH"
COPY --from=scaffoldly/scaffoldly:1 /linux/arm64/awslambda-entrypoint /var/task/.entrypoint
CMD [ "( HOME=$XDG_CACHE_HOME ollama serve )" ]

Running npx scaffoldly deploy builds this multi-stage image locally (the Updated Local Image lines in the output above) before pushing it to Amazon ECR.

Amazon ECR

AWS Lambda requires container images to come from a private Amazon ECR registry; it cannot run images from public registries.

Running npx scaffoldly deploy will:

  • Create an ECR Repository if one doesn't already exist
  • Pull ollama/ollama:0.4.7 and re-tag it for the private registry
  • Run the equivalent of docker push to publish it as a private image (a manual equivalent is sketched below)
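
A minimal sketch of the manual equivalent, assuming the AWS CLI and Docker are configured (the account ID and region are illustrative):

aws ecr create-repository --repository-name ollama
aws ecr get-login-password | docker login --username AWS \
  --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker pull ollama/ollama:0.4.7
docker tag ollama/ollama:0.4.7 123456789012.dkr.ecr.us-east-1.amazonaws.com/ollama:0.4.7
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/ollama:0.4.7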

Lambda Function

An AWS Lambda Function is created with the configuration in the scaffoldly.json file:

Running npx scaffoldly deploy will:

  • Set up Function Environment Variables from .env
  • Deploy the Function with a VPC Configuration and EFS Mounts inferred from Amazon EFS
  • Create Lambda Versions and Aliases
  • Set an ENTRYPOINT which routes AWS Lambda HTTP Requests to Ollama
  • Create a Lambda Function URL and expose it to the function as the URL environment variable (a CLI sketch of this step follows below)
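
A minimal sketch of the Function URL step, assuming the AWS CLI is configured and a public (NONE) auth type to match the open demo endpoint:

aws lambda create-function-url-config \
  --function-name ollama \
  --auth-type NONE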

Model Caching

Model files are large, so they are cached in Amazon EFS. Using the @immediately option in the schedules directive of scaffoldly.json, the model is pre-downloaded right after each deployment.

Running npx scaffoldly deploy will:

  • Set the XDG_CACHE_HOME environment variable to the EFS Mount on the Lambda Function
  • Use the OLLAMA_HOST=$URL environment variable to trigger a remote download (on itself)
  • Use HOME=$XDG_CACHE_HOME to direct Ollama where to store files
  • Invoke ollama pull once the AWS Lambda Function has finished deploying (the exact command appears below)
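
Taken together, the scheduled invocation runs the same command shown in the deploy output above:

( HOME=$XDG_CACHE_HOME OLLAMA_HOST=$URL ollama pull llama3.2:1b )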

Request Proxy

Finally, Scaffoldly uses the start option in the scripts directive of scaffoldly.json to run ollama serve.

Running npx scaffoldly deploy will:

  • Copy the awslambda-entrypoint binary into the image (the COPY --from=scaffoldly/scaffoldly:1 line in the Dockerfile above)
  • At runtime, the awslambda-entrypoint reads the SLY_ROUTES and SLY_SERVE environment variables to start Ollama and route requests
  • Requests are converted from the AWS Lambda HTTP Request format back into an HTTP request and forwarded to the Ollama Server
  • The Ollama Server's response is streamed back to the requestor
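
Because every Ollama API route is proxied through the Function URL, other endpoints work too. For example, listing the models cached in EFS (a sketch against the demo endpoint above):

curl https://wm4s6cxkwua4ncx3skpdtdx27a0qzbnd.lambda-url.us-east-1.on.aws/api/tags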

Questions, Feedback, and Help

Join our Discussions on GitHub. Join our Community on Discord.

License

This code is licensed under the Apache-2.0 license.

The scaffoldly toolchain is licensed under the FSL-1.1-Apache-2.0 license.

Copyright 2024 Scaffoldly LLC