Serverless Architecture

PuffinDB has a radical serverless and cloud-native architecture. Deployment on "private clouds" is not a priority.

Core Principles

Do as much as possible with serverless functions (AWS Lambda).
Do as much as possible of the remaining parts with serverless containers (AWS Fargate).
Do the last bits with a single server-based container (Monostore) with as much capacity as possible (Amazon EC2).
Cache data in memory as aggressively as possible.
Use an auto-scaling Redis cluster for synchronization (submillisecond transactions, millions of transactions per second).
Use NAT hole punching for data shuffles.
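
The first three principles amount to a fallback cascade: try the most serverless tier first, and fall back only when a task does not fit. The sketch below is one way to picture that cascade, not PuffinDB's actual scheduler. The `Task` type, the `choose_tier` function, and the limit constants are all hypothetical names introduced for illustration, and the limits are assumed values that should be checked against current AWS quotas.

```python
from dataclasses import dataclass

# Assumed per-tier limits, for illustration only; verify against current AWS quotas.
LAMBDA_MAX_MEMORY_MB = 10_240     # assumed Lambda memory ceiling
LAMBDA_MAX_DURATION_S = 900       # assumed Lambda timeout ceiling (15 minutes)
FARGATE_MAX_MEMORY_MB = 122_880   # assumed Fargate task memory ceiling (120 GB)


@dataclass(frozen=True)
class Task:
    """Hypothetical description of one unit of query work."""
    memory_mb: int
    duration_s: int


def choose_tier(task: Task) -> str:
    """Return the most serverless tier that can host the task."""
    fits_lambda = (
        task.memory_mb <= LAMBDA_MAX_MEMORY_MB
        and task.duration_s <= LAMBDA_MAX_DURATION_S
    )
    if fits_lambda:
        return "lambda"
    if task.memory_mb <= FARGATE_MAX_MEMORY_MB:
        return "fargate"
    # Last resort: the single server-based container (Monostore on EC2).
    return "monostore"
```

For example, `choose_tier(Task(memory_mb=2048, duration_s=30))` selects the Lambda tier, while the same task running for an hour falls back to Fargate.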

Why Serverless?

The largest on-demand Amazon EC2 instance (u-24tb1.112xlarge) has 448 vCPUs, 24 TB of RAM, and 100 Gbps of network bandwidth. In comparison, 10,000 AWS Lambda functions offer an aggregated 60,000 vCPUs (134×), 200 TB of RAM (8×), and 8 Tbps of actual network bandwidth (80×). Furthermore, EC2 instances are billed from instantiation to termination (usually several hours at a time), while Lambda functions are billed by the millisecond, and only for the time during which they are actually used. As a result, a true serverless architecture can offer one to two orders of magnitude higher performance, for one to two orders of magnitude lower costs.
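
The multipliers quoted above can be rechecked with a few lines of arithmetic. The snippet below only divides the aggregate figures stated in the paragraph; the dictionary keys and the `ratios` helper are names chosen for this illustration.

```python
# Figures as quoted in the paragraph above.
ec2_instance = {"vcpus": 448, "ram_tb": 24, "network_gbps": 100}
lambda_fleet = {"vcpus": 60_000, "ram_tb": 200, "network_gbps": 8_000}  # 8 Tbps


def ratios(fleet: dict, instance: dict) -> dict:
    """Divide each fleet aggregate by the matching single-instance figure."""
    return {key: fleet[key] / instance[key] for key in instance}


advantage = ratios(lambda_fleet, ec2_instance)
for resource, factor in advantage.items():
    print(f"{resource}: {factor:.1f}x")
```

Rounded to whole numbers, the three factors match the 134×, 8×, and 80× given in the text.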

Serverless Components

PuffinDB is architected around the following serverless components:

Catalog — Java serverless function packaging Iceberg's Java API
Engine — Bun serverless function packaging the query handler, query planner, query engine, and CPython runtime
Metastore for managing the metadata of tables
Amazon Athena for executing write queries on lakehouse tables (eventually replaced by Icecap)
Amazon ElastiCache for Redis for logging, queuing, and synchronization
Amazon S3 for object storage
Amazon CloudFront for cached query result distribution

Note: Technically speaking, Amazon ElastiCache for Redis is not serverless, but it is a fully managed service.

CloudFormation Templates

These components are packaged into a pair of complementary AWS CloudFormation templates:

Lakehouse Template — usually instantiated once
Engine Template — usually instantiated multiple times

Clientless Interface

PuffinDB has a clientless architecture and can be used from any application embedding the DuckDB engine, using a simple extension.
