Icecap

Icecap is a proposed implementation of Iceberg tables using Redis and DuckDB as an alternative to Spark SQL.

Overview

Optimized for low latency
Supporting updates in place
Powered by serverless functions, Redis, and DuckDB (sans Spark)

Redis

Used as the table catalog and transactional orchestrator, allowing multiple DuckDB engines to read and write the same tables
Accelerated with Dragonfly or KeyDB (optional)
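
The orchestration role described above could be sketched as an optimistic commit against a per-table snapshot pointer. This is a minimal illustration, not Icecap's design: the `InMemoryCatalog` class, the `commit` helper, and the snapshot ids are all hypothetical, and an in-process lock stands in for Redis so the example runs without a server. On Redis itself, the same compare-and-set step would typically be expressed with WATCH/MULTI/EXEC or a Lua script.

```python
import threading


class InMemoryCatalog:
    """Hypothetical stand-in for the Redis catalog: table name -> current snapshot id."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of Redis's atomic command execution
        self._snapshots = {}

    def current(self, table):
        with self._lock:
            return self._snapshots.get(table)

    def compare_and_set(self, table, expected, new):
        """Move the table pointer from `expected` to `new`; False if another engine won."""
        with self._lock:
            if self._snapshots.get(table) != expected:
                return False
            self._snapshots[table] = new
            return True


def commit(catalog, table, make_snapshot, max_retries=5):
    """Optimistic commit loop: read the pointer, build a snapshot, swap, retry on conflict."""
    for _ in range(max_retries):
        base = catalog.current(table)
        new = make_snapshot(base)
        if catalog.compare_and_set(table, base, new):
            return new
    raise RuntimeError(f"could not commit to {table!r} after {max_retries} attempts")


catalog = InMemoryCatalog()
first = commit(catalog, "events", lambda base: "snap-1")
second = commit(catalog, "events", lambda base: f"{base}+1")
# A writer still holding the old pointer is rejected instead of overwriting newer work.
stale = catalog.compare_and_set("events", "snap-1", "snap-x")
```

The point of the design is that engines never lock a table while they work; they only contend at the final pointer swap, and a losing engine rebuilds its snapshot on top of the winner's.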

Updates

Object Stores like Amazon S3 do not currently support updates in place. A serverless function must therefore GET an entire object, apply updates to it, and PUT it back on the Object Store. If the object uses a file format natively designed to support updates in place (such as DuckDB's native file format), this process can be accelerated. Furthermore, the serverless function can cache the object on its local filesystem, thereby allowing updates in place for as long as the object remains cached.

Down the road, we hope that Object Stores will add native support for updates in place when using certain file formats such as DuckDB's.

In the meantime, updates will be managed in the following fashion:

Table updates buffered on Redis
Partitions of tables loaded from object store and cached on serverless functions
Updates applied in place by serverless functions
Partitions serialized back onto object store
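
The four steps above can be sketched end to end with standard-library stand-ins. Everything named here is an assumption made for illustration: a dict plays the Object Store, a `defaultdict` of lists plays the Redis buffer, a temporary directory plays the serverless function's local filesystem, and SQLite stands in for DuckDB as the single-file database that supports updates in place. The table `t` and the helper names are hypothetical.

```python
import os
import sqlite3
import tempfile
from collections import defaultdict

object_store = {}                  # stand-in for S3: key -> bytes
update_buffer = defaultdict(list)  # stand-in for Redis: partition key -> pending updates


def _read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def seed_partition(key, rows, workdir):
    """Create a partition file and PUT it on the (fake) object store."""
    path = os.path.join(workdir, "seed.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    con.commit()
    con.close()
    object_store[key] = _read_bytes(path)


def buffer_update(key, row_id, value):
    """Step 1: buffer the update instead of touching the object store."""
    update_buffer[key].append((value, row_id))


def flush_partition(key, workdir):
    local = os.path.join(workdir, key.replace("/", "_"))
    if not os.path.exists(local):  # Step 2: GET the partition only on a cache miss
        with open(local, "wb") as f:
            f.write(object_store[key])
    con = sqlite3.connect(local)
    pending = update_buffer.pop(key, [])
    con.executemany("UPDATE t SET value = ? WHERE id = ?", pending)  # Step 3: update in place
    con.commit()
    rows = con.execute("SELECT id, value FROM t ORDER BY id").fetchall()
    con.close()
    object_store[key] = _read_bytes(local)  # Step 4: PUT the whole file back
    return rows


with tempfile.TemporaryDirectory() as workdir:
    seed_partition("events/part-0", [(1, "a"), (2, "b")], workdir)
    buffer_update("events/part-0", 2, "B")
    result = flush_partition("events/part-0", workdir)
```

While the local copy stays cached, later flushes skip the GET and pay only for the in-place update and the PUT.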

According to this model, the DuckDB file format could be used on both object store and serverless functions, or just the latter.

File Format Replication

Icecap will make it possible to replicate every partition stored on the Object Store across multiple file formats. For example, the same partition could be stored in both DuckDB and Parquet formats. This will allow any Parquet-compatible tool to query tables, while making it faster for Icecap to update tables by leveraging the fact that DuckDB's native file format supports updates in place. Considering that storage costs on the Object Store usually represent a tiny fraction of the overall cost of operating a data lake, this transparent replication will probably be attractive to many organizations.
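
As a rough sketch of the replication idea, the same partition can be passed through one writer per format and stored under one key per format. The key layout (`table/partition.ext`), the `replicate` function, and the `WRITERS` registry are hypothetical, and CSV and JSON Lines are used only as standard-library stand-ins for the DuckDB and Parquet formats named above.

```python
import csv
import io
import json


def to_csv(rows):
    """Serialize rows (a non-empty list of dicts) as CSV bytes."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_json_lines(rows):
    """Serialize rows as JSON Lines bytes, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")


# Stand-in formats; a real deployment would register e.g. DuckDB and Parquet writers.
WRITERS = {"csv": to_csv, "jsonl": to_json_lines}


def replicate(store, table, partition, rows, formats):
    """Write one copy of the partition per requested format; return the keys written."""
    keys = []
    for fmt in formats:
        key = f"{table}/{partition}.{fmt}"
        store[key] = WRITERS[fmt](rows)
        keys.append(key)
    return keys


store = {}  # stand-in for the Object Store: key -> bytes
keys = replicate(store, "events", "part-0", [{"id": 1, "value": "a"}], ["csv", "jsonl"])
```

Readers then pick whichever key matches the format their tool understands, while the updating engine works only against the copy that supports updates in place.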

FAQ

Why not use Spark SQL?
Because it is too slow and too expensive to deploy and operate.

Will Icecap support the Parquet file format?
Yes. Icecap will support any file format supported by Apache Iceberg, alongside the native DuckDB file format for updates in place.

Will Icecap support the Iceberg table format?
Yes. Icecap will support both the Iceberg and Delta Lake table formats (not to be confused with file formats).

Which file formats will be supported in the underlying Object Store?
All of them. The file format war is not ours to fight. Users will use whichever formats they want.

Which file formats will be supported in cache?
All of them.
