Icecap

Icecap is a proposed implementation of Iceberg tables using Redis and DuckDB as an alternative to Spark SQL.

Overview

  • Optimized for low latency
  • Supporting updates in place
  • Powered by serverless functions, Redis, and DuckDB (sans Spark)

Redis

  • Used as the table catalog and transactional orchestrator, allowing multiple DuckDB engines to read and write the same tables
  • Accelerated with Dragonfly or KeyDB (optional)
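One way to picture the catalog role is as an atomic compare-and-swap on each table's current metadata pointer, so that two engines cannot both commit on top of the same version. The sketch below is only an illustration under that assumption: `Catalog`, `current`, and `commit` are hypothetical names, the metadata paths are made up, and an in-memory dictionary guarded by a lock stands in for Redis (where a similar check could be built with WATCH/MULTI/EXEC). The proposal itself does not specify the mechanism.

```python
import threading


class Catalog:
    """In-memory stand-in for a Redis-backed table catalog (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tables = {}  # table name -> (version, metadata location)

    def current(self, table):
        """Return (version, metadata location); (0, None) for an unknown table."""
        with self._lock:
            return self._tables.get(table, (0, None))

    def commit(self, table, expected_version, new_location):
        """Swap the metadata pointer only if nobody committed in between."""
        with self._lock:
            version, _ = self._tables.get(table, (0, None))
            if version != expected_version:
                return False  # another engine won the race; caller must retry
            self._tables[table] = (version + 1, new_location)
            return True


catalog = Catalog()
version, _ = catalog.current("orders")
first = catalog.commit("orders", version, "meta/v1.json")         # succeeds
second = catalog.commit("orders", version, "meta/v1-other.json")  # stale version
```

A writer whose commit is rejected would re-read the current version, rebase its changes, and try again.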

Updates

Object Stores like Amazon S3 do not currently support updates in place. A serverless function must therefore GET an entire object, apply the updates to it, and PUT the whole object back on the Object Store. If the object uses a file format natively designed to support updates in place (such as DuckDB's native file format), this process can be accelerated. Furthermore, the serverless function can cache the object on its local filesystem, allowing updates in place for as long as the object stays cached.

Down the road, we hope that Object Stores will add native support for updates in place when using certain file formats such as DuckDB's.

In the meantime, updates will be managed in the following fashion:

  1. Table updates buffered on Redis
  2. Partitions of tables loaded from object store and cached on serverless functions
  3. Updates applied in place by serverless functions
  4. Partitions serialized back onto object store
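The four steps above can be sketched as follows. This is a minimal illustration, not the proposed implementation: `ObjectStore`, `UpdateBuffer`, and `flush_partition` are hypothetical names, plain in-memory objects stand in for the object store, Redis, and the serverless function's local cache, and JSON stands in for the partition file format.

```python
import json
from collections import defaultdict


class ObjectStore:
    """In-memory stand-in for an object store: whole-object GET and PUT only."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        return self._objects[key]

    def put(self, key, data):
        self._objects[key] = data


class UpdateBuffer:
    """In-memory stand-in for the Redis update buffer, keyed by partition."""

    def __init__(self):
        self._pending = defaultdict(list)

    def push(self, partition, row_id, values):
        # Step 1: buffer the update instead of touching the object store.
        self._pending[partition].append((row_id, values))

    def drain(self, partition):
        return self._pending.pop(partition, [])


def flush_partition(store, buffer, cache, partition):
    """Apply buffered updates to one partition and write it back."""
    # Step 2: load the partition from the object store unless it is cached.
    if partition not in cache:
        cache[partition] = json.loads(store.get(partition))
    rows = cache[partition]
    # Step 3: apply the buffered updates in place on the cached copy.
    updates = buffer.drain(partition)
    for row_id, values in updates:
        rows.setdefault(row_id, {}).update(values)
    # Step 4: serialize the whole partition back onto the object store.
    store.put(partition, json.dumps(rows).encode())
    return len(updates)
```

Because the cache outlives a single call, a second flush of the same partition skips the GET and pays only for the PUT.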

According to this model, the DuckDB file format could be used on both object store and serverless functions, or just the latter.

File Format Replication

Icecap will make it possible to replicate every partition stored on the Object Store across multiple file formats. For example, the same partition could be stored in both DuckDB and Parquet formats. This will allow any Parquet-compatible tool to query tables, while making it faster for Icecap to update tables by leveraging the fact that DuckDB's native file format supports updates in place. Considering that storage costs on the Object Store usually represent a tiny fraction of the overall cost of operating a data lake, this transparent replication will probably be attractive to many organizations.
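As a rough sketch of what per-partition replication could look like, the example below writes the same rows under one object key per format. Everything here is an assumption made for illustration: `replicate_partition` and the key layout are hypothetical, a dictionary stands in for the Object Store, and JSON and CSV encoders stand in for the DuckDB and Parquet writers so that the example runs with the standard library alone.

```python
import csv
import io
import json


def encode_json(rows):
    """Serialize rows as JSON (stand-in for one file format)."""
    return json.dumps(rows, sort_keys=True).encode()


def encode_csv(rows):
    """Serialize rows as CSV (stand-in for a second file format)."""
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue().encode()


ENCODERS = {"json": encode_json, "csv": encode_csv}


def replicate_partition(store, table, partition, rows, encoders=ENCODERS):
    """Write one copy of the partition per format; return format -> object key."""
    keys = {}
    for fmt, encode in encoders.items():
        key = f"{table}/{partition}.{fmt}"  # hypothetical key layout
        store[key] = encode(rows)
        keys[fmt] = key
    return keys


store = {}
keys = replicate_partition(store, "orders", "p0", [{"id": 1, "qty": 5}])
```

A reader then picks whichever replica its tooling understands, while the writer remains responsible for keeping all replicas of a partition in step.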

FAQ

Why not use Spark SQL?
Because it is too slow and too expensive to deploy and operate.

Will Icecap support the Parquet file format?
Yes. Icecap will support any file format supported by Apache Iceberg, alongside the native DuckDB file format for updates in place.

Will Icecap support the Iceberg table format?
Yes. Icecap will support both the Iceberg and Delta Lake table formats (not to be confused with file formats).

Which file formats will be supported in the underlying Object Store?
All of them. The file format war is not ours to fight. Users will use whichever formats they want.

Which file formats will be supported in cache?
All of them.