PuffinDB uses a dual strategy for supporting hybrid transactional/analytical processing (HTAP) with lakehouse tables:
- Amazon Athena for low-frequency | high-latency updates (default)
- DuckDB and Amazon EBS for high-frequency | low-latency updates
By default, updates on lakehouse tables are made using Amazon Athena (cf. Updating Iceberg table data). For higher performance (more IOPS and lower latency), a lakehouse table can be transiently copied, in the native DuckDB file format, to an Amazon EBS volume mounted on the Monostore. While the copy exists, the table is locked on the lakehouse, and all read | write transactions are performed on Amazon EBS by the DuckDB engine running on the Monostore. This transient copy from the lakehouse to block storage usually lasts for the duration of an HTAP session of a few hours, and the session's lifecycle usually mirrors the Monostore's. Once the Monostore is stopped, the table is written back to the lakehouse and then unlocked.
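The lock, copy, write-back, and unlock sequence described above can be sketched as a Python context manager. This is only an illustration under stated assumptions, not PuffinDB's actual API: `Lakehouse` and `htap_session` are hypothetical names, the lakehouse is an in-memory dictionary, and a deep copy stands in for the DuckDB file on the Amazon EBS volume. The sketch also assumes that a session ending with an error discards its changes, which the text above does not specify.

```python
from contextlib import contextmanager
from copy import deepcopy


class Lakehouse:
    """In-memory stand-in for the lakehouse (hypothetical, not PuffinDB's API)."""

    def __init__(self):
        self.tables = {}     # table name -> list of row dicts
        self.locked = set()  # names of tables currently checked out

    def lock(self, name):
        if name in self.locked:
            raise RuntimeError(f"table {name!r} is already locked")
        self.locked.add(name)

    def unlock(self, name):
        self.locked.discard(name)

    def update(self, name, rows):
        # Default path: direct updates are refused while the table is checked out.
        if name in self.locked:
            raise RuntimeError(f"table {name!r} is locked for an HTAP session")
        self.tables[name] = rows


@contextmanager
def htap_session(lakehouse, name):
    """Lock the table, yield a transient working copy, then write back and unlock."""
    lakehouse.lock(name)
    try:
        local_copy = deepcopy(lakehouse.tables[name])  # stand-in for the copy to EBS
        yield local_copy                               # reads and writes happen here
        lakehouse.tables[name] = local_copy            # write back when the session ends
    finally:
        lakehouse.unlock(name)                         # the lock is always released


# Usage: the lakehouse only sees the changes once the session has ended.
lake = Lakehouse()
lake.tables["orders"] = [{"id": 1, "status": "new"}]
with htap_session(lake, "orders") as orders:
    orders[0]["status"] = "shipped"
```

Tying the write-back and unlock to the end of the `with` block mirrors the way the session's lifecycle follows the Monostore's: the table cannot remain locked after the session has ended.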
- Unified HTAP API for lakehouse tables and block storage tables
- Partition-level copy to block storage (instead of copying an entire table)