[Sandbox] SlateDB #114

criccomini · 2024-08-08T20:43:18Z

Application contact emails

Project Summary

A cloud native embedded storage engine built on object storage.

Project Description

SlateDB is an embedded storage engine built as a log-structured merge-tree. Unlike traditional LSM-tree storage engines, SlateDB writes data to object storage (S3, GCS, ABS, MinIO, Tigris, and so on). Leveraging object storage allows SlateDB to provide bottomless storage capacity, high durability, and easy replication. The trade-off is that object storage has higher latency and higher API costs than local disk.

To mitigate high write API costs (PUTs), SlateDB batches writes. Rather than writing every put() call to object storage, MemTables are flushed periodically to object storage as a sorted string table (SST). The flush interval is configurable.
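
As a rough illustration of the batching idea (a toy model, not SlateDB's actual code or API), the sketch below buffers puts in an in-memory table and writes them out as one sorted batch. The class name `BatchingWriter` and its methods are hypothetical, and the flush is triggered by an explicit call here rather than by a timer.

```python
class BatchingWriter:
    """Toy model: buffer puts in a MemTable, flush them as one sorted object."""

    def __init__(self):
        self.memtable = {}      # key -> value; the newest write for a key wins
        self.flushed_ssts = []  # each entry stands in for one object-store PUT

    def put(self, key, value):
        # Only touches memory; no object-store request is made here.
        self.memtable[key] = value

    def flush(self):
        # Write all buffered entries as a single key-sorted table.
        if not self.memtable:
            return
        sst = sorted(self.memtable.items())
        self.flushed_ssts.append(sst)
        self.memtable = {}


writer = BatchingWriter()
for i in range(1000):
    writer.put(f"key-{i:04d}".encode(), b"value")
writer.flush()
print(len(writer.flushed_ssts))  # 1000 puts cost a single simulated PUT
```

In this model the number of simulated PUTs depends on how often `flush()` runs, not on how many `put()` calls were made.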

To mitigate write latency, SlateDB provides an async put method. Clients that prefer strong durability can await on put until the MemTable is flushed to object storage (trading latency for durability). Clients that prefer lower latency can simply ignore the future returned by put.
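
The latency/durability choice can be sketched with Python's `asyncio` (again a toy model under stated assumptions, not SlateDB's API). The hypothetical `AsyncStore.put` returns a future that resolves only once a flush has copied the MemTable into a dict standing in for object storage; the 10 ms sleep is an arbitrary stand-in for object-store latency.

```python
import asyncio


class AsyncStore:
    """Toy model: put() returns a future that resolves after the next flush."""

    def __init__(self):
        self.memtable = {}
        self.pending = []   # futures waiting for the next flush
        self.durable = {}   # stands in for object storage

    def put(self, key, value):
        self.memtable[key] = value
        fut = asyncio.get_running_loop().create_future()
        self.pending.append(fut)
        return fut

    async def flush(self):
        await asyncio.sleep(0.01)  # simulated object-store latency
        self.durable.update(self.memtable)
        self.memtable.clear()
        for fut in self.pending:
            fut.set_result(None)   # wake up every caller awaiting durability
        self.pending.clear()


async def main():
    store = AsyncStore()
    store.put(b"a", b"1")                # lower latency: ignore the future
    durable_put = store.put(b"b", b"2")  # strong durability: keep the future
    flusher = asyncio.create_task(store.flush())
    await durable_put                    # returns only after the flush completes
    await flusher
    return store.durable


result = asyncio.run(main())
```

Both writes end up in the same flush; the only difference is whether the caller waits for it.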

To mitigate read latency and read API costs (GETs), SlateDB uses standard LSM-tree caching techniques: in-memory block caches, compression, bloom filters, and local SST disk caches.
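
A minimal sketch of how a bloom filter and an in-memory cache can cut down on GETs is shown below. It is a simplification with hypothetical names (`BloomFilter`, `CachedSst`): it caches individual keys rather than blocks, omits compression and disk caching, and uses a dict in place of an SST in object storage.

```python
import hashlib


class BloomFilter:
    """Small bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class CachedSst:
    """Toy read path: bloom filter first, then cache, then a remote GET."""

    def __init__(self, entries):
        self.remote = dict(entries)  # stands in for the SST in object storage
        self.bloom = BloomFilter()
        for key in self.remote:
            self.bloom.add(key)
        self.cache = {}
        self.remote_gets = 0         # counts simulated object-store GETs

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None              # definitely absent: no GET issued
        if key in self.cache:
            return self.cache[key]   # cache hit: no GET issued
        self.remote_gets += 1
        value = self.remote.get(key)
        if value is not None:
            self.cache[key] = value
        return value


sst = CachedSst({b"a": b"1"})
sst.get(b"a")  # first read goes to the simulated object store
sst.get(b"a")  # second read is served from the cache
```

After the two reads above, `sst.remote_gets` is 1. Lookups for absent keys usually stop at the bloom filter, although a false positive can still cost one GET.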

Org repo URL (provide if all repos under the org are in scope of the application)

https://github.com/slatedb

Project repo URL in scope of application

N/A

Additional repos in scope of the application

No response

Website URL

https://slatedb.io

Roadmap

https://github.com/slatedb/slatedb/milestones

Roadmap context

SlateDB is a very young project. We don't have a roadmap with specific timelines and dependencies. Instead, we've been using milestones to manage project work.

Contributing Guide

https://github.com/slatedb/slatedb/blob/main/CONTRIBUTING.md

Code of Conduct (CoC)

https://github.com/slatedb/slatedb/blob/main/CODE_OF_CONDUCT.md

Adopters

No response

Contributing or Sponsoring Org

No sponsors. We have contributors from:

Responsive
Jamsocket
Databend
Microsoft

Some of these contributors are participating on personal time and others through their work.

Maintainers file

https://github.com/slatedb/slatedb/blob/main/MAINTAINERS.md

IP Policy

If the project is accepted, I agree the project will follow the CNCF IP Policy

Trademark and accounts

If the project is accepted, I agree to donate all project trademarks and accounts to the CNCF

Why CNCF?

I wanted an independent foundation to own the code, trademarks, and so on for SlateDB. I also wanted a foundation to signal that we are adhering to a common-sense, standard project management style. We have multiple companies working on the project, so governance is important. I would love to get some guidance there, as well.

Lastly, I wanted a foundation that didn't insist on antiquated infrastructure such as JIRA.

Benefit to the Landscape

There has been a lot of interest in SlateDB. I wrote a post about the idea initially. Since then, early interest has come from the streaming and durable execution community.

Most adopters previously looked at rocksdb-cloud, but found the lack of documentation and support a bit of a showstopper. Additionally, RocksDB's write-ahead log isn't integrated into object storage, which means it's still stateful. SlateDB, by contrast, allows a completely stateless deployment since all state is persisted in object storage.

SlateDB fits well for systems that are OK with 20-100ms of write latency, but want high durability and easy operations.

Cloud Native 'Fit'

SlateDB is by definition cloud native. It can't run without an object store such as S3, GCS, or ABS. In addition to depending on cloud-native infrastructure, it's also meant to power cloud native infrastructure. We've seen a lot of need for a cloud-native LSM. Everything from vector search (Turbopuffer) to Kafka (WarpStream) to an etcd replacement (https://github.com/k3s-io/kine) could leverage SlateDB. Durable execution, serverless functions, and stream processing are our initial use cases.

Cloud Native 'Integration'

No response

Cloud Native Overlap

No response

Similar projects

The only similar project I'm familiar with is RocksDB-Cloud. See here for a description of how RocksDB-Cloud and SlateDB are different.

There are other projects that are taking the same zero-disk architecture approach, but are not targeting LSM/row-based KV lookups. One such example is Tonbo; they're focused on columnar storage formats and OLAP use cases. By contrast, we're focused on row-based storage formats and OLTP use cases.

Landscape

No

Business Product or Service to Project separation

The only adopting company currently contributing is Responsive.dev. They are a streaming company using SlateDB for state management. SlateDB is run entirely separately and we have adopted an ICLA.

Project presentations

We have a p99conf presentation coming up in September 2024.

Project champions

No response

Additional information

~~The main github repo (https://github.com/slatedb/slatedb) is not currently public. We are opening it up on ~August 19. Please contact me if you need early access.~~

~~The website is still a work in progress. It's got a lot of Docusaurus boilerplate. Planning to clean that up and write some docs in the next week or two.~~

criccomini · 2024-08-14T00:48:58Z

Note: SlateDB is now open source and the GitHub repository is publicly available: https://github.com/slatedb/slatedb

TheFoxAtWork · 2024-09-26T19:39:53Z

@chira001 @raffaelespazzoli @xing-yang Does the TAG have a recommendation regarding this project?

TheFoxAtWork · 2024-09-26T19:43:01Z

@criccomini would you complete the cloud native fit section of the application?

raffaelespazzoli · 2024-09-26T20:00:30Z

@TheFoxAtWork I'll be the one reviewing the project. I don't have a recommendation yet.

criccomini · 2024-09-26T21:09:04Z

> @criccomini would you complete the cloud native fit section of the application?

Took a shot at this. Let me know if you need more. :)

raffaelespazzoli · 2024-09-27T10:47:19Z

@criccomini would you or someone on behalf of SlateDB be able to present at TAG Storage? I will contact you on LinkedIn so we can exchange contacts.

criccomini · 2024-09-27T20:00:47Z

(We got in touch over email and have set a time for Oct 23, 8 AM Pacific.)

criccomini added the New New Application label Aug 8, 2024

angellk added the Storage label Aug 20, 2024

mrbobbytables added this to the October Sandbox Review Items milestone Aug 29, 2024

angellk added the review/tag/assigned label Sep 18, 2024
