MerkleDB

MerkleDB is a Clojure library for storing and accessing large data sets in a hybrid column-oriented tree of content-addressable data blocks.

This project is usable, but should be considered alpha quality. For more details, see the design doc, proposed client interface, and sample usage patterns.

Installation

Library releases are published on Clojars. To use the latest version with Leiningen, add the following dependency to your project definition:

Clojars Project

This will pull in the omnibus package, which in turn depends on each subproject of the same version. You may instead depend on the subprojects directly if you wish to omit some functionality, such as Spark integration.

Concepts

The high-level semantics of this library are similar to a traditional key-value data store:

  • A database is a collection of tables, along with some user metadata.
  • Tables are collections of records, which are identified uniquely within the table by an id key.
  • Each record is an associative collection of fields, mapping field names to values.
  • Values may have any type that the underlying serialization format supports. There is no guarantee that all the values for a given field have the same type.
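
As a rough illustration of the data model described above, the sketch below represents a database as plain Clojure maps. It is not MerkleDB's API: `example-db`, `read-record`, and the `:metadata` and `:tables` keys are hypothetical names chosen for this example only.

```clojure
;; Hypothetical illustration only: plain Clojure maps, not MerkleDB's API.
(def example-db
  {:metadata {:owner "analytics"}               ; user metadata (made-up key)
   :tables
   {"users"                                     ; table name -> records by id key
    {"u-001" {:name "Ada" :age 36}
     "u-002" {:name "Grace" :age "unknown"}}}}) ; same field, different value type

(defn read-record
  "Look up one record by table name and id key. Returns nil if absent."
  [db table-name id]
  (get-in db [:tables table-name id]))

(read-record example-db "users" "u-001")
```

Note that the `:age` field holds a number in one record and a string in the other, which the model permits because field types are not enforced.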

Goals

The primary design goals of MerkleDB are:

  • Flexible schema-free key-value storage.
  • High-parallelism reads and writes to optimize for bulk-processing, where a job computes over most or all of the records in the table, but possibly only needs access to a subset of the fields in each record.
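
To make the bulk-processing access pattern above more concrete, here is a minimal sketch, assuming records are plain maps, of splitting a table into groups of fields so that a job needing only some fields reads only the matching group. The names `split-columns` and `mean-age` and the group layout are assumptions for illustration and say nothing about how MerkleDB actually lays out data.

```clojure
;; Hypothetical illustration only: plain Clojure data, not MerkleDB's API.
(defn split-columns
  "Split a map of id -> record into one sorted map per field group.
  `groups` maps a group name to the set of field names it holds."
  [records groups]
  (into {}
        (map (fn [[group fields]]
               [group
                (into (sorted-map)
                      (map (fn [[id record]]
                             [id (select-keys record fields)]))
                      records)]))
        groups))

(def records
  {"u-001" {:name "Ada"   :age 36 :bio "mathematician"}
   "u-002" {:name "Grace" :age 44 :bio "admiral"}})

(def columns
  (split-columns records {:stats   #{:age}
                          :profile #{:name :bio}}))

(defn mean-age
  "Compute over every record while touching only the :stats group."
  [columns]
  (let [ages (map :age (vals (:stats columns)))]
    (/ (reduce + ages) (count ages))))

(mean-age columns)
```

The job scans every record id but never reads `:name` or `:bio`, which is the property a column-oriented layout is meant to provide.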

Secondary goals include:

  • Efficient storage utilization via deduplication and structural sharing.
  • Lightweight versioning and copy-on-write to support immutable reads.
  • Building on storage and synchronization abstractions to support hosted service backends.
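
The deduplication goal above follows from content addressing: if a block's identifier is derived from its bytes, storing the same block twice yields a single entry. The sketch below shows that idea with an in-memory map. `content-id` and `put-block` are hypothetical helpers, and hashing the printed form with SHA-256 is a stand-in for whatever serialization and hashing a real block store uses.

```clojure
;; Hypothetical illustration only: an in-memory map, not MerkleDB's block store.
(import 'java.security.MessageDigest)

(defn content-id
  "Hex SHA-256 digest of a value's printed form.
  pr-str is a stand-in for a real, canonical serialization format."
  [value]
  (let [digest     (MessageDigest/getInstance "SHA-256")
        hash-bytes (.digest digest (.getBytes (pr-str value) "UTF-8"))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) hash-bytes))))

(defn put-block
  "Store a block under its content id. Returns [new-store id].
  Identical blocks map to the same id, so they occupy one entry."
  [store block]
  (let [id (content-id block)]
    [(assoc store id block) id]))

(let [[store-1 id-a] (put-block {} [1 2 3])
      [store-2 id-b] (put-block store-1 [1 2 3])   ; same content again
      [store-3 id-c] (put-block store-2 [4 5 6])]  ; different content
  {:same-id?     (= id-a id-b)
   :distinct-id? (not= id-a id-c)
   :block-count  (count store-3)})
```

Because each call returns a new store value and leaves the old one untouched, earlier versions remain readable, which is the same property that copy-on-write versioning relies on.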

Non-goals:

  • High-frequency, highly concurrent writes. Initial versions will have simple database-wide locking for updates.
  • Access control. In this library, all authentication and authorization are deferred to the storage layers backing the block store and ref manager.

License

This is free and unencumbered software released into the public domain. See the UNLICENSE file for more information.