🏗️🚧👷👷🏽👷️🚧 see TODOs inline
Proof of Concept that takes 4 minutes to demonstrate
An enterprise-grade Hybrid Transactional/Analytical Processing (HTAP) data platform
with the following key characteristics:
- Record of Truth: simplifies data governance, security, catalogue and schema management
- OLTP store with high concurrency + horizontal scaling
- Strict-serializable ACID transactions
- Different SQL interfaces: OLAP via SparkSQL & Presto; OLTP via Postgres wire-protocol + dialect (subset, PoC)
- OLAP resource isolation without data duplication from OLTP
- Ready for today's Agentic AI demands
- Ecosystem integration: Apache Kafka, Apache Spark, Presto, Apache Parquet, Apache Iceberg
- Freedom to Operate: built with commoditized software not controlled by vendors, that can be deployed anywhere
- 80%+ Lower Total Cost of Ownership compared to common OLTP + ETL + OLAP solutions
- Demo Quick Start
- Vision Statement (the "why")
- Appendix: Scope, Architecture Notes, and Enterprise Considerations
- FAQ (hard questions and direct answers)
Check the podman machine has at least 12 GB of memory allocated:
podman machine inspect --format "{{.Resources.Memory}}"  # must be greater than 12287 (12GB)
See docs/TROUBSHOOTING.md for how to increase the memory limit.
podman compose -f podman-compose.yml up

podman exec cassandra \
cqlsh cassandra -e "SELECT * FROM demo.events LIMIT 3;"

TODO– quick explanation of the data model used
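The data-model explanation is still a TODO; as a placeholder, here is a hypothetical CQL sketch of a demo.events table consistent with the queries in this demo (the real schema lives in the demo's init scripts; columns other than entity_id and event_time are assumptions):

```sql
-- Hypothetical sketch only: the actual demo schema may differ.
CREATE TABLE IF NOT EXISTS demo.events (
    entity_id  text,       -- partition key: all events for one entity stored together
    event_time timestamp,  -- clustering key: events ordered within the partition
    payload    text,       -- assumed column for the event body
    PRIMARY KEY ((entity_id), event_time)
);
```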
podman exec presto \
presto-cli --execute "SHOW SCHEMAS FROM cassandra;"
podman exec presto \
presto-cli --execute "SELECT * FROM cassandra.demo.events LIMIT 100;"
podman exec presto \
presto-cli --execute "SELECT entity_id, COUNT(*) FROM cassandra.demo.events GROUP BY entity_id LIMIT 10;"

To watch query progress in browser, open http://localhost:8088/ui/
A simple query using the Cassandra-Spark-Connector (requires creating a temp view first)
podman exec -it spark \
spark-sql --packages com.datastax.spark:spark-cassandra-connector_2.12:3.5.1 \
--conf spark.cassandra.connection.host=cassandra

CREATE TEMPORARY VIEW events_for_partition_queries USING org.apache.spark.sql.cassandra
OPTIONS (keyspace 'demo', table 'events');

SELECT * FROM events_for_partition_queries LIMIT 3;

The spark-cassandra-connector is best for per-partition (or per-index) queries. Reads and writes go through Cassandra's CQL interface and its JVM.
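For example, a partition-restricted query that the connector can push down to Cassandra (the entity value is illustrative):

```sql
-- The partition-key predicate is pushed down to Cassandra via CQL,
-- so only the matching partition is read.
SELECT * FROM events_for_partition_queries WHERE entity_id = 'entity-1';
```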
Queries using the Cassandra Spark Bulk Reader (via the Cassandra Sidecar)
podman exec -it spark \
spark-sql \
--packages org.apache.cassandra:cassandra-analytics-core_spark3_2.12:0.4.0-mck0,org.apache.cassandra:analytics-sidecar-vertx-client-all:0.4.0-mck0,org.apache.cassandra:cassandra-bridge_spark3_2.12:0.4.0-mck0

CREATE OR REPLACE TEMP VIEW events_for_bulk_queries
USING org.apache.cassandra.spark.sparksql.CassandraDataSource
OPTIONS (
sidecar_contact_points "cassandra",
keyspace "demo",
table "events",
DC "datacenter1",
createSnapshot "true",
snapshotName "htap_demo_sparksql",
numCores "4"
);
SELECT count(*) FROM events_for_bulk_queries;
SELECT entity_id, COUNT(*) AS cnt, MIN(event_time) AS first_seen, MAX(event_time) AS last_seen
FROM events_for_bulk_queries GROUP BY entity_id ORDER BY cnt DESC LIMIT 10;

To watch query progress in browser, open http://localhost:4040/
The Cassandra Bulk Reader/Writer interfaces directly with the data directories of the Cassandra nodes. Reads go against a snapshot of the SSTable files on disk, providing point-in-time consistency. Direct file access provides the high throughput needed for bulk or analytics-style reads and writes without impacting the latencies of other requests to the Cassandra cluster.
To write to a single Parquet file, continue on from the spark-sql example above:
INSERT OVERWRITE DIRECTORY '/var/lib/cassandra/parquet-exports/demo_events' USING parquet
SELECT /*+ COALESCE(1) */ * FROM events_for_bulk_queries;

This writes the entire demo.events table to a single Parquet file in the cassandra-data directory. The COALESCE(1) hint collapses all partitions to one before writing, resulting in a single output file.
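The export can be sanity-checked from the same spark-sql session by reading the Parquet directory back, using Spark SQL's syntax for querying files in place:

```sql
-- The row count here should match SELECT count(*) on the source view.
SELECT count(*) FROM parquet.`/var/lib/cassandra/parquet-exports/demo_events`;
```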
FIXME: currently broken w/ 'DecoratedKey… not serializable result: java.nio.HeapByteBuffer'
podman exec -it spark \
spark-shell \
--packages org.apache.cassandra:cassandra-analytics-core_spark3_2.12:0.4.0-mck0,org.apache.cassandra:analytics-sidecar-vertx-client-all:0.4.0-mck0,org.apache.cassandra:cassandra-bridge_spark3_2.12:0.4.0-mck0

val df = spark.read.parquet("/var/lib/cassandra/parquet-exports/demo_events")
df.write
.format("org.apache.cassandra.spark.sparksql.CassandraDataSink")
.option("sidecar_contact_points", "cassandra")
.option("keyspace", "demo")
.option("table", "events")
.option("DC", "datacenter1")
.option("numCores", "4")
.mode("append")
.save()

This writes SSTables directly to disk and uses the Sidecar to load them into Cassandra, bypassing the CQL layer for maximum throughput.
Simple configuration to CDC all writes made to the database into a Kafka topic
todo
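Pending that, a sketch of the standard Cassandra CDC knobs such a setup builds on (the Sidecar's CDC-to-Kafka bridge itself is not shown and is not yet in the demo):

```sql
-- Prerequisite in cassandra.yaml on each node:
--   cdc_enabled: true
-- Then flag the table. Mutations for CDC-flagged tables are retained in
-- each node's cdc_raw directory until a consumer (here, the Sidecar's
-- CDC-to-Kafka bridge) has read and acknowledged them.
ALTER TABLE demo.events WITH cdc = true;
```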
TODO– see mck/cassandra-6 branch
TODO– see mck/cassandra-6 branch
One Data Platform for all your needs, like the good ol' days of the RDBMS, but no longer the monolith.
In detail, the reasons to choose this stack are broken down into the following:
• Architecture – Keep It Simple, Keep It Fast, Keep It Consistent
• AI data platform demands
• Vector Similarity Search
• ACID Guarantees
• SQL Compatibility
• Change Data Capture Kafka Streaming
• Why Presto and Apache Spark
• Why Podman
• Common (Bad) Assumptions: YAGNI, Scale, and the OLAP Platform
• Appendix: Scope, Architecture Notes, and Enterprise Considerations
• Hard Questions FAQ (direct answers)
• References
And, for an illustration of the material reductions in recurring monthly spend on licenses and SaaS, see TCO-Comparisons.md.
Build a unified data platform that can consolidate application/transactional (OLTP), analytics (OLAP), machine learning, and generative AI workloads in one platform.
Choose a data platform stack that serves different consumption patterns and modalities while keeping the data in place: one record of truth.
This avoids five common problems that otherwise evolve:
- Data duplication and inconsistency
- ETLs and pipelines that are hard to maintain and slow to run
- Multiple databases that are hard to manage, catalogue, schema version, secure, govern
- Explosion of technology portfolio, operational and software costs
- Siloing of data efforts through the enterprise
The stack demonstrates a set of standard technologies, put together, that is as simple to set up as any single database and gives you all the same functionality, while future-proofing for AI, ML and Analytics.
Following the principles of record of truth and compute storage separation, the AI, ML and Analytics are compute workloads without separate storage.
By providing direct access to the data, with different access paths for different compute workloads (OLTP, OLAP, ML, AI) to ensure resource management and isolation, the data platform is fast and efficient, best serving the SLO needs of each workload (latency or throughput).
Application workloads (OLTP) can expect p99 latencies for writes under 5ms and reads under 50ms, regardless of throughput or concurrency.
AI agents need real-time access to all data sources in an enterprise, with high throughput, strong consistency and low latency, without bypassing security and governance controls.
Today's AI progress is outpacing data readiness. There is no AI without data: any enterprise's potential with AI agents fundamentally depends on modernisation efforts of the data platform.
The Mobile-First era (~2007) exposed the systems-design mess, and resulting limitations, created by duplicating data stores as data volumes exploded. Many organizations no longer revisit why certain splits were originally necessary: for example, the underlying reason for denormalising and duplicating data, from transactional databases to search indexes, and from application OLTP databases to analytical OLAP data lakes, was the technical constraints of the industry at the time. Yet 20 years later we still often choose technologies based solely on comparisons within silos and bounded contexts, forgetting, or unable, to re-evaluate whether past systems-design limitations still apply. These architectures are slow to be revisited once institutionalized, and this has led to an explosion in enterprise-wide complexity, limitations, and rapidly increasing costs.
Agentic AI gives us an opportunity to avoid repeating the mistakes of the mobile-first era and to build a data platform that is scalable, consistent, and available—while maintaining low latency, high throughput, and real-time access to the enterprise data catalogue. This also enables centralised, real-time governance and security across the full catalogue.
TODO
Many OLTP databases provide ACID semantics, but Serializability is rare and Strict Serializability rarer still.
As load increases and storage becomes inherently distributed, transactional guarantees matter more. Serializable isolation is rare in production, while Strict Serializability raises the bar even higher. Durability must now also account for single points of failure and multiple simultaneous catastrophic hardware failures, something single-writer databases fail to do.
SQL—and particularly the Postgres dialect—is a standard skill relied upon by both developers and data scientists.
SQL plays a valuable role early in the application lifecycle, when the domain model and schema change frequently, as well as for data exploration and analysis. While persisting to disk a rigid logical “SQL schema” is an unnecessary performance overhead for applications with prepared and static access patterns, the logical SQL view often simplifies work for both developers and data scientists.
SQL is an interface layer; in this stack it is implemented as a protocol/dialect adapter over the transactional layer. SQL can be implemented on top of many storage engines, using just transactions and an underlying key-value store. This HTAP can easily add Document, Graph and other modalities.
This stack demonstrates how different SQL implementations deliver value for varying throughput, latency, and domain requirements. The stack offers different SQL interfaces: direct application SQL (a PoC Postgres shim), and production-ready analytical SQL in both partition-based (Spark/Presto) and file-direct (Spark/Iceberg/Presto) forms. These different SQL interfaces make it possible to work with and migrate legacy platforms where federation of existing data sources is required.
TODO
TODO
Turn database sprawl into something much simpler
Myth: "YAGNI: You don't need to design your database for scale. Redesign for scale later when needed"
Reality: The cost of scaling later is significantly higher than the cost of scaling now, and left too late it becomes a business killer. Redesigning critical foundational data systems across an enterprise late takes years and leaves technical debt in perpetuity. Beyond the opportunity cost as business growth stalls, scaling late incurs business growth limitations, if not stagnation, and can lead to business failure. The cost of scaling now is a sunk cost, with the trade-off primarily in learning new technologies and skills.
Myth: "Analytics is a separate responsibility and platform"
Reality: This assumption often emerges from application-team constraints, compounded by a variant of the previous myth: "we tackle the analytics needs later when needed". It results in businesses with database-sprawl debt, and analytics confined to an isolated silo with limited engineering competence, where data scientists struggle with stale, fragmented and poor-quality data on inefficient and insufficient solutions, leading to excessive manual labour and limited results. Data Mesh attempts to address this in principle, but the tactics available today (data fabrics and data products) fail to address the most consequential limitations, which can easily, and should, be addressed when done early.
The era of Agentic AI forces us to face the reality that the data-consumption patterns typical of analytical computations are now a critical part of the business, indistinguishable and inseparable from the transactional application stack, and enterprise data platforms must now be designed accordingly, ideally from the beginning. Developers no longer get to pass analytics off as not their responsibility.
This repository is a proof-of-concept HTAP stack intended to be easy to run and evaluate. It demonstrates a unified pattern for:
- high-concurrency OLTP ingestion and point/bounded reads
- strict-serializable ACID transactions (transactional tables / operations)
- analytics that can operate over persisted structures (including snapshots) without requiring an always-on ETL pipeline
Status: Not GA. Available to test in the demo today. Production hardening, UX/debuggability, and performance tuning are in progress.
- A runnable POC showing end-to-end ingestion + query paths across OLTP and analytics.
- A demonstration of global strict serializability for multi-key and multi-table transactions under the constraints described below.
- A demonstration of analytics reads that can be executed against persisted structures (including snapshots) to reduce OLTP interference.
- A claim of full feature parity with other mature, general-purpose SQL databases.
- A promise that every enterprise workload can be consolidated immediately without trade-offs.
- A turnkey “drop-in replacement” for warehouses/lakes in every scenario (some organizations will still choose to export to Parquet/Iceberg for cost/performance/lifecycle reasons).
Strict serializability is provided by Accord (CEP-15), included in this stack. The intention is:
- strict-serializable isolation
- low latency in the common case (single wide-area round trip under normal conditions)
- leaderless operation without introducing a global bottleneck
This system is quorum-based for transactional decisions:
- If a quorum can be reached, the system remains available for those operations.
- If a quorum cannot be reached due to failures or partitions, the system preserves correctness by blocking progress rather than returning inconsistent results.
A practical rule of thumb:
- With replication factor (RF) = 3, you can typically lose 1 replica per key-range and still make progress.
- With higher RF (and appropriate placement), you can tolerate more failures, but “how many can be down” depends on the quorum configuration and failure domain (racks / AZs / regions).
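The rule of thumb follows from simple majority arithmetic, sketched here:

```python
def quorum(rf: int) -> int:
    """Smallest majority out of rf replicas."""
    return rf // 2 + 1

def tolerable_failures(rf: int) -> int:
    """Replicas that can be down per key-range with progress still possible."""
    return rf - quorum(rf)

# RF=3 tolerates 1 down replica per key-range; RF=5 tolerates 2.
for rf in (3, 5, 7):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerable failures={tolerable_failures(rf)}")
```

Failure-domain placement (racks / AZs / regions) determines whether those tolerable failures can occur simultaneously in practice, which is why the rule of thumb is qualified above.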
Accord is designed so that transactions do not need to “roll back” in the traditional sense of “speculatively applied state that must be undone.”
Instead, transactions are journaled/ordered and may be blocked (or forced onto a slower path) when conflicts or failures require coordination. This can trade tail latency for correctness and availability under failure, and creates a simpler application-development experience.
The goal is to avoid a permanent architectural requirement to:
- copy OLTP data into a second system (warehouse/lake)
- maintain continuous ETL just to make analytics possible
Instead, the stack supports analytics that can read:
- directly from persisted on-disk structures (SSTable-oriented bulk read)
- from coordinated cluster snapshots of those structures
This is why the demo emphasizes:
- Spark bulk reader/writer paths that can interact with persisted structures via Sidecar endpoints
- snapshot-coordinated reads for consistent analytical views
In the demo, “snapshotting” refers to:
- coordinating snapshots across the cluster
- running analytic queries over the snapshotted persisted structures
This yields a stable analytic view while allowing OLTP to continue without disruption (i.e. resource isolation).
The demo’s design intent is:
- OLTP uses the normal request path
- analytics uses persisted-structure reads and/or snapshot reads
- a Sidecar can stream/offload snapshot files to separate storage to reduce repeated I/O contention
The intended outcome is:
- minimal impact to OLTP p99 latency under analytic load, provided the system is configured correctly
- Point reads / bounded partition reads: OLTP path
- Per-partition analytics: either OLTP path or persisted-structure reads depending on concurrency and SLAs
- Wide scans / large token-range reads: persisted-structure reads are typically preferred
Note: OLTP wide scans (token-range queries) will generally have higher p99 latency than partition reads. They can still be “fast,” but the analytics path is designed to be the better tool for that job.
The SQL interface in this stack is a Postgres wire-protocol + Postgres dialect adapter implemented as a prototype on top of the transaction layer using Apache Calcite.
- This is not a claim of full Postgres feature parity.
- The prototype’s GitHub page documents exactly what is implemented and what is not.
- Treat it as a pragmatic interoperability layer:
- helpful for onboarding, tooling compatibility, and incremental migration
- not a substitute for validating required SQL semantics for your application
This stack does not require Parquet/Iceberg to function as an HTAP foundation.
However, exporting to columnar formats can still be useful when you explicitly want:
- backups and long-term retention
- cold storage / tiering
- very scan-heavy workloads where columnar storage provides better cost/performance
In those cases:
- Parquet/Iceberg are performance and lifecycle optimizations
- the authoritative, freshest, strongly consistent view remains in the OLTP store
Both the underlying store and the transaction layer support tunable consistency (including per-operation tuning).
Use this intentionally:
- strict-serializable transactions for correctness-critical invariants
- weaker consistency where latency/availability tradeoffs are acceptable and correctness requirements permit
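For the underlying store, per-operation tuning looks like the following sketch in cqlsh (drivers expose the same setting per statement):

```sql
-- cqlsh: CONSISTENCY applies to subsequent requests in this session.
CONSISTENCY QUORUM;  -- correctness-critical reads/writes
SELECT * FROM demo.events LIMIT 1;

CONSISTENCY ONE;     -- lower latency where weaker guarantees are acceptable
SELECT * FROM demo.events LIMIT 1;
```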
Below are the common enterprise concerns that unified OLTP+analytics stacks must address.
Enterprises will ask:
- Who operates this at 2am?
- What are the failure modes and runbooks?
- How do upgrades, repairs, and incident response work?
Recommendation:
- treat this repo as an evaluation harness
- define an operational model (SRE ownership, oncall, SLIs/SLOs, upgrade cadence)
Expect requirements for:
- centralized RBAC/ABAC
- auditing and lineage
- data masking / tokenization policies
- separation of duties and tenant isolation
Recommendation:
- define the governance plane early, including analytic access patterns
- ensure consistent policy enforcement across OLTP + analytics interfaces
Enterprises need:
- safe schema change workflows
- backfills and reprocessing strategies
- compatibility guarantees across versions
Recommendation:
- document the migration path for legacy SQL workloads (what works today, what is planned)
Enterprises typically require:
- reliable CDC to Kafka
- dedup semantics aligned with replication and failure handling
- a clean operational experience for CDC pipelines
Note:
- The Sidecar is intended to provide CDC-to-Kafka with replication-factor-aware dedup semantics.
- This is expected to be added to the demo.
Teams will ask:
- What is the concurrency model for analytics?
- What are the limits for joins, aggregations, and scans?
- Which tools work out of the box?
Recommendation:
- set expectations clearly: “supported query shapes” + “best path per shape”
- provide example workloads and reproducible benchmarks
Enterprises will require:
- clearly documented RPO/RTO
- failover/failback procedures
- proof that invariants hold during regional impairment
Recommendation:
- include a DR drill guide and a failure-injection test plan
- define how “availability” is preserved while respecting correctness
No. This repository is a proof-of-concept you can run today to evaluate the approach. Production readiness requires additional hardening, operational tooling, and performance tuning.
Strict serializability requires coordination. Availability here means: as long as a quorum can be reached, transactions and reads that require strict guarantees can proceed. The leaderless design of the Accord protocol means availability is per request: a down server only impacts the limited set of requests that depend on it. Under partitions or failures that prevent quorum, the system preserves correctness by blocking or degrading to slower coordination paths rather than returning inconsistent results.
“No duplication” means you can run many analytics workloads directly over persisted structures and snapshots without requiring an always-on ETL copy. You may still export to Parquet/Iceberg for explicit goals like cold storage, backups, or cost/performance optimization for scan-heavy workloads. Those exports are optional optimizations, not required plumbing.
The main tradeoffs are:
- validating SQL feature coverage vs relying on “Postgres-compatible” branding
- choosing the correct query path for each workload shape (point reads vs scans)
- operational maturity (observability, runbooks, upgrades, DR drills)
- governance and CDC integration details (often the real enterprise blockers)
Yes. TODO
- Accord / CEP-15 (transactions, strict serializability, failure tolerance goals)
- CEP-28 (Spark bulk reader/writer via Sidecar to persisted storage)
- Cassandra Analytics (bulk reader/writer examples)
- SQL prototype repo (Postgres wire protocol + Calcite-based dialect coverage)
