- Alluxio - A virtual distributed storage system that bridges the gap between computation frameworks and storage systems.
- Apache Arrow - In-memory columnar representation of data, compatible with Pandas, Hadoop-based systems, etc.
- Apache Druid - A high-performance real-time analytics database (https://druid.apache.org/). See also: "An introduction to Druid, your Interactive Analytics at (big) Scale".
- Apache Ignite - A memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads, delivering in-memory speeds at petabyte scale. See also: "TensorFlow on Apache Ignite" and "Distributed ML in Apache Ignite".
- Apache Kafka - Distributed streaming platform.
- Apache Parquet - On-disk columnar representation of data, compatible with Pandas, Hadoop-based systems, etc.
- Apache Pinot - A real-time distributed OLAP datastore (https://pinot.apache.org). See also: "Comparison of the Open Source OLAP Systems for Big Data: ClickHouse, Druid, and Pinot".
- BayesDB - Database that allows built-in non-parametric Bayesian model discovery and querying of data through a database-like interface - (Video)
- ClickHouse - An open-source column-oriented database management system supported by Yandex - [(Video)](https://
- EdgeDB - NoSQL interface for Postgres that allows object-style interaction with stored data.
- HopsFS - HDFS-compatible file system with scale-out strongly consistent metadata.
- InfluxDB - Scalable datastore for metrics, events, and real-time analytics.
- TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries, packaged as a PostgreSQL extension. See also: "Time-series ML in TimescaleDB" (www.youtube.com/watch?v=zbjub8BQPyE)