awesome-serialization

So many formats, so much wow. See wesm's comment before making up your mind.

Contents

API

Serialization formats suitable for APIs and RPC-based networked services.

  • CSV - Comma Separated Values. Textual.
  • JSON - Lightweight document data-interchange format. Textual.
  • JSONL - Schemaless "multiple JSON documents in 1 file" container data format. Textual.
  • JSON5 - JSON with added support for comments and relaxed syntax. Textual.
  • Thrift - Scalable code generation, schema evolution binary format. Binary.
  • Protocol Buffers - Google's data interchange format. Binary.
  • Message Pack - Efficient JSON-like binary serialization format. Binary.
  • bson - Binary schemaless JSON encoding. Binary.
  • XML - Extensible Markup Language. Genuinely Horrible. Textual.
  • Plist - Property List representation. Apple. Textual/Binary.
  • YAML - Indentation-based data serialization standard. Textual.
  • TOML - Tom's Obvious, Minimal Language. Textual.
  • CBOR - Concise Binary Object Representation. Schema-free. Binary.
  • ION - Amazon's advanced JSON-compatible serialization. Textual/Binary.
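
To make the JSON versus JSONL distinction in the list above concrete, here is a minimal sketch using only Python's standard `json` module. The sample `records` and the helper name `read_jsonl` are made up for illustration and are not part of either format.

```python
import io
import json

# Hypothetical sample data, for illustration only.
records = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]

# JSON: the whole collection is a single document.
json_text = json.dumps(records)

# JSONL: one self-contained JSON document per line.
jsonl_text = "\n".join(json.dumps(r) for r in records) + "\n"


def read_jsonl(stream):
    """Yield one decoded record per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


decoded_json = json.loads(json_text)
decoded_jsonl = list(read_jsonl(io.StringIO(jsonl_text)))
```

Because every JSONL line parses on its own, a reader can process a large file record by record, whereas a plain JSON array generally has to be parsed as a whole.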

RPC

  • gRPC - A high-performance, open source universal RPC framework. Binary, OSI Layer 7.
  • RSocket - Application protocol providing Reactive Streams semantics. Binary, OSI Layer 5 (or 6).
  • Cap’n Proto - High-performance, schema-based data interchange format. Binary.
  • FlatBuffers - Suitable for zero-copy deserialization. Binary.

Big Data

Serialization suitable for big-data-at-rest systems, from the Hadoop family of solutions.

  • Parquet - Columnar storage for Hadoop workloads. Binary.
  • FlatBuffers - Protocol Buffers-like schema-based format suitable for larger datasets. Binary.
  • ORC - Columnar storage for Hadoop workloads. Binary.
  • Avro - Schema embedded, dynamic rich data structures. Textual/Binary.
  • Ion - Row storage with skip scan parsing. Structured, schema embedded. Amazon. Textual/Binary.
  • Arrow - Cross-language columnar data format optimized for analytics workloads. Binary.
  • Delta Lake - Transactional storage layer for big data workflows. Binary.
  • Iceberg - Open table format for large datasets. Binary.

Scientific

Large-scale sparse arrays used in physics, mathematics, and statistics research.

  • HDF5® - n-dimensional datasets, complex objects, with schema. Efficient I/O. Binary.
  • npy - NumPy arrays, cell sparse metadata. Binary.
  • NetCDF - Self-describing, machine-independent data format for scientific data. Binary.
  • Zarr - Scalable storage of n-dimensional arrays. Binary.
  • ASDF - Advanced Scientific Data Format for astronomy and beyond. Binary/Textual.

Machine Learning

Serialization of deep learning networks and weights.

  • SavedModel - TensorFlow package, weights, graph, executable code. Binary.
  • CoreML - Apple's on-device ML model format. Binary.
  • ONNX - Open Neural Network Exchange. Interoperability focused. Binary.
  • MLIR - Intermediate representation for machine learning computations. Textual/Binary.
  • TorchScript - Serialization for PyTorch models. Binary.

Deprecated in Machine Learning

  • GraphDef - TensorFlow graphs. Binary.
  • PMML - Predictive Model Markup Language for exchanging ML models. XML-based. Textual.

Graph

Serialization formats for representing graph-oriented data structures.

  • json-ld - JSON for Linking Data. Textual.
  • Turtle - Terse RDF Triple Language. Textual.
  • GraphML - XML-based graph serialization format. Textual.
  • DOT - Graph description language, developed as a part of the Graphviz project. Textual.
  • ParquetGraph - Integration of Parquet with graph data structures. Binary.
  • GraphSON - JSON-based graph serialization. Textual.

Deprecated in Graph

  • GraphML - XML-based graph serialization format. Textual.
  • GML - Graph Modeling Language. Hierarchical ASCII-based. Textual.

Workflow

  • common-workflow-language - Specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments.
  • Apache Airflow DAGs - Python-based Directed Acyclic Graphs for workflows.
  • WDL - Workflow Description Language for genomics and scientific workflows.
  • Relational Algebra and Datalog for Graphs - Coursera course on graph data manipulation.
  • Cromwell - Scientific workflow management, compatible with WDL and CWL.
  • Nextflow - Scalable and reproducible scientific workflows.

Programming

Language-native serialization formats for in-transit data (aka memory-based "live" objects).

JavaScript

  • BSON.js - BSON serializer. Binary.
  • avsc - JavaScript implementation of Apache Avro. Textual.

Dart

Python

  • pickle - RAM to Disk serialization. Binary.
  • msgpack-python - MessagePack serializer implementation for Python.
  • srsly - Modern high-performance serialization utilities for Python.
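
As a rough illustration of what "language-native" means for `pickle`, the sketch below round-trips a made-up `live` object through `pickle` and, for contrast, through the standard `json` module. The variable names are illustrative only.

```python
import json
import pickle

# Hypothetical in-memory object using Python-specific container types.
live = {"tags": {"a", "b"}, "pair": (1, 2)}

# pickle preserves Python types such as set and tuple.
blob = pickle.dumps(live)
restored = pickle.loads(blob)

# JSON has no set or tuple, so the data must be converted first,
# and the tuple comes back as a list.
portable = json.dumps({"tags": sorted(live["tags"]), "pair": live["pair"]})
decoded = json.loads(portable)
```

The trade-off is portability and safety: pickled bytes are only meaningful to Python, and `pickle.loads` should never be called on untrusted input because unpickling can execute arbitrary code.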

Swift

Java

Rust

  • Serde - Rust's serialization framework for multiple formats like JSON, CBOR, and MessagePack.
  • bincode - High-performance binary serialization for Rust.

Go

  • GOB - Go's built-in serialization format for arbitrary data structures. Binary.

Streaming

Serialization formats optimized for real-time streaming data.

  • Kafka Streams - Real-time stream processing framework with built-in serialization.
  • Protobuf-Lite - Lightweight Protocol Buffers for constrained environments.

Security-Focused

Serialization formats designed with security and robustness in mind.

Academic

Research papers discussing types, category theory, benchmarks & co.

Contribute

Contributions welcome! Read the contribution guidelines first.

License

CC0

To the extent possible under law, Maxim Veksler has waived all copyright and related or neighboring rights to this work.