PostgreSQL extension for binary serialization: MessagePack, CBOR, ZERA, FlexBuffers. 21% smaller than JSON. Clean C++20 template architecture.

mrayva/pg_zerialize

pg_zerialize

License: MIT PostgreSQL C++20

PostgreSQL extension for converting rows to efficient binary formats using the zerialize library.

21% smaller than JSON with MessagePack/CBOR formats.

Status: All Formats Complete! 🎉

  • FlexBuffers - Fully implemented ✅
  • MessagePack - Fully implemented ✅
  • CBOR - Fully implemented ✅
  • ZERA - Fully implemented ✅

Requirements

  • PostgreSQL 12+ (with development headers)
  • C++20 compatible compiler (GCC 10+, Clang 10+)
  • FlatBuffers library (libflatbuffers-dev)
  • MessagePack C library (libmsgpack-c-dev)
  • jsoncons library (libjsoncons-dev)
  • zerialize library (header-only, included in vendor/)

Building

make
sudo make install

Usage

Single Record Serialization

CREATE EXTENSION pg_zerialize;

-- Convert a single row to any of the four binary formats
SELECT row_to_msgpack(ROW('John', 25, true));
SELECT row_to_cbor(ROW('John', 25, true));
SELECT row_to_zera(ROW('John', 25, true));
SELECT row_to_flexbuffers(ROW('John', 25, true));

-- Serialize table rows individually
SELECT row_to_msgpack(users.*) FROM users;

Batch Processing (Faster for Multiple Rows)

-- Serialize multiple rows in a single call (2-3x faster!)
SELECT rows_to_msgpack(array_agg(users.*)) FROM users;
SELECT rows_to_cbor(array_agg(users.*)) FROM users;
SELECT rows_to_zera(array_agg(users.*)) FROM users;
SELECT rows_to_flexbuffers(array_agg(users.*)) FROM users;

-- Compare sizes across all formats
SELECT
    octet_length(rows_to_msgpack(array_agg(users.*))) as msgpack_bytes,
    octet_length(rows_to_cbor(array_agg(users.*))) as cbor_bytes,
    octet_length(rows_to_zera(array_agg(users.*))) as zera_bytes,
    octet_length(rows_to_flexbuffers(array_agg(users.*))) as flexbuffers_bytes
FROM users;

Performance

Based on real-world testing with user records (average serialized size per row, over 5 rows):

| Format | Avg Size | vs JSON | Best For |
|---|---|---|---|
| MessagePack | 71 bytes | -21% 🥇 | Max compression, APIs, caching |
| CBOR | 71 bytes | -21% 🥈 | IoT, IETF standard (RFC 8949) |
| JSON | 90 bytes | baseline | Human-readable, debugging |
| FlexBuffers | 142 bytes | +58% | Zero-copy reads, lazy access |
| ZERA | 209 bytes | +132% | Zerialize ecosystem, advanced features |

MessagePack and CBOR are the most compact, both saving ~21% vs JSON. FlexBuffers trades size for zero-copy deserialization capability. ZERA includes additional structure for advanced features but is larger.
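
To illustrate where the size difference comes from, here is a minimal, self-contained sketch that hand-encodes one small record as MessagePack and compares it with the equivalent compact JSON text. It does not use pg_zerialize or zerialize. The record, its keys (`name`, `age`, `active`), and the helper names are assumptions made for illustration, and the exact layout the extension emits may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append a MessagePack "fixstr": one header byte (0xa0 | length) followed by
// the raw bytes. Only valid for strings shorter than 32 bytes.
void put_fixstr(std::vector<std::uint8_t>& out, const std::string& s) {
    assert(s.size() < 32);
    out.push_back(static_cast<std::uint8_t>(0xa0 | s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Encode the hypothetical record {"name": "John", "age": 25, "active": true}.
std::vector<std::uint8_t> encode_sample_record() {
    std::vector<std::uint8_t> out;
    out.push_back(0x80 | 3);  // fixmap header: a map with 3 entries
    put_fixstr(out, "name");
    put_fixstr(out, "John");
    put_fixstr(out, "age");
    out.push_back(25);        // positive fixint: values 0..127 take one byte
    put_fixstr(out, "active");
    out.push_back(0xc3);      // boolean true
    return out;
}

// The same record as compact JSON text, for comparison.
std::string sample_record_json() {
    return R"({"name":"John","age":25,"active":true})";
}
```

For this one record the MessagePack form takes 24 bytes and the JSON text takes 38. The saving comes from one-byte headers replacing quotes, colons, commas, and digit strings. The ratio depends heavily on the data, so this is not a reproduction of the benchmark figures above.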

Performance Optimizations

All major performance optimizations are complete. Combined, they make the extension roughly 3-5x faster than the original implementation.

Schema Caching - TupleDesc lookups cached, 20-30% faster bulk operations
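
The general idea behind schema caching can be sketched as a memoized lookup keyed by type id. This is a standalone illustration, not the extension's code: `RowSchema`, `SchemaCache`, and the loader callback are hypothetical stand-ins for the TupleDesc lookup, and a real cache inside PostgreSQL would also need to handle invalidation and memory contexts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the column metadata a TupleDesc provides.
struct RowSchema {
    std::vector<std::string> column_names;
};

// Memoizes schema lookups by composite type id, so the expensive lookup
// runs once per type instead of once per row.
class SchemaCache {
public:
    using Loader = std::function<RowSchema(std::uint32_t)>;

    explicit SchemaCache(Loader loader) : loader_(std::move(loader)) {
        assert(loader_ && "loader must be callable");
    }

    const RowSchema& get(std::uint32_t type_id) {
        auto it = cache_.find(type_id);
        if (it == cache_.end()) {
            // Cache miss: do the expensive lookup once and remember the result.
            it = cache_.emplace(type_id, loader_(type_id)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Loader loader_;
    std::unordered_map<std::uint32_t, RowSchema> cache_;
};
```

With a cache like this, serializing many rows of the same type calls the loader only on the first row; every later row reuses the stored schema.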

Batch Processing - Multiple rows in single call, 2-3x faster for bulk operations

Buffer Pre-allocation - Map/array capacity reserved upfront, 5-10% faster with reduced memory fragmentation
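
Batch processing and buffer pre-allocation fit together naturally. The sketch below is a hypothetical illustration rather than the extension's implementation: it assumes each row has already been encoded to bytes and that a batch is written as a single MessagePack array. `Bytes` and `encode_batch` are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Concatenate already-encoded rows into one MessagePack array.
// The total size is computed first so the output buffer is allocated once.
Bytes encode_batch(const std::vector<Bytes>& encoded_rows) {
    // This sketch only handles the one-byte "fixarray" header (up to 15 elements).
    assert(encoded_rows.size() <= 15);

    std::size_t total = 1;  // one byte for the array header
    for (const Bytes& row : encoded_rows) {
        total += row.size();
    }

    Bytes out;
    out.reserve(total);  // single allocation instead of repeated regrowth
    out.push_back(static_cast<std::uint8_t>(0x90 | encoded_rows.size()));
    for (const Bytes& row : encoded_rows) {
        out.insert(out.end(), row.begin(), row.end());
    }
    return out;
}
```

Because the final size is known before any bytes are written, the output vector never has to grow and copy itself while rows are appended. Larger batches would need the 16-bit or 32-bit array headers that MessagePack also defines.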

Next Steps

  1. Implement FlexBuffers support ✅
  2. Implement MessagePack support ✅
  3. Implement CBOR support ✅
  4. Implement ZERA support ✅
  5. Add array support for PostgreSQL arrays
  6. Add proper NUMERIC/DECIMAL handling
  7. Schema caching optimization ✅
  8. Batch processing for multiple rows ✅
  9. Buffer pre-allocation optimization ✅
  10. Add nested composite type support
  11. Add date/timestamp types
  12. Add deserialization functions
