Simple schema evolution with maximum performance
VBARE is a tiny extension to BARE that provides a way of handling schema evolution.
BARE is a simple binary representation for structured application data.
Messages are encoded in binary and compact in size. Messages do not contain schema information — they are not self-describing.
BARE is optimized for small messages. It is not optimized for encoding large amounts of data in a single message, or efficiently reading a message with fields of a fixed size. However, all types are aligned to 8 bits, which does exchange some space for simplicity.
BARE's approach to extensibility is conservative: messages encoded today will be decodable tomorrow, and vice-versa. But extensibility is still possible; implementations can choose to decode user-defined types at a higher level and map them onto arbitrary data types.
The specification is likewise conservative. Simple implementations of message decoders and encoders can be written inside of an afternoon.
An optional DSL is provided to document message schemas and provide a source for code generation. However, if you prefer, you may also define your schema using the type system already available in your programming language.
Also see the IETF specification.
Goals:
- Fast — Self-contained binary encoding, similar to a tuple structure
- Simple — Can be reimplemented in under an hour
- Portable — Cross-language support with well-defined standardization
Non-goals:
- Data compactness — That's what gzip is for
- RPC layer — This is trivial to implement yourself based on your specific requirements
Use cases:
- Network protocols — Allow the server to cleanly evolve the protocol version without breaking old clients
- Data at rest — Upgrade your file format without breaking old files
VBARE works by declaring a schema file for every possible version of the schema, then writing conversion functions between each version of the schema.
Versions
Each message has an associated version, an unsigned 16-bit integer. Each version of the protocol increases monotonically from 1 (e.g. v1, v2, v3). This version is specified in the file name (e.g. my-schema/v1.bare).
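As a rough illustration of the naming convention above, the version can be read back out of a schema file name and checked against the unsigned 16-bit range. The helper name `version_from_path` is hypothetical and not part of VBARE.

```python
import re

MAX_VERSION = 0xFFFF  # largest value an unsigned 16-bit integer can hold


def version_from_path(path: str) -> int:
    """Extract the schema version from a file name such as 'my-schema/v1.bare'.

    Hypothetical helper for illustration; VBARE does not define this function.
    """
    match = re.search(r"(?:^|/)v(\d+)\.bare$", path)
    if match is None:
        raise ValueError(f"not a versioned schema file: {path!r}")
    version = int(match.group(1))
    # Versions start at 1 and must fit in an unsigned 16-bit integer.
    if not 1 <= version <= MAX_VERSION:
        raise ValueError(f"version {version} is outside 1..{MAX_VERSION}")
    return version
```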
Schema Evolution & Converters
On your server, you manually define code that converts between versions in both directions (upgrade for deserialization, downgrade for serialization).
There are no evolution semantics in the schema itself. To write a new version, simply copy and paste the schema from v1 to v2.
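A minimal sketch of what such converters might look like, assuming a made-up schema change where v1 stores a single `name` and v2 splits it into `first_name` and `last_name`. The field names, the plain-dict representation, and the `UPGRADES` / `DOWNGRADES` tables are all assumptions for illustration, not VBARE's API.

```python
from typing import Callable, Dict

Converter = Callable[[dict], dict]


def v1_to_v2(user: dict) -> dict:
    # Upgrade: split the single v1 "name" field into the two v2 fields.
    first, _, last = user["name"].partition(" ")
    return {"first_name": first, "last_name": last}


def v2_to_v1(user: dict) -> dict:
    # Downgrade: join the v2 fields back into the single v1 "name" field.
    return {"name": f'{user["first_name"]} {user["last_name"]}'.strip()}


# UPGRADES[n] converts version n to n + 1; DOWNGRADES[n] converts n to n - 1.
UPGRADES: Dict[int, Converter] = {1: v1_to_v2}
DOWNGRADES: Dict[int, Converter] = {2: v2_to_v1}
LATEST = 2


def upgrade_to_latest(data: dict, version: int) -> dict:
    """Run every upgrade step from `version` up to LATEST (after deserializing)."""
    while version < LATEST:
        data = UPGRADES[version](data)
        version += 1
    return data


def downgrade_to(data: dict, target: int) -> dict:
    """Run every downgrade step from LATEST down to `target` (before serializing)."""
    version = LATEST
    while version > target:
        data = DOWNGRADES[version](data)
        version -= 1
    return data
```

Chaining single-step converters means each new version only needs one upgrade and one downgrade function, and the application code only ever sees the latest shape.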
Servers vs Clients
Servers need to include converters between all versions.
Clients only need to include a single version of the schema, since the server is responsible for version conversion no matter what version you connect with.
Embedded vs Pre-Negotiated Versions
Every message has a version associated with it. This version is either:
- Embedded in the message itself in the first 2 bytes of the message (see below)
- Pre-negotiated via mechanisms like HTTP request query parameters or handshakes
  - For example, you can extract the version from a request like POST /v3/users as v3
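For the pre-negotiated case, one possible way to pull the version out of a request path is sketched below. The function name and the `/v<N>/...` path convention are assumptions based on the example above; other mechanisms (query parameters, handshakes) would need their own parsing.

```python
import re


def version_from_request_path(path: str) -> int:
    """Return the schema version encoded as the first path segment, e.g. '/v3/users' -> 3.

    Hypothetical helper; assumes the version is always the leading '/v<N>' segment.
    """
    match = re.match(r"/v(\d+)(?:/|$)", path)
    if match is None:
        raise ValueError(f"no version prefix in path: {path!r}")
    version = int(match.group(1))
    # The version must be a valid unsigned 16-bit value starting at 1.
    if not 1 <= version <= 0xFFFF:
        raise ValueError(f"version {version} is outside 1..65535")
    return version
```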
Embedded Binary Format
Embedded version works by inserting an unsigned 16 bit integer at the beginning of the buffer. This integer is used to define which version of the schema is being used.
The layout looks like this:
+-------------------+-------------------+
| Schema Version | BARE Payload |
| (uint16, 2B) | (variable N B) |
+-------------------+-------------------+
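The layout above can be sketched in a few lines. This assumes the version is written little-endian, matching how BARE encodes its fixed-width integers; the byte order is not stated in this section, so treat it as an assumption. The function names are hypothetical.

```python
import struct
from typing import Tuple

# Assumption: little-endian unsigned 16-bit header ("<H"), as BARE uses for u16.
HEADER = struct.Struct("<H")


def encode_with_version(version: int, payload: bytes) -> bytes:
    """Prefix an already-encoded BARE payload with its 2-byte schema version."""
    if not 0 <= version <= 0xFFFF:
        raise ValueError(f"version {version} does not fit in a uint16")
    return HEADER.pack(version) + payload


def decode_with_version(buffer: bytes) -> Tuple[int, bytes]:
    """Split a buffer into (schema version, BARE payload)."""
    if len(buffer) < HEADER.size:
        raise ValueError("buffer is too short to contain a version header")
    (version,) = HEADER.unpack_from(buffer)
    return version, buffer[HEADER.size:]
```

The payload is treated as opaque bytes here; decoding it with the right schema happens only after the version has been read.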
The core of why VBARE was designed this way is:
- Manual evolution simplifies application logic by putting all complex evolutions & defaults in conversion code instead of inside your core application logic
- Manual evolution forces developers to handle edge cases of migrations & breaking changes at the cost of more verbose migration code
- Stop making big breaking v1 to v2 changes — instead, make much smaller schema changes with more flexibility
- Schema evolution frequently requires more than just renaming properties (as in Protobuf, FlatBuffers, Cap'n Proto); more complicated reshaping & fetching data from remote sources is commonly needed
(Full list of BARE implementations)
Adding an implementation takes less than an hour — it's really that simple.
- Rivet Engine
- Data at rest
- Internal network protocols (tunnel, Epoxy, UPS)
- Public network protocol (runner)
- RivetKit
Why is copying the entire schema for every version better than using decorators for gradual migrations?
- Decorators are limited and become very complicated over time
- It's unclear at what version of the protocol a decorator takes effect — explicit versions help clarify this
- Generated SDKs become more and more bloated with every change
- You need a validation build step for your validators
- Manual migrations provide more flexibility for complex transformations
RPC interfaces are trivial to implement yourself. Libraries that provide RPC interfaces tend to add extra bloat and cognitive load through things like abstracting transports, compatibility with the language's async runtime, and complex codegen to implement handlers.
Usually, you just want a ToServer and ToClient union.
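As a rough sketch of the idea, a ToServer / ToClient pair can be modeled as two unions of plain message types plus a single dispatch function. Every message name here (`Ping`, `CreateUser`, `Pong`, `UserCreated`) is hypothetical, and in a real project the types would come from your BARE schema rather than hand-written dataclasses.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical messages a client may send to the server.
@dataclass
class Ping:
    nonce: int


@dataclass
class CreateUser:
    name: str


# Hypothetical messages the server may send back.
@dataclass
class Pong:
    nonce: int


@dataclass
class UserCreated:
    user_id: int


ToServer = Union[Ping, CreateUser]
ToClient = Union[Pong, UserCreated]


def handle(message: ToServer) -> ToClient:
    """Dispatch on the union member; this is the whole 'RPC layer'."""
    if isinstance(message, Ping):
        return Pong(nonce=message.nonce)
    if isinstance(message, CreateUser):
        return UserCreated(user_id=1)  # placeholder ID for the sketch
    raise TypeError(f"unexpected message: {message!r}")
```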
Yes, but after enough pain and suffering from running production APIs, this is what you will end up doing manually anyway — but in a much more painful way.
Having schema versions also makes it much easier to reason about how clients are connecting to your system and the state of an application.
Migration steps are fairly minimal to write. The most verbose migration steps will be for deeply nested structures that changed, but even that is relatively straightforward.
- More verbose migration code — but this is usually because VBARE forces you to handle all edge cases
- The older the version, the more migration steps need to run to bring it to the latest version, though migration steps are usually negligible in cost
- Migration steps are not portable across languages, but only the server needs the migration steps, so this is usually only implemented once
MIT