An implementation of the CESR encoding/decoding protocol of KERI protocol/genus 1.0 and 2.0 tables.
The CESR spec defines encoding and decoding the following types of values to two encoding domains:
Field maps (of field map types from MGPK, CBOR, JSON) Count Codes (groupings of CESR primitives or field maps or full CESR streams) Primitives (both regular primitives and indexed primitives)
It also has support for a weird list of "annotations" tags followed by a string of arbitrary length until a newline.
The two encoding domains are binary and urlsafe b64. The spec also refers to a "raw" domain but this is just the name the author gives to values instantiated at runtime from the encoding domain (ie the values you work with as a developer).
See: https://trustoverip.github.io/kswg-cesr-specification/#performant-resynchronization-with-unique-start-bits if you want the details.
All field maps must be ordered maps in the CESR spec! this presents something of a difficulty in Elixir/Erlang where all maps don't garuntee order for performance reasons. We have chosen to use the ord_map library from hex.pm which is fine although may have problems scaling in the future.
Similarly, if you are ever going to encode your Ordmap the "kind" of serialization you'll do is controlled by the Version String as these version strings are specified in the spec and are required for the field maps to fit into the TLV scheme. These kinds are :cbor, :json, :mgpk respectively.
So unless you know what you're doing all KERI/ACDC/TSP other field maps as may rely on CESR are instantiated as an ordmap with a version string. The size will be overwritten at encoding time with the proper version string size and can only be relied on when decoding.
ie)
iex>{:ok, v} = Cesr.Version_String_1.new(%{proto: :keri, major: 1, minor: 0, kind: :json, size: 0})
iex>my_ord = OrdMap.new([{"v", v}])Count codes are instantiated with lists of cesr_elements. We have tried to add validation and consumption functions where we can but this is a work in progress and these validations and consumptions are just guesses or painful reverse engineering we've done with keripy. The spec author considers these semantics out of scope for some reason. So that being said, if something is funny, there may be a bug or feature there to fix.
Failures should return {:error, <err reason>}
{:ok, my_yes} = Primitive.CD_1AAM_Yes.new("")
{:ok, fake_blake3} = Primitive.CD_E_Blake3_256.new(<<0::size(256)>>) # 256bit payload
{:ok, cnt_cd} = CountCodeKERIv1.CD_T_GenericGroup.new([my_yes, fake_blake3])
{:ok,
%Cesr.CountCodeKERIv1.CD_T_GenericGroup{
cesr_elements: [
%Cesr.Primitive.CD_1AAM_Yes{code: "1AAM", payload: ""},
%Cesr.Primitive.CD_E_Blake3_256{
code: "E",
payload: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>>
}
]
}}This library has a set of primitives that correspond to all cesr elements that currently exist. At runtime these elements are all elixir structs. To make a well-formed cesr stream we take some of those elements in a list, nested according to their semantics as appropriate, and then encode to one of the two domains listed above.
iex>my_list_of_lists = [%CD_T_GenericGroup{cesr_elements: [%CD_1AAM_Yes{code: "1AAM", payload: ""}]}]
iex>Cesr.produce_binary_stream(my_list_of_lists)
<<249, 48, 1, 212, 0, 12>>
iex>Cesr.produce_text_stream([%CD_T_GenericGroup{cesr_elements: %CD_1AAM_Yes{cesr_elements: []}}])
"-TAB1AAM"We can also take ordered maps with either a Version_String_1 for CESR 1.0
field maps or Version_String_2 for CESR 2.0 field maps as the first element
v field and serialize them to their field map type's representation (in this case json).
iex>{:ok, v} = Cesr.Version_String_1.new(%{proto: :keri, major: 1, minor: 0, kind: :json, size: 0})
iex>my_ord = OrdMap.new([{"v", v}])
iex>Cesr.produce_text_stream([my_ord])
"{\"v\":\"KERI10JSON000019_\"}"Note: The size of field maps is calculated automatically per the spec.
Similarly we can put both on one stream but the consumer would have to know how to consume all the pieces if they have some significance to each other.
iex>Cesr.produce_text_stream([my_ord, my_list_of_lists])
"{\"v\":\"KERI10JSON000019_\"}-TAB1AAM"Decoding is the inverse of the above process.
iex>Cesr.consume_stream("{\"v\":\"KERI10JSON000019_\"}-TAB1AAM")
[%OrdMap{tuples: [{"v", %Version_String_1{proto: :keri, major: 1, minor: 0, kind: :json, size: 25}}]}, %CD_T_GenericGroup{cesr_elements: [%CD_1AAM_Yes{code: "1AAM", payload: ""}]}]Note: once again we've deserialized the size but hexidecimal 19 == decimal 25 in case you noticed the discrepency.
By default we choose CESR1.0 (not specified in official spec) and the keri protocol/genus table 1.0 at some point this will change but if you'd like to parse protocol/genus 2.0 pass :keri_aaacaa into the stream when decoding.
iex>Cesr.consume_stream("{\"v\":\"KERI10JSON000019_\"}-JAB1AAM", :keri_aaacaa)
[%OrdMap{tuples: [{"v", %Version_String_1{proto: :keri, major: 1, minor: 0, kind: :json, size: 25}}]},
%CD_J_GenericListGroup{cesr_elements: [%CD_1AAM_Yes{code: "1AAM", payload: ""}]}]There's also a confusing pattern in KERI and ACDC specs (as well as other
places) where CESR streams that aren't well-formed (don't abide by cold-start
requirements) can be embedded in field maps. These are typically primitives as
far as we're aware but honestly could be count codes with our implementation.
They're just a table lookup into the protocol genus of the surrounding field
map. You can deserialize these elements with
consume_primitive_T(<some_b64_cesr_element>).
iex>{:ok, zero_blake3_digest} = Kerilixir.Cesr.Primitive.CD_E_Blake3_256.new(<<0::size(256)>>)
iex>Kerilixir.Cesr.CesrElement.to_b64(zero_blake3_digest)
"EAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
iex>Cesr.consume_primitive_T("EAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA")
{:ok,
%Cesr.Primitive.CD_E_Blake3_256{
code: "E",
payload: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>>
}}Note: we use a fake blake32 digest but these digests are oftentimes used as SAIDs in KERI and ACDC so you'll see them a lot.
Real world CESR 1.0 streams from GLEIF are available in the examples/gleif_cesr_streams directory. Excepting the *-vc.cesr examples all are CESR streams similar to those used by GLEIF today.
They were taken from the wonderful https://github.com/psteniusubi/cesr-decoder repo that we learned so much from and was the inspiration for many of the tools in this repo.
As mentioned the ordmap representation uses Elixir lists (regular linked lists) to store its tuples to keep the order. While this is fine and honestly the BEAM is probably the best VM in the world for optimizing typical use cases we may outgrow this library at scale. A real ordered map using tries (maybe just copied from the optimized Python implmentation) is probably going to be needed at some point.
Due to the fact that Elixir doesn't have ordered maps it was hard to find a CBOR implementation that supported an ordered map. We eventually had to hack on cbor from hex ourselves and publish our fork with ordmaps to support cbor. At some point this work will also have to be reconciled.
We use binaries everywhere for simplicity. There's probably a lot of copying but its blazing fast and honestly we couldn't break it in the benchmarks we put it through (probably because its just a TLV scheme and the BEAM is optimized for that). That being said, iodata optimizations or just algorithmic optimizations are probably possible. We still feel like one more pass might simplify a lot of this implementation.
Similarly, a stream should probably be an elixir stream and evaluated in place (to lower the memory usage of passing all those binaries around). This one is probably a good first issue for someone with some free time.
Honestly, the KERI protocol/genus table v1 is whats in production at GLEIF and v2 is due to be in production ??? so you probably won't need more code tables than this. However, we didn't quite have the metacoding skills in Elixir necessary to generalize the macros more than we did. It would be cool if you could bring your own table defined in the CESR scheme and be able to generate code table on the fly and stuff. This is probably a lot of free time for someone.
See CONTRIBUTING
See CONTRIBUTING
Any bugs with the implementation please raise in this repo.
Any bugs with the specification you can raise here
Honestly if you think you can make this repo, the docs, or the implementation better in some way please let us know. We don't know if we can accept everything but honestly every little bit helps.