RFC0006: RSF version 1 by arnau · Pull Request #12 · openregister/registers-rfcs

arnau · 2018-04-05T09:55:18Z

Context

RSF has been documented in multiple places but never in a consistent manner. This is an attempt to describe the current state of things and serve as the starting point to create the first RSF specification.

Note that RSF has no versioning system. This is a good opportunity to discuss if we should add one to allow non backwards-compatible changes without breaking clients.

Guidance to review

Review the current implementation of RSF is accurately described in this RFC and that the explanation is understandable and unequivocal.

arnau · 2018-04-05T09:55:28Z

/cc @gidsg

nacnudus · 2018-04-20T10:14:03Z

content/rsf-spec/index.md

+is that diverges from the rest of the API where serialisation is expressed
+either via suffix or via media type. The problem with using the same approach,
+say `GET /register.rsf` is that we are not providing the same information when
+querying `GET /register.json`.


Does this mean that the endpoint GET /download-rsf doesn't provide the same information as GET /register.rsf, or that the user has to provide different information?

The current implementation uses `GET /download-rsf`. The main issue with that is that diverges from the rest of the API where serialisation is expressed either via suffix or via media type. The problem with using the same approach, say `GET /register.rsf` is that we are not providing the same information when querying `GET /register.json`.

Well, /register.json right now offers some sort of summary-metadata of the register. RSF by nature is the full register. So both things are at odds.

I didn't know about that endpoint 😬. It goes without saying that I think it should have a different name, e.g. /summary.json.

At least it's clearer in intention 👍

nacnudus · 2018-04-20T10:15:57Z

content/rsf-spec/index.md

+say `GET /register.rsf` is that we are not providing the same information when
+querying `GET /register.json`.
+
+What is a good name for a resource that represents the whole raw database?


I think "register" is a good name for a register ;)

GET /register.rsf

There is another endpoint called /download-register that downloads a ZIP file containing "the whole database". We should also consider whether we keep that an/or rename it in line with these thoughts.

Also note there are other RSF endpoints:

/download-rsf/n gets RSF for the register after entry-number n, i.e. it returns the whole register from entry number n+1 to the end of the register.

/download-rsf/n/m gets RSF for entries n+1 to m'

Added the description in a new commit.

I'm thinking that the /download-register endpoint should be renamed to /archive. With that change, I think having /archive.rsf or better /archive -H 'Accept: application/vnd.rsf would help consolidate resources that are conceptually the same.

To accommodate the filtering that happens in /download-rsf/n we could do something on the lines of /archive.rsf?from=n&to=m.

Thoughts?

/cc @MatMoore

To be clear, /archive (zip file) would be normative, rsf a non-normative extension.

That sounds like a good idea - I like the name archive better than register for this purpose, and it makes sense for the zip & rsf to be different representations of the same thing.

nacnudus · 2018-04-20T10:19:30Z

content/rsf-spec/index.md

+
+### Commands
+
+#### <a id="assert-root-hash-command">`assert-root-hash` command</a>


Why are there two assert-root-hash commands per register? How do you use them?

assert-root-hash can appear as many times as you want in a RSF file. The idea is that every time you find one, you can use it to verify you have got all the previous entries right, it's a checkpoint really.

Thanks, that answers my later question about whether an RSF file starts at the beginning of a register. I should have checked with echo -n | sha256sum. In that case should this spec require that the the first line of an RSF file be assert-root-hash?

I suggested that in another comment, the potential problem is that currently it is not a requirement so some RSF might be broken if we impose this restriction. Let me assess how big of a deal it is really.

Since the scope of this review is "Review the current implementation of RSF is accurately described in this RFC", I'll back off and open this in an issue instead.

nacnudus · 2018-04-20T10:21:33Z

content/rsf-spec/index.md

+#### <a id="add-item-command">`add-item` command</a>
+
+Adds a new [Item resource][item-res] to the register. It will require an
+[`append-entry` command](#append-entry-command) to make it visible to users.


Instead of "make it visible to users", I would put "associate the item with a record key".

umm, I like what you are suggesting but I don't like the idea of using the word "record". @michaelabenyohai ideas?

How about "There must be a corresponding append-entry command that refers to the item's hash"? Noting the later comment on "It is illegal to have orphan items", which insists on the append-entry for the file to be valid.

That could work 👍

nacnudus · 2018-04-20T10:26:16Z

content/rsf-spec/index.md

+2. The `key` of the entry. The primary key field is the field with the same
+   name as the register.
+3. The `timestamp` of the entry. This is the time at which the entry was
+   appended to the register.


I believe this is the first time we have defined what the timestamp means. Are we sure?

I think it's the first time we have something that clear yes. Based on usage I'd say yes, the timestamp is the consequence of minting an item so it's the recording time for the entry. It mimics git's behaviour.

It makes me nervous because that's only the way we've used it. I haven't thought about the consequences of timestamps being out of sequence. @michaelabenyohai am I being paranoid?

what would be the problem of having timestamps out of sequence? The order of the log is dictated by the entry number.

It could mess with tooling using the timestamp to infer something related to time outside "time of recording" but again, nothing you wouldn't see in git or similar.

I think it's up to the tooling to be zealous about timestamps to the extent a tool can be.

Okay I'm satisfied 🙂

For reference, RFC0003 covers this topic (#16 )

nacnudus · 2018-04-20T10:29:02Z

content/rsf-spec/index.md

+* [Commands](#commands) are executed in order of appearance, top to bottom.
+* Entries are numbered in sequence in order of appearance starting with 1 if
+  the register is empty, otherwise incrementing on the latest entry number
+  found in the register.


Is there any way to tell whether an RSF file represents a complete register, or only appends to a register?

The only way as far as I know is via assert-root-hash. In a situation where the RSF has no assertions you wouldn't know.

One possibility that would require evolving the implementation would be requiring an assert-root-hash as the first command of any RSF file.

In that case I think it would be useful to require an assert-root-hash in the first line of every file, for the sake of reassembling a register from its patches.

Should we specify that this currently does not include system entries (though it should do).

By this I mean that entry numbers are computed independently for user and system currently (as they are separate logs on the inside).

Yes, it needs to explain the current implementation exactly.

nacnudus · 2018-04-20T10:32:44Z

content/rsf-spec/index.md

+  the [`add-item` command](#add-item-command) that introduces the item is
+  referencing.
+* It is illegal to have orphan items. An `add-item` must have at least one
+  `append-entry` referencing to the item.


If the the validity of an add-item depends on a subsequent append-entry, then a file that gets accidentally truncated in transit will be invalid. This seems unnecessarily strict, but I haven't thought hard about the disadvantages of orphan items. Is it to do with treating patches as transactions?

Yes, the original discussion about tradeoffs of allowing orphan items or not made the team decide to be transactional per file so if something wrong happens in the middle of consuming a delta, it must be rolled back so you can retry again with the same delta without complex gymnastics. There were other considerations around how to handle orphans in the API and the garbage collection of them if allowed.

I think it's a reasonable restriction given the consequences of not having it.

Does it matter whether the add-item command comes before or after the append-entry one?

If my memory serves me well, it has to do with exposing the entry to the API users and not having the item available yet (the record couldn't be computed either). This is me reading between lines and guessing a bit here. In all fairness if we think it's ok to expose an entry as soon as it gets parsed, it means we are not behaving transactionally (a rollback could cause a disruption in someone that already consumed an item). I'll try to pull out the rationale for this restriction but, if it's not satisfactory I might reconsider having the restriction in place.

nacnudus · 2018-04-20T10:33:38Z

content/rsf-spec/index.md

+#### Type checking
+
+Although not part of the RSF specification, it is worth mentioning that a
+Registers' implementation is expected to type check the data according to the


I don't think the apostrophe is needed on "Registers'"

nacnudus · 2018-04-20T10:38:19Z

content/rsf-spec/index.md

+to the state rolled back.
+
+
+### Examples


Should these examples be valid patches?

They should yes, if I haven't mess it up, they are 😱

Haha, I don't think you've messed them up. But I think the spec should require an assert-root-hash in the first and final lines.

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

michaelabenyohai · 2018-05-03T10:40:54Z

content/rsf-spec/index.md

+
+append-entry     = %s"append-entry" HTAB type HTAB key HTAB timestamp HTAB hash-list
+type             = "user" / "system"
+key              = alphanum / %x2D / %x5F


I'm not sure I follow what alphanum / %x2D / %x5F means. Can you explain?

alphanum is defined a few lines below as alphanum = ALPHA / DIGIT, so the key definition is: "any ASCII character (A-Z, a-z) or a hypen (%x2D, -) or an underscore (%x5F, _)".

alpha and digit are defined in https://tools.ietf.org/html/rfc5234#appendix-B.1
%x2D and %x5F are the hexadecimal codes for hypen and underscore respectively and / defines an OR.

Ah I missed the definition of alphanum so that makes sense now.

Does this imply there is only one character in a key? i.e it's either a single ASCII character, hyphen or underscore. It doesn't appear to say that a key can be made up of multiple of these things but I might be reading it wrong still.

HA, very good point. I have to add multiple chars :)

I'm wondering if the first char has special treatment in the current implementation? (e.g. can a key start with a hypen: -foo? If not, this needs further restriction

michaelabenyohai · 2018-05-03T10:42:11Z

content/rsf-spec/index.md

+log              = command *(CRLF command) [CRLF]
+command          = add-item / append-entry / assert-root-hash
+
+assert-root-hash = %s"assert-root-hash" HTAB hash


Does %s mean case sensitive? If so, why do we only use it for "add-item" etc and not for things like "user"? Not that I think we've ever specified whether these things are case sensitive or not.

Good point. Yes %s means case sensitive and all tokens that should be strictly in lower case should be prepended with that. I'll amend them 👍

Fixed in a new commit.

michaelabenyohai · 2018-05-03T10:58:05Z

content/rsf-spec/index.md

+
+Asserts that the provided root hash is the same as the one computed from the
+current entry log as defined in the [Digital Proofs][digital-proofs]
+specification.


Should we specify that although the included in RSF, an assert-root-hash command currently ignores all system entries? It only includes user entries.

Yes, absolutely. I will amend this too 👍

michaelabenyohai · 2018-05-03T11:00:05Z

content/rsf-spec/index.md

+##### Arguments
+
+1. The `type` of the entry determines if the entry belongs to the data log
+   (`user`) or to the metadata log (`system`).


This implies that the data log and the metadata log are separate things. Is this intentional? I know they are kind of separate currently (e.g. system entries are ignored in root-hashes) but they do all appear in the same "log" in the RSF. I guess this kind of does correctly explain how things are now (even if we want to change them).

I think this RFC should document how things are right now. When we change it, we will have another RFC that explains the change and can refer to the original RFC as its starting point.

The logs are intertwined because A data item must conform to the current schema derived from the previous system entries, so even though system entries aren't in the root hashes, the RSF is still invalid if they are reordered.

You are right, there is a level of checking that guarantees consistency cross log 👍

michaelabenyohai · 2018-05-03T11:18:05Z

content/rsf-spec/index.md

+* It is illegal to have broken references. An `append-entry` must reference an
+  item previously introduced by an `add-item` command.
+* The item in the `add-item` command must always be in the canonical form.
+


Should we mention anything about identical consecutive append-entry commands currently being illegal?

i.e. this is illegal

append-entry user GB 2010-11-12T13:14:15Z sha-256:08bef0039a4f0fb52f3a5ce4b97d7927bf159bc254b8881c45d95945617237f6 append-entry user GB 2010-11-12T13:14:50Z sha-256:08bef0039a4f0fb52f3a5ce4b97d7927bf159bc254b8881c45d95945617237f6

This is ok

append-entry user GB 2010-11-12T13:14:15Z sha-256:08bef0039a4f0fb52f3a5ce4b97d7927bf159bc254b8881c45d95945617237f6 append-entry user GB 2010-11-12T13:14:30Z sha-256:490636974f8087e4518d222eba08851dd3e2b85095f2b1427ff6ecd3fa482435 append-entry user GB 2010-11-12T13:14:50Z sha-256:08bef0039a4f0fb52f3a5ce4b97d7927bf159bc254b8881c45d95945617237f6

With the recent changes, is this still the situation? I'll need clarification on this.

This is currently how it is implemented, yes. ORJ won't let you load RSF that looks like the first example. However, government-service register will currently produce illegal RSF, because we accidentally allowed these duplicate entries into the register before coding the rule.

The rule is good hygiene but should it be enforced in the spec?

The goal of the spec is to allow different implementations to be built independently and be compatible with each other. If we don't enforce the same rules, a valid RSF in one tool could be invalid in another one. To me, this is a spec failure.

We can discuss if it should be a rule that exits parsing (and thus rolls back the entire patch), it should be ignored entirely (skipped) or raise a warning but keep parsing.

Given that the current implementation fails, the initial spec should reflect that exact behaviour.

michaelabenyohai · 2018-05-03T12:06:52Z

content/rsf-spec/index.md

+* It is illegal to have orphan items. An `add-item` must have at least one
+  `append-entry` referencing to the item.
+* It is illegal to have broken references. An `append-entry` must reference an
+  item previously introduced by an `add-item` command.


In the current implementation, if a corresponding add-item does not appear in the RSF file then it checks to see if the item already exists in the register. If it does, then the RSF file is deemed valid, otherwise it's rejected. This means that an RSF file that has an "orphan entry" could actually be valid if it is a patch on an existing register.

The inverse is not true for add-item. If an add-item is in an RSF file, the corresponding append-entry must appear in the same RSF file.

Ah! I misunderstood this behaviour. I'll amend 👍

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Explains that the current behaviour uses only data entries to generate the root hash which leaves metadata entries unasserted. Also makes explicit that the numbering for data and metadata logs are independent. Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Ensures keys have one or more characters where the first is alphanumeric. Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

MatMoore

This makes sense to me, and I think it's good to specify the grammar and semantics of RSF, but I'd rather the registers spec doesn't require RSF as way to mint or download data. Can we leave out the REST API section entirely, and address it as an ADR in openregisters-java?

It feels like there are already a lot of different ways of downloading data from a register, and I'm wary about adding any more endpoints that do similar things to existing ones, because it makes the API harder to use. At the moment RSF is an internal thing and API users shouldn't need to understand it.

I like @michaelabenyohai's idea of combining with the download-register endpoint as the RSF could just be requested through content negotiation.

MatMoore · 2018-08-06T13:03:31Z

content/rsf-spec/index.md

+
+append-entry     = %s"append-entry" HTAB type HTAB key HTAB timestamp HTAB hash-list
+type             = %s"user" / %s"system"
+key              = alphanum *(alphanum / %x2D / %x5F)


I think this depends on #22
Also: what about forward slashes?

It does indeed. I need to amend that once #22 is accepted.

MatMoore · 2018-08-06T13:05:10Z

content/rsf-spec/index.md

+append-entry     = %s"append-entry" HTAB type HTAB key HTAB timestamp HTAB hash-list
+type             = %s"user" / %s"system"
+key              = alphanum *(alphanum / %x2D / %x5F)
+hash-list        = hash *(list-separator hash)


In the RSF I've looked at, this has just been a single item hash. When would this be a list?

The situation is theoretical and it is when you have an index. It is experimental and in review to assess if benefits outweigh (perceived) complexity.

MatMoore · 2018-08-06T13:09:59Z

content/rsf-spec/index.md

+
+### Media type
+
+The current media type is `application/uk-gov-rsf`. It should change to


As this is specific to ORJ, can we say that the media type of RSF is application/vnd.rsf but application/uk-gov-rsf may also be used for legacy reasons? Then we can fix that at any time without raising another RFC.

Sounds reasonable 👍

arnau · 2018-08-10T08:23:48Z

@MatMoore although I may agree RSF is not normative, I'd like to have it in the spec as an example of how the reference implementation has approached register serialisation.

arnau requested review from michaelabenyohai, nacnudus and subatoi April 5, 2018 09:55

arnau added rfc wip labels Apr 5, 2018

nacnudus reviewed Apr 20, 2018

View reviewed changes

Arnau Siches added 9 commits April 26, 2018 09:14

Translate from multiple docs

4291c2a

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Remove alternative media type

9b73b73

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Add type checking section

9c5b85a

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Add multi-item example

d445451

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Note on REST API

be59811

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Reword root hash argument

8b211c3

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Add thoughts on a new resource that is what RSF represents

11b0c18

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Fix typo

1816e46

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Reword add-item command as suggested by @nacnudus

3ed0474

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

arnau force-pushed the rsf-spec branch from 8f4530d to 3ed0474 Compare April 26, 2018 08:15

michaelabenyohai reviewed May 3, 2018

View reviewed changes

Arnau Siches added 6 commits May 3, 2018 16:38

Fix ABNF case sensitive tokens

08fb49e

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Add download-rsf range endpoints

36b9540

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Fix ABNF for entry keys

f7e10ae

Ensures keys have one or more characters where the first is alphanumeric. Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Add identical consecutive rule

4cad6f6

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

Fix rule for orphan entries

3c97dd8

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>

MatMoore reviewed Aug 6, 2018

View reviewed changes

arnau changed the title ~~RSF version 1~~ RFC0006: RSF version 1 Aug 9, 2018


		### Commands

		#### <a id="assert-root-hash-command">`assert-root-hash` command</a>


		### Media type

		The current media type is `application/uk-gov-rsf`. It should change to

Comments

Conversation

arnau commented Apr 5, 2018

Context

Guidance to review

Uh oh!

arnau commented Apr 5, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arnau Apr 20, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arnau Apr 20, 2018 •

edited

Loading