Conversation
|
/cc @gidsg |
| is that diverges from the rest of the API where serialisation is expressed | ||
| either via suffix or via media type. The problem with using the same approach, | ||
| say `GET /register.rsf` is that we are not providing the same information when | ||
| querying `GET /register.json`. |
There was a problem hiding this comment.
Does this mean that the endpoint GET /download-rsf doesn't provide the same information as GET /register.rsf, or that the user has to provide different information?
The current implementation uses `GET /download-rsf`. The main issue with that
is that diverges from the rest of the API where serialisation is expressed
either via suffix or via media type. The problem with using the same approach,
say `GET /register.rsf` is that we are not providing the same information when
querying `GET /register.json`.
There was a problem hiding this comment.
Well, /register.json right now offers some sort of summary-metadata of the register. RSF by nature is the full register. So both things are at odds.
There was a problem hiding this comment.
I didn't know about that endpoint 😬. It goes without saying that I think it should have a different name, e.g. /summary.json.
There was a problem hiding this comment.
At least it's clearer in intention 👍
| say `GET /register.rsf` is that we are not providing the same information when | ||
| querying `GET /register.json`. | ||
|
|
||
| What is a good name for a resource that represents the whole raw database? |
There was a problem hiding this comment.
I think "register" is a good name for a register ;)
GET /register.rsf
There was a problem hiding this comment.
There is another endpoint called /download-register that downloads a ZIP file containing "the whole database". We should also consider whether we keep that an/or rename it in line with these thoughts.
There was a problem hiding this comment.
Also note there are other RSF endpoints:
/download-rsf/ngets RSF for the register after entry-numbern, i.e. it returns the whole register from entry numbern+1to the end of the register./download-rsf/n/mgets RSF for entriesn+1tom'
There was a problem hiding this comment.
Added the description in a new commit.
There was a problem hiding this comment.
I'm thinking that the /download-register endpoint should be renamed to /archive. With that change, I think having /archive.rsf or better /archive -H 'Accept: application/vnd.rsf would help consolidate resources that are conceptually the same.
To accommodate the filtering that happens in /download-rsf/n we could do something on the lines of /archive.rsf?from=n&to=m.
Thoughts?
/cc @MatMoore
There was a problem hiding this comment.
To be clear, /archive (zip file) would be normative, rsf a non-normative extension.
There was a problem hiding this comment.
That sounds like a good idea - I like the name archive better than register for this purpose, and it makes sense for the zip & rsf to be different representations of the same thing.
|
|
||
| ### Commands | ||
|
|
||
| #### <a id="assert-root-hash-command">`assert-root-hash` command</a> |
There was a problem hiding this comment.
Why are there two assert-root-hash commands per register? How do you use them?
There was a problem hiding this comment.
assert-root-hash can appear as many times as you want in a RSF file. The idea is that every time you find one, you can use it to verify you have got all the previous entries right, it's a checkpoint really.
There was a problem hiding this comment.
Thanks, that answers my later question about whether an RSF file starts at the beginning of a register. I should have checked with echo -n | sha256sum. In that case should this spec require that the the first line of an RSF file be assert-root-hash?
There was a problem hiding this comment.
I suggested that in another comment, the potential problem is that currently it is not a requirement so some RSF might be broken if we impose this restriction. Let me assess how big of a deal it is really.
There was a problem hiding this comment.
Since the scope of this review is "Review the current implementation of RSF is accurately described in this RFC", I'll back off and open this in an issue instead.
content/rsf-spec/index.md
Outdated
| #### <a id="add-item-command">`add-item` command</a> | ||
|
|
||
| Adds a new [Item resource][item-res] to the register. It will require an | ||
| [`append-entry` command](#append-entry-command) to make it visible to users. |
There was a problem hiding this comment.
Instead of "make it visible to users", I would put "associate the item with a record key".
There was a problem hiding this comment.
umm, I like what you are suggesting but I don't like the idea of using the word "record". @michaelabenyohai ideas?
There was a problem hiding this comment.
How about "There must be a corresponding append-entry command that refers to the item's hash"? Noting the later comment on "It is illegal to have orphan items", which insists on the append-entry for the file to be valid.
| 2. The `key` of the entry. The primary key field is the field with the same | ||
| name as the register. | ||
| 3. The `timestamp` of the entry. This is the time at which the entry was | ||
| appended to the register. |
There was a problem hiding this comment.
I believe this is the first time we have defined what the timestamp means. Are we sure?
There was a problem hiding this comment.
I think it's the first time we have something that clear yes. Based on usage I'd say yes, the timestamp is the consequence of minting an item so it's the recording time for the entry. It mimics git's behaviour.
There was a problem hiding this comment.
It makes me nervous because that's only the way we've used it. I haven't thought about the consequences of timestamps being out of sequence. @michaelabenyohai am I being paranoid?
There was a problem hiding this comment.
what would be the problem of having timestamps out of sequence? The order of the log is dictated by the entry number.
There was a problem hiding this comment.
It could mess with tooling using the timestamp to infer something related to time outside "time of recording" but again, nothing you wouldn't see in git or similar.
I think it's up to the tooling to be zealous about timestamps to the extent a tool can be.
There was a problem hiding this comment.
For reference, RFC0003 covers this topic (#16 )
| * [Commands](#commands) are executed in order of appearance, top to bottom. | ||
| * Entries are numbered in sequence in order of appearance starting with 1 if | ||
| the register is empty, otherwise incrementing on the latest entry number | ||
| found in the register. |
There was a problem hiding this comment.
Is there any way to tell whether an RSF file represents a complete register, or only appends to a register?
There was a problem hiding this comment.
The only way as far as I know is via assert-root-hash. In a situation where the RSF has no assertions you wouldn't know.
There was a problem hiding this comment.
One possibility that would require evolving the implementation would be requiring an assert-root-hash as the first command of any RSF file.
There was a problem hiding this comment.
In that case I think it would be useful to require an assert-root-hash in the first line of every file, for the sake of reassembling a register from its patches.
There was a problem hiding this comment.
Should we specify that this currently does not include system entries (though it should do).
There was a problem hiding this comment.
By this I mean that entry numbers are computed independently for user and system currently (as they are separate logs on the inside).
There was a problem hiding this comment.
Yes, it needs to explain the current implementation exactly.
| the [`add-item` command](#add-item-command) that introduces the item is | ||
| referencing. | ||
| * It is illegal to have orphan items. An `add-item` must have at least one | ||
| `append-entry` referencing to the item. |
There was a problem hiding this comment.
If the the validity of an add-item depends on a subsequent append-entry, then a file that gets accidentally truncated in transit will be invalid. This seems unnecessarily strict, but I haven't thought hard about the disadvantages of orphan items. Is it to do with treating patches as transactions?
There was a problem hiding this comment.
Yes, the original discussion about tradeoffs of allowing orphan items or not made the team decide to be transactional per file so if something wrong happens in the middle of consuming a delta, it must be rolled back so you can retry again with the same delta without complex gymnastics. There were other considerations around how to handle orphans in the API and the garbage collection of them if allowed.
I think it's a reasonable restriction given the consequences of not having it.
There was a problem hiding this comment.
Does it matter whether the add-item command comes before or after the append-entry one?
There was a problem hiding this comment.
If my memory serves me well, it has to do with exposing the entry to the API users and not having the item available yet (the record couldn't be computed either). This is me reading between lines and guessing a bit here. In all fairness if we think it's ok to expose an entry as soon as it gets parsed, it means we are not behaving transactionally (a rollback could cause a disruption in someone that already consumed an item). I'll try to pull out the rationale for this restriction but, if it's not satisfactory I might reconsider having the restriction in place.
content/rsf-spec/index.md
Outdated
| #### Type checking | ||
|
|
||
| Although not part of the RSF specification, it is worth mentioning that a | ||
| Registers' implementation is expected to type check the data according to the |
There was a problem hiding this comment.
I don't think the apostrophe is needed on "Registers'"
| to the state rolled back. | ||
|
|
||
|
|
||
| ### Examples |
There was a problem hiding this comment.
Should these examples be valid patches?
There was a problem hiding this comment.
They should yes, if I haven't mess it up, they are 😱
There was a problem hiding this comment.
Haha, I don't think you've messed them up. But I think the spec should require an assert-root-hash in the first and final lines.
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
content/rsf-spec/index.md
Outdated
|
|
||
| append-entry = %s"append-entry" HTAB type HTAB key HTAB timestamp HTAB hash-list | ||
| type = "user" / "system" | ||
| key = alphanum / %x2D / %x5F |
There was a problem hiding this comment.
I'm not sure I follow what alphanum / %x2D / %x5F means. Can you explain?
There was a problem hiding this comment.
alphanum is defined a few lines below as alphanum = ALPHA / DIGIT, so the key definition is: "any ASCII character (A-Z, a-z) or a hypen (%x2D, -) or an underscore (%x5F, _)".
alpha and digit are defined in https://tools.ietf.org/html/rfc5234#appendix-B.1
%x2D and %x5F are the hexadecimal codes for hypen and underscore respectively and / defines an OR.
There was a problem hiding this comment.
Ah I missed the definition of alphanum so that makes sense now.
Does this imply there is only one character in a key? i.e it's either a single ASCII character, hyphen or underscore. It doesn't appear to say that a key can be made up of multiple of these things but I might be reading it wrong still.
There was a problem hiding this comment.
HA, very good point. I have to add multiple chars :)
I'm wondering if the first char has special treatment in the current implementation? (e.g. can a key start with a hypen: -foo? If not, this needs further restriction
| log = command *(CRLF command) [CRLF] | ||
| command = add-item / append-entry / assert-root-hash | ||
|
|
||
| assert-root-hash = %s"assert-root-hash" HTAB hash |
There was a problem hiding this comment.
Does %s mean case sensitive? If so, why do we only use it for "add-item" etc and not for things like "user"? Not that I think we've ever specified whether these things are case sensitive or not.
There was a problem hiding this comment.
Good point. Yes %s means case sensitive and all tokens that should be strictly in lower case should be prepended with that. I'll amend them 👍
There was a problem hiding this comment.
Fixed in a new commit.
|
|
||
| Asserts that the provided root hash is the same as the one computed from the | ||
| current entry log as defined in the [Digital Proofs][digital-proofs] | ||
| specification. |
There was a problem hiding this comment.
Should we specify that although the included in RSF, an assert-root-hash command currently ignores all system entries? It only includes user entries.
There was a problem hiding this comment.
Yes, absolutely. I will amend this too 👍
| ##### Arguments | ||
|
|
||
| 1. The `type` of the entry determines if the entry belongs to the data log | ||
| (`user`) or to the metadata log (`system`). |
There was a problem hiding this comment.
This implies that the data log and the metadata log are separate things. Is this intentional? I know they are kind of separate currently (e.g. system entries are ignored in root-hashes) but they do all appear in the same "log" in the RSF. I guess this kind of does correctly explain how things are now (even if we want to change them).
There was a problem hiding this comment.
I think this RFC should document how things are right now. When we change it, we will have another RFC that explains the change and can refer to the original RFC as its starting point.
There was a problem hiding this comment.
The logs are intertwined because A data item must conform to the current schema derived from the previous system entries, so even though system entries aren't in the root hashes, the RSF is still invalid if they are reordered.
There was a problem hiding this comment.
You are right, there is a level of checking that guarantees consistency cross log 👍
| * It is illegal to have broken references. An `append-entry` must reference an | ||
| item previously introduced by an `add-item` command. | ||
| * The item in the `add-item` command must always be in the canonical form. | ||
|
|
There was a problem hiding this comment.
Should we mention anything about identical consecutive append-entry commands currently being illegal?
i.e. this is illegal
append-entry user GB 2010-11-12T13:14:15Z sha-256:08bef0039a4f0fb52f3a5ce4b97d7927bf159bc254b8881c45d95945617237f6
append-entry user GB 2010-11-12T13:14:50Z sha-256:08bef0039a4f0fb52f3a5ce4b97d7927bf159bc254b8881c45d95945617237f6
This is ok
append-entry user GB 2010-11-12T13:14:15Z sha-256:08bef0039a4f0fb52f3a5ce4b97d7927bf159bc254b8881c45d95945617237f6
append-entry user GB 2010-11-12T13:14:30Z sha-256:490636974f8087e4518d222eba08851dd3e2b85095f2b1427ff6ecd3fa482435
append-entry user GB 2010-11-12T13:14:50Z sha-256:08bef0039a4f0fb52f3a5ce4b97d7927bf159bc254b8881c45d95945617237f6
There was a problem hiding this comment.
With the recent changes, is this still the situation? I'll need clarification on this.
There was a problem hiding this comment.
This is currently how it is implemented, yes. ORJ won't let you load RSF that looks like the first example. However, government-service register will currently produce illegal RSF, because we accidentally allowed these duplicate entries into the register before coding the rule.
There was a problem hiding this comment.
The rule is good hygiene but should it be enforced in the spec?
There was a problem hiding this comment.
The goal of the spec is to allow different implementations to be built independently and be compatible with each other. If we don't enforce the same rules, a valid RSF in one tool could be invalid in another one. To me, this is a spec failure.
There was a problem hiding this comment.
We can discuss if it should be a rule that exits parsing (and thus rolls back the entire patch), it should be ignored entirely (skipped) or raise a warning but keep parsing.
Given that the current implementation fails, the initial spec should reflect that exact behaviour.
content/rsf-spec/index.md
Outdated
| * It is illegal to have orphan items. An `add-item` must have at least one | ||
| `append-entry` referencing to the item. | ||
| * It is illegal to have broken references. An `append-entry` must reference an | ||
| item previously introduced by an `add-item` command. |
There was a problem hiding this comment.
In the current implementation, if a corresponding add-item does not appear in the RSF file then it checks to see if the item already exists in the register. If it does, then the RSF file is deemed valid, otherwise it's rejected. This means that an RSF file that has an "orphan entry" could actually be valid if it is a patch on an existing register.
The inverse is not true for add-item. If an add-item is in an RSF file, the corresponding append-entry must appear in the same RSF file.
There was a problem hiding this comment.
Ah! I misunderstood this behaviour. I'll amend 👍
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Explains that the current behaviour uses only data entries to generate the root hash which leaves metadata entries unasserted. Also makes explicit that the numbering for data and metadata logs are independent. Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Ensures keys have one or more characters where the first is alphanumeric. Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
There was a problem hiding this comment.
This makes sense to me, and I think it's good to specify the grammar and semantics of RSF, but I'd rather the registers spec doesn't require RSF as way to mint or download data. Can we leave out the REST API section entirely, and address it as an ADR in openregisters-java?
It feels like there are already a lot of different ways of downloading data from a register, and I'm wary about adding any more endpoints that do similar things to existing ones, because it makes the API harder to use. At the moment RSF is an internal thing and API users shouldn't need to understand it.
I like @michaelabenyohai's idea of combining with the download-register endpoint as the RSF could just be requested through content negotiation.
|
|
||
| append-entry = %s"append-entry" HTAB type HTAB key HTAB timestamp HTAB hash-list | ||
| type = %s"user" / %s"system" | ||
| key = alphanum *(alphanum / %x2D / %x5F) |
| append-entry = %s"append-entry" HTAB type HTAB key HTAB timestamp HTAB hash-list | ||
| type = %s"user" / %s"system" | ||
| key = alphanum *(alphanum / %x2D / %x5F) | ||
| hash-list = hash *(list-separator hash) |
There was a problem hiding this comment.
In the RSF I've looked at, this has just been a single item hash. When would this be a list?
There was a problem hiding this comment.
The situation is theoretical and it is when you have an index. It is experimental and in review to assess if benefits outweigh (perceived) complexity.
|
|
||
| ### Media type | ||
|
|
||
| The current media type is `application/uk-gov-rsf`. It should change to |
There was a problem hiding this comment.
As this is specific to ORJ, can we say that the media type of RSF is application/vnd.rsf but application/uk-gov-rsf may also be used for legacy reasons? Then we can fix that at any time without raising another RFC.
|
@MatMoore although I may agree RSF is not normative, I'd like to have it in the spec as an example of how the reference implementation has approached register serialisation. |
Context
RSF has been documented in multiple places but never in a consistent manner. This is an attempt to describe the current state of things and serve as the starting point to create the first RSF specification.
Note that RSF has no versioning system. This is a good opportunity to discuss if we should add one to allow non backwards-compatible changes without breaking clients.
Guidance to review
Review the current implementation of RSF is accurately described in this RFC and that the explanation is understandable and unequivocal.