This repository was archived by the owner on Mar 15, 2021. It is now read-only.

RFC0005: Reference strategy#11

Merged
arnau merged 4 commits into master from reference-strategy
Aug 13, 2018

@arnau arnau commented Mar 29, 2018

Context

This RFC proposes consolidating references in Registers to CURIEs only.

Guidance to review

  • Check that the description of the current state of foreign keys and CURIEs is accurate.
  • Check that the proposal is coherent.
  • Raise any doubts or concerns.

1. Compose a record set URI: `https://country.register.gov.uk/records/`
1. Concatenate the URI with the value: `https://country.register.gov.uk/records/GB`

There is a special case here were a foreign key refers to a register register

Typo: "were" should be "where".



TODO: Where does the context live? It can't be part of the register API
because a Register could be part of multiple environments.

What is an example of a register being part of multiple environments?

Contributor Author

The immediate situation would be a Register being part of the register.gov.uk environment and being part of a local/test environment.

Another case, artificial right now, would be a register from the NHS that is part of both the NHS environment and ours.

Does that mean the CURIE doesn't resolve to a "single source of truth" but instead just one of many places where that truth is published?

Contributor Author

It resolves to the source of truth you decided on, which most of the time is our environment.

Contributor Author

@michaelabenyohai and I had a chat about another topic that led to a discussion around the possibility of encoding more specific data instead of the CURIE (as per the spec) so it could be proved correct. But right now it is just a thought.

* `example:32` -> `https://example.org/32`


TODO: Where does the context live? It can't be part of the register API

It needs to be tamper-proof though.

Contributor Author

What do you mean?

If someone tampers with the context, then a CURIE would resolve to the wrong data.

Contributor Author

Ah yes, it's something to keep an eye on.

@arnau arnau force-pushed the reference-strategy branch 2 times, most recently from bffede7 to dbc76ee Compare July 4, 2018 12:12
@arnau arnau removed the wip label Jul 4, 2018
@michaelabenyohai michaelabenyohai left a comment

I've added a few thoughts that came to mind. I'm not disagreeing with your points, just mentioning things we might want to note that we have considered.

On a more general note, do we want to consider how this affects our current field naming rules and conventions and the difficulties we have had with this recently?

Specifically, we still have the rule that the key field of a register must have the same name as the register. This guidance instructs that these fields must no longer be created as a foreign key (which they always have been previously). This in turn means that this field name can never be reused as a link from another register (which we currently do a lot). Instead, we will have to create a new field with a slightly different name, so we could end up proliferating fields that come in "pairs" with similar-but-not-quite-the-same names.

I feel we could treat it like this: the key field should describe "the thing", whereas the curie field in the "other thing" register should describe the relationship of "the thing" to the "other thing". In practice I think this could be hard though.

We've previously considered removing the rule that the key field always has the same name as the register and calling it id instead (although there's no reason why it has to be the same in every register, since it is the key property of the entry that is actually mandatory to name correctly). The crux of my point is whether we need to accelerate this change to make it feasible to remove foreign keys in practice.


### Foreign Key

A foreign key is a regular string that _happens_ to be an identifier in

It doesn't always have to be a string. It could also be any other datatype. Not sure if we care about that detail here.

But on that note, is there ever any benefit in being able to say a link also has a datatype like integer?

Contributor Author

Thinking about this, it connects with another discussion about expectations around primary key values: #18

In there I was only considering strings but your comment opens the door to more (potential) complexity.

I think we care about how flexible we must be with identifiers that can be used in CURIEs (and, for historical reasons, in foreign keys). In particular with CURIEs, the value needs to be encoded in terms of a URL path, so foo:1 would be the same whether the identifier is a string or an integer. I'm leaning towards exploring the issues that would arise from restricting identifiers to a subset of UTF-8.
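
A small sketch of why the datatype stops mattering once the identifier sits in a URL path. The helper name `curie_reference_to_path` is hypothetical, and percent-encoding with Python's `urllib.parse.quote` is an assumption made for illustration; the RFC does not prescribe an encoding.

```python
from urllib.parse import quote


def curie_reference_to_path(reference) -> str:
    """Encode a CURIE reference (the part after the prefix) as one URL path segment.

    Hypothetical helper: the encoding rules are an assumption, not part of the RFC.
    """
    # str() makes the integer 1 and the string "1" indistinguishable in the path.
    # safe="" also percent-encodes "/" so the identifier stays a single segment.
    return quote(str(reference), safe="")


print(curie_reference_to_path(1) == curie_reference_to_path("1"))  # True
print(curie_reference_to_path("a/b c"))  # a%2Fb%20c
```

Restricting identifiers to a smaller character set would make the encoding step trivial, which is one reason to explore that restriction.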

when found in another register it acts as a foreign key. To get the URI of the
referenced resource you have to:

1. Given a fieldname `country`

This should actually be:

Given a field with name x, where the record for x in the Field register has the register field populated with value country

To be clear, a field does not need to have the same name as the foreign register to be a foreign key, though it often does.
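
As a rough illustration of that lookup, here is a minimal sketch. `FIELD_DEFINITIONS` is a hypothetical, heavily simplified stand-in for the Field register, and `foreign_key_uri` is an invented helper name; only the URL shape comes from the RFC text.

```python
from typing import Optional

# Hypothetical, simplified Field register records. Real records carry more
# attributes; only "register" matters for resolving a foreign key.
FIELD_DEFINITIONS = {
    "x": {"datatype": "string", "register": "country"},
    "name": {"datatype": "string"},
}


def foreign_key_uri(field_name: str, value: str) -> Optional[str]:
    """Return the record URI for a foreign key value, or None for a plain field."""
    # The target register comes from the field definition, not the field name.
    register = FIELD_DEFINITIONS[field_name].get("register")
    if register is None:
        return None
    record_set_uri = f"https://{register}.register.gov.uk/records/"
    return record_set_uri + value


print(foreign_key_uri("x", "GB"))  # https://country.register.gov.uk/records/GB
print(foreign_key_uri("name", "United Kingdom"))  # None
```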

Contributor Author

Do you know of any cases where this is not the case?

The field allergen-group is one and the field fields is another.

I think this point still needs addressing.

Contributor Author

Ok, changed with your suggestion.

1. Compose a record set URI: `https://country.register.gov.uk/records/`
1. Concatenate the URI with the value: `https://country.register.gov.uk/records/GB`

There is a special case here where a foreign key refers to a register register

I would counter that this is not what it actually means. In the example below it still just resolves to https://register.register.gov.uk/country - but we hoped that people might then follow that to the contents of https://country.register.gov.uk/records. Though that is a problem in itself. We'd have the same problem with curies if we hadn't done the URI consolidation work.

Contributor Author

I think I'll need a chat with you on this one.

* To know a field contains a foreign key, you need to look up the register field
in the field definition. If it is informed, the datatype is not “string” but
“key”.
* Only able to reference one register per field.

Is this always a problem? Or does it help consumers to know that the thing in a field will always be in the same place? Are there performance benefits knowing you don't have to check the location of every single record?

Contributor Author

You are right in that it is not a problem in itself, just a restriction. The problem derived from this restriction is the need to have multiple fields for similar things when the situation arises. I'll think of a better way to phrase this.

in the field definition. If it is informed, the datatype is not “string” but
“key”.
* Only able to reference one register per field.
* Linking a full set of records is a special case.

I'd say linking a full set of records is not possible (see above).


#### Problems

* To know a field contains a foreign key, you need to look up the register field

You have the same problem with CURIEs, which we don't mention below: you still need to look up the field definition to find out whether it is a curie or a string. "country:GB" is a valid string as well as a valid curie, and they mean very different things.

Or is the point you are trying to get at that you have to check a field other than the datatype field in the field definition?
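
To make the ambiguity concrete, a minimal sketch follows. The function name `split_curie` is hypothetical, and the datatype labels `"curie"` and `"string"` are assumed here purely for illustration.

```python
from typing import Optional, Tuple


def split_curie(value: str, datatype: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, reference) only when the field is declared as a CURIE."""
    if datatype != "curie":
        # Plain string: the colon has no special meaning.
        return None
    prefix, _, reference = value.partition(":")
    return prefix, reference


# The same value is read differently depending on the declared datatype.
print(split_curie("country:GB", "curie"))   # ('country', 'GB')
print(split_curie("country:GB", "string"))  # None
```

The value alone never tells you which reading applies; the difference from foreign keys is only that the datatype attribute is enough, with no second attribute to check.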

Contributor Author

Yes, my point was around not being able to know by the datatype.

This RFC accepts CURIE as the only mechanism for linking between register from
now on given they offer the flexibility required by registers.

Existing registers that use foreign keys will be maintained to avoid

Will we make a plan to evolve them to use curies?

Contributor Author

We should, yes.

Contributor Author

To clarify, this plan is out of scope for this RFC.

arnau commented Jul 16, 2018

> Specifically, we still have the rule that the key field of a register must have the same name as the register. This guidance instructs that these fields must no longer be created as a foreign key (which they always have been previously). This in turn means that this field name can never be reused as a link from another register (which we currently do a lot). Instead, we will have to create a new field with a slightly different name, so we could end up proliferating fields that come in "pairs" with similar-but-not-quite-the-same names.

> I feel we could treat it like this: the key field should describe "the thing", whereas the curie field in the "other thing" register should describe the relationship of "the thing" to the "other thing". In practice I think this could be hard though.

Very good point, I'll address it in the RFC.

@arnau arnau force-pushed the reference-strategy branch from e3ecace to 7afe483 Compare July 18, 2018 08:28
@arnau arnau requested a review from MatMoore July 18, 2018 08:29

arnau commented Jul 18, 2018

@michaelabenyohai I've changed a few bits to address some of your comments, please have another read and see if it's better now.

@MatMoore

The proposal makes sense to me based on what I know about how things work now.

Am I right in thinking that to actually replace the existing foreign keys with curies we first need to create the metadata log?

Also, even if we're not adding any more foreign keys, will we retain any information in the spec about the semantics of them, or will it become purely an implementation detail of specific ORJ registers? I'm just wondering whether a spec-compliant client would still be expected to resolve these kinds of links.

arnau commented Jul 19, 2018

> Am I right in thinking that to actually replace the existing foreign keys with curies we first need to create the metadata log?

Yes, in particular the schema evolution (adding fields) #5

> Also, even if we're not adding any more foreign keys, will we retain any information in the spec about the semantics of them, or will it become purely an implementation detail of specific ORJ registers? I'm just wondering whether a spec-compliant client would still be expected to resolve these kinds of links.

Good point, I think the spec should, at best, mention that at some point a register had this assumption; so some sort of informative section. But a client shouldn't be expected to implement legacy assumptions.

Moreover, the current spec doesn't define foreign keys at all, so in a way we are addressing an assumption we have encoded in our implementation and clients. I opened an issue in the spec repo to address this: openregister/specification#84

@MatMoore MatMoore left a comment

The limitation of not knowing how to resolve a CURIE to a URL could be solved by including the context within the registers resource.

The expansion mechanism would then be

  1. Given a CURIE country:GB
  2. And a register resource
  3. Take the curie's prefix: country
  4. Look up the prefix in the context object of the register resource to get a URL: https://country.register.gov.uk
  5. Add /records/ to it: https://country.register.gov.uk/records/
  6. Add the CURIE value to it: https://country.register.gov.uk/records/GB

In theory we could provide this information through the API without storing the data in the register itself, for example by making all registers on .register.gov.uk have a context that resolves all CURIEs to .register.gov.uk.
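
The expansion steps above could be sketched roughly as follows. The `context` dictionary and the function name `expand_curie` are hypothetical; where the context actually lives is exactly the open question in this thread.

```python
def expand_curie(curie: str, context: dict) -> str:
    """Expand a CURIE to a record URL using a prefix-to-URL context."""
    # Step 3: take the CURIE's prefix (and the reference after the colon).
    prefix, separator, reference = curie.partition(":")
    if not separator:
        raise ValueError(f"not a CURIE: {curie!r}")
    # Step 4: look up the prefix in the context (KeyError if unknown).
    base_url = context[prefix]
    # Steps 5 and 6: add /records/ and then the CURIE value.
    return f"{base_url}/records/{reference}"


# Hypothetical context object, as it might be exposed by a register resource.
context = {"country": "https://country.register.gov.uk"}
print(expand_curie("country:GB", context))
# https://country.register.gov.uk/records/GB
```

Passing a different context (for example one for a local or test environment) would resolve the same CURIE elsewhere, which is also why a tampered context is a concern.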


Existing registers that use foreign keys will be maintained to avoid
disruption in the registers ecosystem. An independent task will define and
execute the migration from foreing keys to CURIEs.

there's a typo here: "foreing"

actually you could just drop this whole sentence as it doesn't say when this will happen or how.

to a register (set of records). It can be seen as the natural consequence of
[RFC 0002: URI Consolidation](https://github.com/openregister/registers-rfcs/blob/master/content/uri-consolidation/index.md).

Note that other types of references (e.g. subset of records) are out of the

"out of the scope" should be "out of scope"

arnau commented Aug 7, 2018

> The limitation of not knowing how to resolve a CURIE to a URL could be solved by including the context within the registers resource.

This is indeed the current thinking for how to handle reference resolution. It requires schema evolution to add this information to the catalog (register register).

But it's not for this RFC to say how the catalog works. There are more twists on this, happy to discuss it further.

* Only able to reference one register per field.
* Linking a full set of records is a special case.
* You need a field for each register even if they have the same type of
relationship with the record (e.g. owned by, managed by).

I don't understand this point.

Contributor Author

At all or part of it? Let me try again:

With foreign keys, the field has a local identifier, so the field itself is dedicated to a single type of link (e.g. local-authority-eng). If you need to provide links to e.g. local-authority-nir as well, you need another column, even though both have the same intention (relationship-wise).
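
A rough sketch of the difference, using made-up records. The record values, the `managed-by` field name and the helper `to_curie_record` are all hypothetical; only the two register names come from the example above.

```python
# With foreign keys: one field per linked register, and at most one is filled in.
records_with_foreign_keys = [
    {"name": "Thing A", "local-authority-eng": "ABC", "local-authority-nir": None},
    {"name": "Thing B", "local-authority-eng": None, "local-authority-nir": "XYZ"},
]


def to_curie_record(record: dict) -> dict:
    """Collapse the per-register fields into one CURIE field (hypothetical name)."""
    for register in ("local-authority-eng", "local-authority-nir"):
        if record.get(register) is not None:
            # The prefix names the register; the field names the relationship.
            return {"name": record["name"], "managed-by": f"{register}:{record[register]}"}
    return {"name": record["name"], "managed-by": None}


records_with_curies = [to_curie_record(r) for r in records_with_foreign_keys]
print(records_with_curies[1])
# {'name': 'Thing B', 'managed-by': 'local-authority-nir:XYZ'}
```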

Yes that makes sense now. I think it would be good to clarify that you "need a field for each register linked to"

@arnau arnau changed the title from "Reference strategy" to "RFC0005: Reference strategy" Aug 9, 2018
Arnau Siches added 3 commits August 13, 2018 13:50
RFC describing the two mechanisms for linking registers and the proposal
to use CURIEs from now on.

Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
@arnau arnau force-pushed the reference-strategy branch from 9d7ea58 to 159b93a Compare August 13, 2018 12:55
Signed-off-by: Arnau Siches <arnau.siches@digital.cabinet-office.gov.uk>
@arnau arnau merged commit ed6a2b0 into master Aug 13, 2018
arnau pushed a commit that referenced this pull request Aug 13, 2018
@arnau arnau deleted the reference-strategy branch August 13, 2018 12:59
4 participants