This change would also remove the
-
The other problem with Horreum's schema approach is how it correlates labels across schema versions. Labels are correlated by name. Names are mutable and non-exclusive. If
I agree that users should be able to interact with labels based on the fields they define, but Horreum should not use those mutable fields as the only form of correlation.
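To make the correlation problem concrete, here is a minimal sketch, assuming a hypothetical in-memory `Label` record with an immutable `label_id` shared by all versions of one logical label and a user-editable `name` (this is not Horreum's actual entity model). It compares grouping label versions by name against grouping them by a stable identity after a rename and a name collision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical label record, not Horreum's entity model.

    `label_id` is an immutable identity shared by every version of the same
    logical label; `name` is the user-editable display name.
    """
    label_id: int
    name: str
    schema_version: int


def correlate_by_name(labels):
    """Group schema versions by the mutable label name."""
    groups = {}
    for label in labels:
        groups.setdefault(label.name, []).append(label.schema_version)
    return groups


def correlate_by_id(labels):
    """Group schema versions by the stable label identity instead."""
    groups = {}
    for label in labels:
        groups.setdefault(label.label_id, []).append(label.schema_version)
    return groups


labels = [
    # One logical label, renamed by a user in schema version 2.
    Label(label_id=7, name="throughput", schema_version=1),
    Label(label_id=7, name="ops_per_sec", schema_version=2),
    # A different label that happens to reuse the old name in version 2.
    Label(label_id=9, name="throughput", schema_version=2),
]

print(correlate_by_name(labels))  # {'throughput': [1, 2], 'ops_per_sec': [2]}
print(correlate_by_id(labels))    # {7: [1, 2], 9: [2]}
```

Grouped by name, the renamed label loses its version 1 history and the unrelated label 9 is silently merged into it. Grouped by identity, both stay correct while users can still see and edit the name.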
-
This one is probably controversial, but I see the current Schema implementation as a liability that fails to provide the desired benefit and instead adds complexity and burden for Horreum's users.
Schema goals
Schema reality
re-use
Schema ownership does not match the scope of its usage. Teams either do not control the labels on their data or have to create a new schema. The vast majority of teams have opted to create a new schema per test. Creating a schema per test conceptually ties those labels to the test, but we currently force users to toggle between multiple views to work with labels because of how we organized them on schemas.
There is also a problem with the eventual consistency of Horreum's current shared schemas with labels. Teams will have different label value needs from a common JSON structure, so as more teams use a common JSON structure we are likely to add more labels to it. Each team will inherit the labels from every other team (because of the sharing), and the burden on Horreum will far exceed the needs of any individual team.
Note: This eventual consistency mode is a result of how schemas were implemented, not of the goal of re-use for common data structures.
This is currently not the case because the vast majority of teams create a new schema for each Test.
versioning
Schema label versioning is globally mapped by label name. Changing a label's name in one schema requires changing the matching name across all relevant schemas (see the ownership issue above), or it completely breaks the versioning implementation. Furthermore, all labels have to be copied to the next schema version; otherwise users do not have access to the label values when using the next schema. This increases the burden on users for a feature that was meant to make it easier for them to address changes to data formats.
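A minimal sketch of name-keyed versioning, assuming a hypothetical dict per schema version that maps label names to extractors (the URIs and JSONPath strings are made up, and this is not how Horreum stores labels internally). It shows both failure modes: a label not copied to v2 disappears for v2 runs, and a rename in v3 breaks the series for the old name.

```python
# Hypothetical schema versions, each owning its own labels keyed by name.
# The URIs and JSONPath strings are illustrative only.
SCHEMA_VERSIONS = {
    "urn:example:v1": {
        "throughput": "$.results.throughput",
        "latency": "$.results.p99",
    },
    # v2 only redefines the label whose path changed; "latency" was not copied.
    "urn:example:v2": {
        "throughput": "$.metrics.throughput",
    },
    # v3 copies "latency" back but renames "throughput" to "ops_per_sec".
    "urn:example:v3": {
        "ops_per_sec": "$.metrics.throughput",
        "latency": "$.metrics.p99",
    },
}


def label_history(schema_versions, run_schemas, label_name):
    """Resolve one label name against each run's schema version.

    Returns the extractor defined for that name on each run's schema,
    or None where that schema version has no label with that name.
    """
    return [schema_versions[uri].get(label_name) for uri in run_schemas]


runs = ["urn:example:v1", "urn:example:v2", "urn:example:v3"]

print(label_history(SCHEMA_VERSIONS, runs, "throughput"))
# ['$.results.throughput', '$.metrics.throughput', None]
print(label_history(SCHEMA_VERSIONS, runs, "latency"))
# ['$.results.p99', None, '$.metrics.p99']
```

Every `None` is a gap a user has to fix by hand, either by copying an unchanged label forward or by renaming it consistently in every schema version.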
complexity
Schemas add yet another high-level domain object for users to learn when working in Horreum. They could be summarized as simply a named collection of labels, but unfortunately that is not fully accurate. Labels are linked by name across schemas, and schemas also own the transformations for datasets (see #1499 for the issue I have with datasets), which can lead to a secondary schema with a separate set of labels.
There is also the added implementation complexity of the entity relationships between runs, schemas, labels, and values. The SQL alone requires two or three additional joins to get the set of appropriate labels for a run, which adds to the technical debt for a feature that most teams avoid.
Proposal
Labels are defined on a Test and are not re-usable across tests. This will lead to duplicated labels, which is better than sharing by reference outside the scope of ownership.
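The test-owned label model can be sketched as follows, assuming hypothetical `Test`, `Run`, and `Label` classes (illustrative only, not Horreum's API). Each test holds its own label instances, so a run's labels are one hop away, and duplicated definitions evolve independently.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """Hypothetical label owned by exactly one test."""
    name: str
    extractor: str  # e.g. a JSONPath expression; illustrative only


@dataclass
class Test:
    """Hypothetical test that owns its labels directly, with no schema in between."""
    name: str
    labels: list = field(default_factory=list)


@dataclass
class Run:
    test: Test


def labels_for_run(run):
    """A single hop from run to test gives the run's label names."""
    return [label.name for label in run.test.labels]


# Two teams define the same label separately (duplication, not sharing).
team_a = Test("team-a-benchmark", [Label("throughput", "$.results.throughput")])
team_b = Test("team-b-benchmark", [Label("throughput", "$.results.throughput")])

# Team B renames its copy; team A's label is untouched.
team_b.labels[0].name = "ops_per_sec"

print(labels_for_run(Run(team_a)))  # ['throughput']
print(labels_for_run(Run(team_b)))  # ['ops_per_sec']
```

The cost is that the two `throughput` definitions must be maintained separately; the benefit is that one team's change cannot reach a test it does not own. Had both tests referenced the same `Label` object, the rename would have leaked into team A's test.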
Schemas in Horreum go back to being json-schemas and being purely for JSON structure validation. I support the idea of making it easier for users to create labels, particularly for common JSON structures, but I do not see where our current implementation provides enough benefit to warrant the complexity.
These changes do not fully address the stated goals of re-use or versioning. The re-use does not exist with our current implementation, so I believe it is safe to remove Schemas until we have a better design for Label re-use. Versioning is better addressed with the composable label proposal #1499 because it does not require duplicating labels that did not change between versions and instead gives users the ability to adapt over time.