EXPERIMENTAL translation of HCA
Caveat: this schema is entirely constructed via an automated import of the HCA JSON Schema:

- there may be parts missing
- the direct mapping may not utilize key parts of LinkML

The documentation here is generated entirely from the schema, which in turn comes from the JSON Schema; as such it may be sparse on details. It also uses the older LinkML documentation framework, which doesn't show all of the schema.
This was created using schema-automator, utilizing the following HCA-specific extensions:

- mapping of `user_friendly` to LinkML `title`
- mapping HCA ontology extensions to dynamic enums
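As a sketch of the first mapping (the field names here are illustrative, not taken from the actual HCA schema), an HCA property such as:

```json
"biomaterial_id": {
  "type": "string",
  "user_friendly": "Biomaterial ID"
}
```

would map to a LinkML slot along these lines:

```yaml
biomaterial_id:
  range: string
  title: Biomaterial ID
```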
The following modifications were made:

- Changed "10x" to "S10x" (Python identifiers cannot start with a digit, so "10x" creates awkward incompatibilities between the generated Python classes and the schema)
- Modified hca/system/links.json to avoid name clashes with SupplementaryFile
I still need to figure out exactly how the system/links schema is used in HCA; currently it doesn't "connect up" to the rest of the schema, and it seems that some kind of extra-schema information is required.
All plain JSON enums are mapped to LinkML enums. Note that we elected not to inline these, so there are many "trivial" enums with a single value whose intent is simply to restrict the value of a field.
In future, the permissible values could be mapped to ontology terms, but this information isn't in the schema.
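For instance, a single-valued JSON Schema enum like this (the names and value are illustrative):

```json
"schema_type": {
  "type": "string",
  "enum": ["biomaterial"]
}
```

becomes a one-value LinkML enum rather than an inlined constraint:

```yaml
SchemaTypeEnum:
  permissible_values:
    biomaterial: {}
```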
HCA also uses a JSON Schema extension for ontology enums; these are converted to LinkML dynamic enums, as in the following example.
LinkML:

```yaml
DevelopmentStageOntology_ontology_options:
  include:
    - reachable_from:
        source_ontology: obo:efo
        source_nodes:
          - EFO:0000399
          - HsapDv:0000000
          - UBERON:0000105
        relationship_types:
          - rdfs:subClassOf
        is_direct: false
        include_self: false
    - reachable_from:
        source_ontology: obo:hcao
        source_nodes:
          - EFO:0000399
          - HsapDv:0000000
          - UBERON:0000105
        relationship_types:
          - rdfs:subClassOf
        is_direct: false
        include_self: false
```
from:

```json
"ontology": {
  "description": "An ontology term identifier in the form prefix:accession.",
  "type": "string",
  "graph_restriction": {
    "ontologies": ["obo:efo", "obo:hcao"],
    "classes": ["EFO:0000399", "HsapDv:0000000", "UBERON:0000105"],
    "relations": ["rdfs:subClassOf"],
    "direct": false,
    "include_self": false
  }
}
```
Note that the mapping is not quite direct: a separate query is generated in LinkML for each input ontology, with the input seeds repeated each time (`include` takes the union of all subqueries). I believe the semantics are the same as for the source, although some combinations will yield empty sets.
The more natural way to author this in LinkML would be to make the seed classes specific to each subquery.
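A hand-authored version might look like the following sketch, with each subquery seeded only by the classes relevant to that ontology (the assignment of seeds to ontologies here is hypothetical):

```yaml
DevelopmentStageOntology_ontology_options:
  include:
    - reachable_from:
        source_ontology: obo:efo
        source_nodes:
          - EFO:0000399
        relationship_types:
          - rdfs:subClassOf
        is_direct: false
        include_self: false
    - reachable_from:
        source_ontology: obo:hcao
        source_nodes:
          - HsapDv:0000000
          - UBERON:0000105
        relationship_types:
          - rdfs:subClassOf
        is_direct: false
        include_self: false
```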
To expand value sets:

```bash
poetry run sh utils/expand-value-sets.sh
```
This materializes the value set queries, so that:
- normal non-extended json-schema tooling can use them
- query results can be versioned alongside releases
The results are included alongside the schema as <NAME>.expanded.yaml.
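Each expanded file replaces the dynamic query with ordinary static permissible values, roughly of this shape (the enum entry shown is illustrative):

```yaml
enums:
  DevelopmentStageOntology_ontology_options:
    permissible_values:
      HsapDv:0000087:
        meaning: HsapDv:0000087
```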
File sizes:
Note in particular that the expanded species subset is a quarter of a gigabyte...
Some of the expanded sets may be empty due to a mismatch in how HCA and OAK use CURIEs for EDAM
- project/ - project files (do not edit these)
- src/ - source files (edit these)
  - human_cell_atlas
    - schema -- LinkML schema (generated from HCA)
    - datamodel -- generated Python datamodel
- tests - Python tests
make all
: make everything

make deploy
: deploys site
This project was made with linkml-project-cookiecutter.