From f41eb717d8a9e73ec6f6b1ef620e728c5477cccb Mon Sep 17 00:00:00 2001
From: Chris Markiewicz
Date: Mon, 11 Nov 2024 11:04:29 -0500
Subject: [PATCH] doc: Document the validation model, context and inheritance principle (#94)

Co-authored-by: Yaroslav Halchenko
---
 docs/index.md                                 |  10 ++
 docs/validation-model/context.md              | 160 ++++++++++++++++++
 docs/validation-model/index.md                |  31 ++++
 .../validation-model/inheritance-principle.md |  63 +++++++
 4 files changed, 264 insertions(+)
 create mode 100644 docs/validation-model/context.md
 create mode 100644 docs/validation-model/index.md
 create mode 100644 docs/validation-model/inheritance-principle.md

diff --git a/docs/index.md b/docs/index.md
index 6055c5d5..a2e4e065 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -26,6 +26,7 @@ deno run -A jsr:@bids/validator
 ```
 
 ```{toctree}
+:maxdepth: 2
 :hidden:
 :caption: User guide
 
@@ -35,6 +36,7 @@ user_guide/issues.md
 ```
 
 ```{toctree}
+:maxdepth: 2
 :hidden:
 :caption: Developer guide
 
@@ -43,6 +45,14 @@ dev/contributing.md
 dev/environment.md
 ```
 
+```{toctree}
+:maxdepth: 2
+:hidden:
+:caption: Concepts
+
+validation-model/index.md
+```
+
 ```{toctree}
 :hidden:
 :caption: Reference
diff --git a/docs/validation-model/context.md b/docs/validation-model/context.md
new file mode 100644
index 00000000..703d9269
--- /dev/null
+++ b/docs/validation-model/context.md
@@ -0,0 +1,160 @@
+# The validation context
+
+The core structure of the validator is the `context`,
+a namespace that aggregates properties of the dataset (the `dataset` variable, above)
+and the current file being validated.
+
+Its type can be described as follows:
+
+```typescript
+Context: {
+  // Dataset properties
+  dataset: {
+    dataset_description: object
+    datatypes: string[]
+    modalities: string[]
+    // Lists of subjects as discovered in different locations
+    subjects: {
+      sub_dirs: string[]
+      participant_id: string[]
+      phenotype: string[]
+    }
+  }
+
+  // Properties of the current subject
+  subject: {
+    // Lists of sessions as discovered in different locations
+    sessions: {
+      ses_dirs: string[]
+      session_id: string[]
+      phenotype: string[]
+    }
+  }
+
+  // Path properties
+  path: string
+  entities: object
+  datatype: string
+  suffix: string
+  extension: string
+  // Inferred property
+  modality: string
+
+  // Inheritance principle constructions
+  sidecar: object
+  associations: {
+    // Paths and properties of files associated with the current file
+    aslcontext: { path: string, n_rows: integer, volume_type: string[] }
+    ...
+  }
+
+  // Content properties
+  size: integer
+
+  // File type-specific content properties
+  columns: object
+  gzip: object
+  json: object
+  nifti_header: object
+  ome: object
+  tiff: object
+}
+```
+
+To take an example, in a minimal dataset containing only a single subject's T1-weighted image,
+the `context` for that image might be:
+
+```yaml
+dataset:
+  dataset_description:
+    Name: "Example dataset"
+    BIDSVersion: "1.10.0"
+    DatasetType: "raw"
+  datatypes: ["anat"]
+  modalities: ["mri"]
+  subjects:
+    sub_dirs: ["sub-01"]
+    participant_id: null
+    phenotype: null
+
+subject:
+  sessions: { ses_dirs: null, session_id: null, phenotype: null }
+
+path: "/sub-01/anat/sub-01_T1w.nii.gz"
+entities:
+  subject: "01"
+datatype: "anat"
+suffix: "T1w"
+extension: ".nii.gz"
+modality: "mri"
+
+sidecar:
+  MagneticFieldStrength: 3
+  ...
+associations: {}
+
+size: 22017017
+nifti_header:
+  dim: 3
+  voxel_sizes: [1, 1, 1]
+  ...
+```
+
+Fields from this context can be queried using object dot notation.
+For example, `sidecar.MagneticFieldStrength` has the integer value `3`,
+and `entities.subject` has the string value `"01"`.
+This permits the use of boolean expressions, such as
+`sidecar.RepetitionTime == nifti_header.pixdim[4]`.
+
+As the validator validates each file in turn, it constructs a new context.
+The `dataset` property remains constant,
+while a new `subject` property is constructed when inspecting a new subject directory,
+and the remaining properties are constructed for each file, individually.
+
+## Context definition
+
+The validation context is largely dictated by the [schema],
+and the full type generated from the schema definition can be found in
+[jsr:@bids/schema/context](https://jsr.io/@bids/schema/doc/context/~/Context).
+
+## Context construction
+
+The construction of a validation context is where BIDS concepts are implemented.
+Again, this is easiest to explain with pseudocode:
+
+```python
+def buildFileContext(dataset, file):
+    context = namespace()
+    context.dataset = dataset
+    context.path = file.path
+    context.size = file.size
+
+    fileParts = parsePath(file.path)
+    context.entities = fileParts.entities
+    context.datatype = fileParts.datatype
+    context.suffix = fileParts.suffix
+    context.extension = fileParts.extension
+
+    context.subject = buildSubjectContext(dataset, context.entities.subject)
+
+    context.sidecar = loadSidecar(file)
+    context.associations = namespace({
+        association: loadAssociation(file, association)
+        for association in associationTypes(file)
+    })
+
+    if isTSV(file):
+        context.columns = loadColumns(file)
+    if isNIfTI(file):
+        context.nifti_header = loadNiftiHeader(file)
+    ...  # And so on
+
+    return context
+```
+
+The heavy lifting is done in `parsePath`, `loadSidecar` and `loadAssociation`.
+`parsePath` is relatively simple, but `loadSidecar` and `loadAssociation`
+implement the BIDS [Inheritance Principle].
+
+[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle
+[schema]: https://bidsschematools.readthedocs.io/
diff --git a/docs/validation-model/index.md b/docs/validation-model/index.md
new file mode 100644
index 00000000..3626e158
--- /dev/null
+++ b/docs/validation-model/index.md
@@ -0,0 +1,31 @@
+# Validation model
+
+The basic process of the BIDS validator operates according to the following
+[Python]-like pseudocode:
+
+```python
+def validate(directory):
+    fileTree = loadFileTree(directory)
+    dataset = buildDatasetContext(fileTree)
+
+    for file in walk(dataset.fileTree):
+        file_context = buildFileContext(dataset, file)
+        for check in perFileChecks:
+            check(file_context)
+
+    for check in datasetChecks:
+        check(dataset)
+```
+
+The following sections will describe [the validation context](context.md)
+and our implementation of [the Inheritance Principle](inheritance-principle.md).
+
+```{toctree}
+:maxdepth: 1
+:hidden:
+
+context.md
+inheritance-principle.md
+```
+
+[Python]: https://en.wikipedia.org/wiki/Python_(programming_language)
diff --git a/docs/validation-model/inheritance-principle.md b/docs/validation-model/inheritance-principle.md
new file mode 100644
index 00000000..97b9b72c
--- /dev/null
+++ b/docs/validation-model/inheritance-principle.md
@@ -0,0 +1,63 @@
+# The Inheritance Principle
+
+The [Inheritance Principle] is a core concept in BIDS.
+Its original definition (edited for brevity) was:
+
+> Any metadata file (`.json`, `.bvec`, `.tsv`, etc.) may be defined at any directory level,
+> but no more than one applicable file may be defined at a given level.
+> The values from the top level are inherited by all lower levels
+> unless they are overridden by a file at the lower level. [...]
+> There is no notion of "unsetting" a key/value pair.
+
+Here, "top level" means dataset root, and "lower level" means closer to the data file
+the metadata applies to.
+More recent versions of the specification have made the language more precise at the cost
+of greater verbosity.
+The core concept remains the same.
+
+The validator uses a "walk back" algorithm to find inherited files:
+
+```python
+def walkBack(file, extension):
+    fileParts = parsePath(file.path)
+
+    fileTree = file.parent
+    while fileTree:
+        for child in fileTree.children:
+            parts = parsePath(child.path)
+            if (
+                parts.extension == extension
+                and parts.suffix == fileParts.suffix
+                and isSubset(parts.entities, fileParts.entities)
+            ):
+                yield child
+
+        fileTree = fileTree.parent
+```
+
+Using this basis, `loadSidecar` is simply:
+
+```python
+def loadSidecar(file):
+    sidecar = {}
+    for json in walkBack(file, '.json'):
+        # Order matters. `|` overrides the left side with the right.
+        # Any collisions resolve in favor of closer to the data file.
+        sidecar = loadJson(json) | sidecar
+    return sidecar
+```
+
+For `loadAssociation`, only the first match is used, if found:
+
+```python
+def loadAssociation(file, association):
+    for associated_file in walkBack(file, getExtension(association)):
+        return getLoader(association)(associated_file)
+```
+
+Each association contains different metadata to extract.
+Note that some associations have a different suffix from the files they are associated with.
+The actual implementation of `walkBack` allows overriding suffixes as well as extensions,
+but it would not be instructive to show here.
+
+[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle
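
Note (not part of the patch): the walk-back and sidecar merge described in `inheritance-principle.md` can be tried on a throwaway directory. The following is a minimal sketch under stated assumptions, not the validator's own code. The names `parse_path`, `walk_back`, `load_sidecar` and `demo` are hypothetical, the filename parser is far simpler than real BIDS parsing (it keeps short entity keys such as `sub`), and the dict `|` operator requires Python 3.9 or later.

```python
import json
import tempfile
from pathlib import Path


def parse_path(path):
    """Simplified, hypothetical parser: returns (entities, suffix, extension)."""
    stem, dot, ext = path.name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(p.split("-", 1) for p in pairs if "-" in p)
    return entities, suffix, dot + ext


def walk_back(file, root, extension):
    """Yield matching files, starting next to `file` and moving up to `root`."""
    entities, suffix, _ = parse_path(file)
    for directory in (file.parent, *file.parent.parents):
        for child in sorted(directory.iterdir()):
            if not child.is_file():
                continue
            c_entities, c_suffix, c_ext = parse_path(child)
            if (
                c_ext == extension
                and c_suffix == suffix
                # Candidate entities must be a subset of the data file's.
                and c_entities.items() <= entities.items()
            ):
                yield child
        if directory == root:
            break


def load_sidecar(file, root):
    sidecar = {}
    for candidate in walk_back(file, root, ".json"):
        # Right side wins, so values closer to the data file are kept.
        sidecar = json.loads(candidate.read_text()) | sidecar
    return sidecar


def demo():
    """Build a tiny illustrative dataset and return the merged sidecar."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        anat = root / "sub-01" / "anat"
        anat.mkdir(parents=True)
        image = anat / "sub-01_T1w.nii.gz"
        image.touch()
        (root / "T1w.json").write_text(
            json.dumps({"MagneticFieldStrength": 3, "Manufacturer": "A"})
        )
        (root / "sub-02_T1w.json").write_text(json.dumps({"Manufacturer": "C"}))
        (anat / "sub-01_T1w.json").write_text(json.dumps({"Manufacturer": "B"}))
        return load_sidecar(image, root)


if __name__ == "__main__":
    print(demo())
```

In this made-up dataset, `Manufacturer` resolves to the value in the file closest to the image, `MagneticFieldStrength` is inherited from the root-level `T1w.json`, and `sub-02_T1w.json` is skipped because its entities are not a subset of the image's.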