# The validation context

The core structure of the validator is the `context`,
a namespace that aggregates properties of the dataset
(the `dataset` variable in the validation model pseudocode)
and the current file being validated.

Its type can be described as follows:

```typescript
type Context = {
  // Dataset properties
  dataset: {
    dataset_description: object
    datatypes: string[]
    modalities: string[]
    // Lists of subjects as discovered in different locations
    subjects: {
      sub_dirs: string[]
      participant_id: string[]
      phenotype: string[]
    }
  }

  // Properties of the current subject
  subject: {
    // Lists of sessions as discovered in different locations
    sessions: {
      ses_dirs: string[]
      session_id: string[]
      phenotype: string[]
    }
  }

  // Path properties
  path: string
  entities: object
  datatype: string
  suffix: string
  extension: string
  // Inferred property
  modality: string

  // Inheritance principle constructions
  sidecar: object
  associations: {
    // Paths and properties of files associated with the current file
    aslcontext: { path: string, n_rows: number /* integer */, volume_type: string[] }
    // ...
  }

  // Content properties
  size: number // integer

  // File type-specific content properties
  columns: object
  gzip: object
  json: object
  nifti_header: object
  ome: object
  tiff: object
}
```

To take an example, in a minimal dataset containing only a single subject's T1-weighted image,
the `context` for that image might be:

```yaml
dataset:
  dataset_description:
    Name: "Example dataset"
    BIDSVersion: "1.10.0"
    DatasetType: "raw"
  datatypes: ["anat"]
  modalities: ["mri"]
  subjects:
    sub_dirs: ["sub-01"]
    participant_id: null
    phenotype: null

subject:
  sessions: { ses_dirs: null, session_id: null, phenotype: null }

path: "/sub-01/anat/sub-01_T1w.nii.gz"
entities:
  subject: "01"
datatype: "anat"
suffix: "T1w"
extension: ".nii.gz"
modality: "mri"

sidecar:
  MagneticFieldStrength: 3
  # ...
associations: {}

size: 22017017
nifti_header:
  dim: 3
  voxel_sizes: [1, 1, 1]
  # ...
```

Fields from this context can be queried using object dot notation.
For example, `sidecar.MagneticFieldStrength` has the integer value `3`,
and `entities.subject` has the string value `"01"`.
This permits the use of boolean expressions, such as
`sidecar.RepetitionTime == nifti_header.pixdim[4]`.

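As a rough illustration of how such queries resolve, the sketch below follows a dotted path through a nested dictionary that stands in for the context, with optional `[n]` list indexing (0-based, as in Python, so `pixdim[4]` is the fifth element). The `resolve` function, its regex-based segment parsing and the choice to return `None` for missing keys are assumptions made for illustration; the validator evaluates a richer expression language (operators, function calls) defined by the BIDS schema.

```python
import re
from typing import Any

# Hypothetical: one path segment, e.g. "pixdim" or "pixdim[4]".
SEGMENT = re.compile(r"^(?P<name>\w+)(?:\[(?P<index>\d+)\])?$")


def resolve(context: dict, query: str) -> Any:
    """Look up a dotted path such as "sidecar.RepetitionTime" or
    "nifti_header.pixdim[4]" in a nested dict; missing keys yield None."""
    value: Any = context
    for segment in query.split("."):
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        if not isinstance(value, dict) or match["name"] not in value:
            return None
        value = value[match["name"]]
        if match["index"] is not None:
            value = value[int(match["index"])]
    return value


# Hypothetical context fragment, loosely modeled on the example context.
context = {
    "entities": {"subject": "01"},
    "sidecar": {"MagneticFieldStrength": 3, "RepetitionTime": 2.0},
    "nifti_header": {"pixdim": [1.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0]},
}

print(resolve(context, "entities.subject"))  # 01
print(
    resolve(context, "sidecar.RepetitionTime")
    == resolve(context, "nifti_header.pixdim[4]")
)  # True
```
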
As the validator visits each file in turn, it constructs a new context.
The `dataset` property remains constant,
a new `subject` property is constructed whenever the validator enters a new subject directory,
and the remaining properties are constructed individually for each file.

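The lifetimes described above can be sketched with a generator that shares one `dataset` object across all files, builds each subject's properties once and reuses them for that subject's files, and rebuilds the per-file properties every time. The dictionary shapes, the `/sub-<label>/` regex and `build_subject_context` (which here only lists session directories) are hypothetical simplifications, not the validator's code.

```python
import re
from typing import Iterator, Optional

# Hypothetical: the subject label comes from a leading "/sub-<label>/" directory.
SUBJECT_DIR = re.compile(r"/sub-([a-zA-Z0-9]+)/")


def build_subject_context(dataset: dict, subject: str) -> dict:
    # Hypothetical stand-in: list the session directories under this subject.
    prefix = f"/sub-{subject}/"
    ses_dirs = {
        path.split("/")[2]
        for path in dataset["paths"]
        if path.startswith(prefix) and path.split("/")[2].startswith("ses-")
    }
    return {"sessions": {"ses_dirs": sorted(ses_dirs)}}


def file_contexts(dataset: dict) -> Iterator[dict]:
    subject_cache: dict[str, dict] = {}
    for path in dataset["paths"]:
        match = SUBJECT_DIR.match(path)
        subject: Optional[str] = match.group(1) if match else None
        if subject is not None and subject not in subject_cache:
            # Built once, the first time a file from this subject is seen.
            subject_cache[subject] = build_subject_context(dataset, subject)
        yield {
            "dataset": dataset,  # the same object for every file
            "subject": subject_cache[subject] if subject is not None else None,
            "path": path,  # per-file properties are rebuilt every time
        }
```
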
## Context definition

The validation context is largely dictated by the [schema],
and the full type generated from the schema definition can be found in
[jsr:@bids/schema/context](https://jsr.io/@bids/schema/doc/context/~/Context).

## Context construction

The construction of a validation context is where BIDS concepts are implemented.
Again, this is easiest to explain with pseudocode:

```python
def buildFileContext(dataset, file):
    context = namespace()
    context.dataset = dataset
    context.path = file.path
    context.size = file.size

    fileParts = parsePath(file.path)
    context.entities = fileParts.entities
    context.datatype = fileParts.datatype
    context.suffix = fileParts.suffix
    context.extension = fileParts.extension

    context.subject = buildSubjectContext(dataset, context.entities.subject)

    context.sidecar = loadSidecar(file)
    context.associations = namespace({
        association: loadAssociation(file, association)
        for association in associationTypes(file)
    })

    if isTSV(file):
        context.columns = loadColumns(file)
    if isNIfTI(file):
        context.nifti_header = loadNiftiHeader(file)
    ...  # And so on

    return context
```

The heavy lifting is done in `parsePath`, `loadSidecar` and `loadAssociation`.
`parsePath` is relatively simple, but `loadSidecar` and `loadAssociation`
implement the BIDS [Inheritance Principle].

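`parsePath` is not spelled out here. A minimal sketch, assuming well-formed BIDS filenames, splits the filename into `key-value` entity pairs separated by `_`, treats the final part as the suffix, takes everything from the first `.` as the extension, and reads the datatype from the parent directory. `FileParts`, `parse_path`, the partial `ENTITY_NAMES` mapping (short keys to the long names that appear in `entities`, as in the example context) and the datatype heuristic are assumptions; a real parser would consult the schema for valid entities, datatypes and extensions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, partial mapping of short entity keys to long names;
# unknown keys are kept as written.
ENTITY_NAMES = {"sub": "subject", "ses": "session", "task": "task",
                "acq": "acquisition", "run": "run"}


@dataclass
class FileParts:
    entities: dict[str, str] = field(default_factory=dict)
    datatype: Optional[str] = None
    suffix: Optional[str] = None
    extension: str = ""


def parse_path(path: str) -> FileParts:
    *directories, filename = path.split("/")
    # Extension: everything from the first "." so ".nii.gz" stays whole.
    stem, dot, rest = filename.partition(".")
    parts = FileParts(extension=dot + rest)

    *entity_parts, last = stem.split("_")
    if "-" in last:  # no suffix, e.g. a malformed name
        entity_parts.append(last)
    else:
        parts.suffix = last
    for pair in entity_parts:
        key, _, value = pair.partition("-")
        parts.entities[ENTITY_NAMES.get(key, key)] = value

    # Assumption: the parent directory is the datatype unless it is the
    # dataset root or a subject/session directory.
    parent = directories[-1] if directories else ""
    if parent and not parent.startswith(("sub-", "ses-")):
        parts.datatype = parent
    return parts
```
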
[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle
[schema]: https://bidsschematools.readthedocs.io/

# Validation model

At a high level, the BIDS validator operates according to the following
[Python]-like pseudocode:

```python
def validate(directory):
    fileTree = loadFileTree(directory)
    dataset = buildDatasetContext(fileTree)

    for file in walk(dataset.fileTree):
        file_context = buildFileContext(dataset, file)
        for check in perFileChecks:
            check(file_context)

    for check in datasetChecks:
        check(dataset)
```

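To make the control flow concrete, the following runnable toy version of the loop treats the file tree as a list of paths, the dataset context as a small dict, and each check as a function returning a list of issue messages. `validate`, `PER_FILE_CHECKS`, `DATASET_CHECKS` and both example checks are hypothetical; the real validator's checks and issue reporting are considerably richer.

```python
from typing import Callable

Check = Callable[[dict], list[str]]


def check_no_spaces(file_context: dict) -> list[str]:
    # Hypothetical per-file check: flag paths containing spaces.
    path = file_context["path"]
    return [f"{path}: contains a space"] if " " in path else []


def check_has_description(dataset: dict) -> list[str]:
    # Hypothetical dataset-level check: dataset_description.json must exist.
    if "/dataset_description.json" not in dataset["paths"]:
        return ["missing /dataset_description.json"]
    return []


PER_FILE_CHECKS: list[Check] = [check_no_spaces]
DATASET_CHECKS: list[Check] = [check_has_description]


def validate(paths: list[str]) -> list[str]:
    dataset = {"paths": paths}  # stand-in for buildDatasetContext
    issues: list[str] = []
    for path in paths:
        # Stand-in for buildFileContext.
        file_context = {"dataset": dataset, "path": path}
        for check in PER_FILE_CHECKS:
            issues.extend(check(file_context))
    for check in DATASET_CHECKS:
        issues.extend(check(dataset))
    return issues


# Two issues: one from the per-file check, one from the dataset check.
print(validate(["/sub-01/anat/sub-01 T1w.nii.gz"]))
```
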
The following sections describe [the validation context](context.md)
and our implementation of [the Inheritance Principle](inheritance-principle.md).

```{toctree}
:maxdepth: 1
:hidden:
context.md
inheritance-principle.md
```

[Python]: https://en.wikipedia.org/wiki/Python_(programming_language)

# The Inheritance Principle

The [Inheritance Principle] is a core concept in BIDS.
Its original definition (edited for brevity) was:

> Any metadata file (`.json`, `.bvec`, `.tsv`, etc.) may be defined at any directory level,
> but no more than one applicable file may be defined at a given level.
> The values from the top level are inherited by all lower levels
> unless they are overridden by a file at the lower level. [...]
> There is no notion of "unsetting" a key/value pair.

Here, "top level" means dataset root, and "lower level" means closer to the data file
the metadata applies to.
More recent versions of the specification have made the language more precise at the cost
of verbosity.
The core concept remains the same.

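The validator uses a "walk back" algorithm to find inherited files.
Starting in the data file's directory and moving up one level at a time to the dataset root,
it yields every file that has the requested extension, the same suffix as the data file,
and entities that are a subset of the data file's entities:
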
```python
def walkBack(file, extension):
    fileParts = parsePath(file.path)

    fileTree = file.parent
    while fileTree:
        for child in fileTree.children:
            parts = parsePath(child.path)
            if (
                parts.extension == extension
                and parts.suffix == fileParts.suffix
                and isSubset(parts.entities, fileParts.entities)
            ):
                yield child

        fileTree = fileTree.parent
```

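The pseudocode leaves `file.parent` and `fileTree.children` abstract. Below is a self-contained sketch that assumes the file tree is a flat set of absolute paths and uses a deliberately simplified, hypothetical filename parser (`split_name`, which only handles well-formed names). It searches the data file's own directory first, then each parent up to the root `/`, so candidates come out nearest-first; `is_subset` requires every entity of a candidate to appear in the data file with the same value.

```python
import posixpath
from typing import Iterator


def split_name(path: str) -> tuple[dict[str, str], str, str]:
    # Simplified parser: returns (entities, suffix, extension).
    stem, dot, rest = posixpath.basename(path).partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, dot + rest


def is_subset(candidate: dict[str, str], target: dict[str, str]) -> bool:
    return all(target.get(key) == value for key, value in candidate.items())


def walk_back(file_path: str, extension: str, tree: set[str]) -> Iterator[str]:
    entities, suffix, _ = split_name(file_path)
    directory = posixpath.dirname(file_path)
    while True:
        # Candidates directly inside `directory` (not in its subdirectories).
        for child in sorted(p for p in tree if posixpath.dirname(p) == directory):
            c_entities, c_suffix, c_ext = split_name(child)
            if c_ext == extension and c_suffix == suffix and is_subset(c_entities, entities):
                yield child
        if directory == "/":
            break
        directory = posixpath.dirname(directory)


tree = {
    "/task-rest_bold.json",
    "/sub-01/sub-01_task-rest_bold.json",
    "/sub-01/func/sub-01_task-rest_bold.json",
    "/sub-01/func/sub-01_task-nback_bold.json",
    "/sub-01/func/sub-01_task-rest_bold.nii.gz",
}
for path in walk_back("/sub-01/func/sub-01_task-rest_bold.nii.gz", ".json", tree):
    print(path)
# /sub-01/func/sub-01_task-rest_bold.json
# /sub-01/sub-01_task-rest_bold.json
# /task-rest_bold.json
```
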
With `walkBack` in place, `loadSidecar` is simply:

```python
def loadSidecar(file):
    sidecar = {}
    for sidecarFile in walkBack(file, '.json'):
        # Order matters. `|` overrides the left side with the right.
        # `walkBack` yields the closest file first, so any collisions
        # resolve in favor of files closer to the data file.
        sidecar = loadJson(sidecarFile) | sidecar
    return sidecar
```

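The merge order is the subtle part. This sketch replaces the loaded JSON files with plain dicts, listed nearest-first as `walkBack` yields them, and applies the same `loaded | sidecar` fold; the dict union operator `|` requires Python 3.9 or later. `merge_sidecars` and the example values are hypothetical.

```python
from typing import Iterable


def merge_sidecars(nearest_first: Iterable[dict]) -> dict:
    sidecar: dict = {}
    for loaded in nearest_first:
        # Keys already in `sidecar` came from files closer to the data file,
        # so they win over the farther file being merged in.
        sidecar = loaded | sidecar
    return sidecar


subject_level = {"RepetitionTime": 2.0}  # e.g. /sub-01/sub-01_task-rest_bold.json
root_level = {"RepetitionTime": 3.0, "TaskName": "rest"}  # e.g. /task-rest_bold.json
print(merge_sidecars([subject_level, root_level]))
# {'RepetitionTime': 2.0, 'TaskName': 'rest'}
```
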
For `loadAssociation`, only the first (closest) match is used, if one is found:

```python
def loadAssociation(file, association):
    for associated_file in walkBack(file, getExtension(association)):
        return getLoader(association)(associated_file)
    # No match: implicitly returns None.
```

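Because `walkBack` is a generator, returning inside the loop stops the search at the closest match, and a loop with no matches falls through to an implicit `None`. The same behavior can be expressed with `next()` and a default. In the sketch below, `load_association`, the stand-in generator `nearest_first` and the loader lambda are hypothetical; the `produced` list only exists to show that farther candidates are never generated.

```python
from typing import Any, Callable, Iterator, Optional


def load_association(
    candidates: Iterator[str], loader: Callable[[str], Any]
) -> Optional[Any]:
    # Take the closest candidate, if any; farther candidates are never requested.
    closest = next(candidates, None)
    return loader(closest) if closest is not None else None


produced: list[str] = []


def nearest_first() -> Iterator[str]:
    # Stand-in for walkBack that records which candidates it actually produced.
    for path in ["/sub-01/perf/sub-01_aslcontext.tsv", "/sub-01/sub-01_aslcontext.tsv"]:
        produced.append(path)
        yield path


print(load_association(nearest_first(), lambda path: {"path": path}))
# {'path': '/sub-01/perf/sub-01_aslcontext.tsv'}
print(produced)
# ['/sub-01/perf/sub-01_aslcontext.tsv']
```
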
Each association contains different metadata to extract.
Note that some associations have a different suffix from the files they are associated with.
The actual implementation of `walkBack` allows overriding suffixes as well as extensions,
but showing that here would not be instructive.

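For instance, an ASL time series such as `sub-01_asl.nii.gz` is associated with `sub-01_aslcontext.tsv`, whose suffix differs from the data file's. As a much-simplified, hypothetical illustration (not the validator's code), a walk-back function can accept an optional `suffix` that replaces the data file's own suffix when matching candidates. It assumes a flat set of absolute paths and well-formed filenames; `split_name` and the `suffix` parameter are illustrative names only.

```python
import posixpath
from typing import Iterator, Optional


def split_name(path: str) -> tuple[dict[str, str], str, str]:
    # Simplified parser: returns (entities, suffix, extension).
    stem, dot, rest = posixpath.basename(path).partition(".")
    *pairs, suffix = stem.split("_")
    return dict(pair.split("-", 1) for pair in pairs), suffix, dot + rest


def walk_back(
    file_path: str, extension: str, tree: set[str], suffix: Optional[str] = None
) -> Iterator[str]:
    entities, own_suffix, _ = split_name(file_path)
    # Hypothetical override: match on `suffix` if given, else the file's own.
    wanted_suffix = suffix if suffix is not None else own_suffix
    directory = posixpath.dirname(file_path)
    while True:
        for child in sorted(p for p in tree if posixpath.dirname(p) == directory):
            c_entities, c_suffix, c_ext = split_name(child)
            if (
                c_ext == extension
                and c_suffix == wanted_suffix
                and all(entities.get(k) == v for k, v in c_entities.items())
            ):
                yield child
        if directory == "/":
            break
        directory = posixpath.dirname(directory)


tree = {
    "/sub-01/perf/sub-01_asl.nii.gz",
    "/sub-01/perf/sub-01_asl.json",
    "/sub-01/perf/sub-01_aslcontext.tsv",
}
asl = "/sub-01/perf/sub-01_asl.nii.gz"
print(list(walk_back(asl, ".tsv", tree, suffix="aslcontext")))
# ['/sub-01/perf/sub-01_aslcontext.tsv']
```
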
[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle