-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Update digest to support latest Nipoppy processing status files #166
base: main
Are you sure you want to change the base?
Conversation
- match column names to https://nipoppy.readthedocs.io/en/latest/schemas/index.html#imaging-bagel-file - make pipeline_starttime optional - remove "PHASE__" & "STAGE__" cols, "step" & "status" cols - simplify column descriptions
- "HAS_DATATYPE__", "HAS_IMAGE__"
- remove "IsPrefixedColumn" - rename ID columns & update their descriptions
- refactor out primary session column name into a variable
Hey @michellewang, requesting your review here on just the digest schema changes as well as text including READMEs and status descriptions. Will tag you on the relevant files/sections under files changed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please review @michellewang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please review @michellewang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alyssadai let me know if I missed anything!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't comment on actual line because it hasn't been changed in the PR but at one point this README refers to https://github.com/neurodatascience/nipoppy-qpn which gives me a 404 error
"IsRequired": false | ||
}, | ||
"session_id": { | ||
"Description": "Participant session identifier.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this Participant session
on purpose or should it be only Session
? For bids_session_id
it just says session
"Description": "Participant session identifier.", | |
"Description": "Session identifier.", |
"IsRequired": true | ||
}, | ||
"status": { | ||
"Description": "Completion status of the pipeline run or step for the subject-session pair. 'SUCCESS': All output files are present. 'FAIL': At least one output file is missing. 'INCOMPLETE': Parent pipeline has not been run for the subject session. 'UNAVAILABLE': Relevant MRI modality for pipeline not available for subject session.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INCOMPLETE
should/will be about the pipeline itself, not a parent pipeline. What you have in digest/utility.py
is good I think
"Description": "Completion status of the pipeline run or step for the subject-session pair. 'SUCCESS': All output files are present. 'FAIL': At least one output file is missing. 'INCOMPLETE': Parent pipeline has not been run for the subject session. 'UNAVAILABLE': Relevant MRI modality for pipeline not available for subject session.", | |
"Description": "Completion status of the pipeline run or step for the subject-session pair. 'SUCCESS': All output files are present. 'FAIL': At least one output file is missing. 'INCOMPLETE': Pipeline has not been run for the subject session. 'UNAVAILABLE': Relevant MRI modality for pipeline not available for subject session.", |
"IsRequired": true | ||
}, | ||
"bids_participant_id": { | ||
"Description": "BIDS-compliant participant identifier.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could explicitly say that this should include the sub-
prefix?
"IsRequired": true | ||
}, | ||
"bids_session_id": { | ||
"Description": "BIDS-compliant session identifier.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above but for ses-
prefix
"bids_id": { | ||
"Description": "BIDS dataset identifier for a participant, if available/different from the participant_id.", | ||
"bids_participant_id": { | ||
"Description": "BIDS-compliant participant identifier.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for imaging bagel: maybe we could explicitly say that this should include the sub-
prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question: is it on purpose that this file doesn't have bids_session_id
? I guess that column is not very important since the pheno file cannot be used as input to other Neurobagel things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alyssadai. Changes look good, I did flag one thing I believe is an accidental change/bug with pd.read_csv
to pd.read_tsv
. I haven't tried this locally, but I'm pretty sure that doesn't work. Even if I'm wrong about that, I'd say add a test that covers the load_file_from_path
route in utility
, because it doesn't seem tested yet.
if not file_path.exists(): | ||
return None, "File not found." | ||
|
||
bagel = pd.read_csv(file_path) | ||
bagel = pd.read_tsv(file_path, sep="\t") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀 I don't think read_tsv
exists in pandas. Was this an accidental replace-all?
sidenote: using rename symbol in vscode can be a safer way to do rename-refactors: https://code.visualstudio.com/docs/editor/refactoring#_rename-symbol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're likely not catching this with tests because you are only testing the load_file_from_contents
route. At some point it would make sense to use an e2e test library like cypress or playwright to user-test the whole thing. But even now it should be possible to do a simple smoke test with a temp .tsv file here
@@ -188,7 +189,7 @@ def process_bagel(upload_contents, available_digest_nclicks, filenames): | |||
{"type": schema, "data": overview_df.to_dict("records")}, | |||
pipelines_dict, | |||
None, | |||
"csv", | |||
"csv", # NOTE: "tsv" is not an option for export_format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"csv", # NOTE: "tsv" is not an option for export_format | |
"csv", # NOTE: the dash_table.DataTable object does not support "tsv" as an option for export_format |
Co-authored-by: Michelle Wang <tomichellewang@gmail.com>
PHASE__
andSTAGE__
column schema withpipeline_step
column #157pipeline_starttime
no longer mandatory #153Changes proposed in this pull request:
PHASE__
,STAGE__
,HAS_DATATYPE__
,HAS_IMAGE
,has_mri_data
pipeline_starttime
optionalpipeline_step
,status
IsPrefixedColumn
,MissingValue
column propertiesTODO: Update pre-generated QPN file paths
For reviewer: you can test out the changes in the digest using either of these files:
Checklist
This section is for the PR reviewer
[ENH]
,[FIX]
,[REF]
,[TST]
,[CI]
,[MNT]
,[INF]
,[MODEL]
,[DOC]
) (see our Contributing Guidelines for more info)Closes #XXXX
For new features:
For bug fixes: