
Example datasets for bep036 #465


Open: wants to merge 21 commits into base: master

@Arshitha commented Aug 29, 2024

Added the pheno001 and pheno002 example datasets, inspired by ds004215 on OpenNeuro but significantly modified to keep them simple and to clearly convey the various use cases proposed in BEP036.

Use cases covered (and to be added to this PR):

  • pheno001 - Single session with both phenotype and imaging data
  • pheno002 - Two sessions, one of which has imaging data only
  • pheno003 - Two sessions, one of which has phenotype data only
  • pheno004 - Two sets of sessions: one set (e.g., screening, baseline, followup) for phenotype data and another set (e.g., 01, 02) for imaging data.

Still in draft state but would appreciate any and all feedback.

Pinging co-contributors: @ericearl @SamGuay @surchs

@Arshitha marked this pull request as draft August 29, 2024 21:20

@ericearl (Contributor) left a comment

This is great, thanks! I'm guessing you left it in draft state because of pheno001 and pheno002, right?

I think we should remove the age_at_visit column from all phenotype/ measurement tool files and instead provide a root-level sessions file with that field. Should we take that a step further and make adding age to the sessions file RECOMMENDED or OPTIONAL?
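A minimal sketch of the proposed aggregation, assuming (hypothetically) that each phenotype TSV carries participant_id, session_id, and age_at_visit columns and that the root-level sessions file would use a column named age. The file names, column names, values, and the aggregate_age and sessions_tsv helpers are illustrative only, not part of BEP036. It collects one age per participant-session, stops if two tools disagree, and renders the result as a sessions table:

```python
import csv
import io

# Hypothetical phenotype tables (illustrative names and values): each
# measurement-tool TSV currently repeats age_at_visit on every row.
PHENOTYPE_TSVS = {
    "phenotype/demographics.tsv": (
        "participant_id\tsession_id\tage_at_visit\n"
        "sub-01\tses-01\t34\n"
        "sub-02\tses-01\t29\n"
    ),
    "phenotype/tool_a.tsv": (
        "participant_id\tsession_id\tage_at_visit\tscore\n"
        "sub-01\tses-01\t34\t12\n"
        "sub-02\tses-01\t29\t9\n"
    ),
}


def aggregate_age(tables):
    """Map (participant_id, session_id) -> age, raising if tables disagree."""
    ages = {}
    for name, text in tables.items():
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            key = (row["participant_id"], row["session_id"])
            age = row["age_at_visit"]
            # setdefault stores the first age seen; later ones must match it.
            if ages.setdefault(key, age) != age:
                raise ValueError(f"{name}: conflicting age for {key}")
    return ages


def sessions_tsv(ages):
    """Render a root-level sessions table with a hypothetical 'age' column."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["participant_id", "session_id", "age"])
    for (participant, session), age in sorted(ages.items()):
        writer.writerow([participant, session, age])
    return out.getvalue()


print(sessions_tsv(aggregate_age(PHENOTYPE_TSVS)), end="")
```

Raising on conflicting ages is a deliberate choice: if two instruments record different ages for the same visit, silently keeping one would hide a data-entry problem.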

@Arshitha (Author)

I like that idea. It's redundant information that can be aggregated at the sessions level, and it can be a recommendation in the BEP.

@Arshitha (Author)

It's in draft state because I haven't prepared pheno003 and pheno004 yet, but yes, all four example datasets will violate the contribution guidelines.

Arshitha and others added 4 commits August 30, 2024 15:59
Co-authored-by: Eric Earl <eric.earl@nih.gov>
Co-authored-by: Eric Earl <eric.earl@nih.gov>
Co-authored-by: Eric Earl <eric.earl@nih.gov>
Co-authored-by: Eric Earl <eric.earl@nih.gov>
@christinerogers commented Oct 24, 2024

Got a question from @dominikwelke --

Could this PR include an example showing how to represent multiple runs from one participant-session?

@ericearl mentioned today that this is easily done by adding a run column to the .tsv; it would be nice to see it illustrated and mentioned here.
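A minimal sketch of the run-column idea, assuming a hypothetical phenotype TSV in which participant_id, session_id, and run together identify a row. The tool, its values, and the duplicate_keys helper are illustrative, not taken from BEP036. It shows that one participant-session looks duplicated until run becomes part of the key:

```python
import csv
import io
from collections import Counter

# Hypothetical phenotype TSV: sub-01 completed the same instrument twice in
# ses-01, and only the "run" column tells those two rows apart.
TOOL_TSV = (
    "participant_id\tsession_id\trun\tscore\n"
    "sub-01\tses-01\t1\t12\n"
    "sub-01\tses-01\t2\t14\n"
    "sub-02\tses-01\t1\t9\n"
)


def duplicate_keys(text, key_columns):
    """Return key tuples that occur more than once for the given columns."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Without run, sub-01/ses-01 appears twice...
print(duplicate_keys(TOOL_TSV, ["participant_id", "session_id"]))
# ...with run in the key, every row is uniquely identified.
print(duplicate_keys(TOOL_TSV, ["participant_id", "session_id", "run"]))
```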

- All participants.tsv files have been simplified.
- pheno004 is now instead an example mixing imaging-only, phenotype-only, and combined imaging and phenotype data.
@ericearl (Contributor) commented Feb 6, 2025

I hijacked the not-yet-created pheno004 for some current bids-validator testing needs at bids-standard/bids-specification#2044. Can we make this a non-draft PR and hopefully get it merged? @effigies @Arshitha @SamGuay @surchs

@effigies (Contributor) commented Feb 6, 2025

Please set the BIDS_SCHEMA environment variable to https://bids-specification--2044.org.readthedocs.build/en/2044/schema.json here:

```yaml
run: echo BIDS_SCHEMA=https://bids-specification.readthedocs.io/en/latest/schema.json >> $GITHUB_ENV
```

Please also add pheno004 to the datasets skipped by the legacy and stable validators:

```yaml
- name: Skip legacy validation for post-legacy datasets
  run: for DS in mrs_* dwi_deriv; do touch $DS/.SKIP_VALIDATION; done
  if: matrix.bids-validator == 'legacy'
  shell: bash

- name: Skip stable validation for datasets with unreleased features
  run: for DS in dwi_deriv; do touch $DS/.SKIP_VALIDATION; done
  if: matrix.bids-validator != 'dev'
  shell: bash
```

@ericearl (Contributor) commented Feb 6, 2025

@effigies, is the comment just above a note for me? I'm confused by most of it and don't feel safe editing those files as they are. If you need me to take care of it, could I sit down with you, Ross, or Nell to figure it out, or have it explained to me well enough that I can do the work?

@effigies (Contributor) commented Feb 6, 2025

Okay, I made the changes I asked for. It looks like there are issues in the schema that need to be addressed, but there are also unrelated issues in pheno001-003: https://github.com/bids-standard/bids-examples/actions/runs/13188395001/job/36815880378?pr=465

@ericearl (Contributor) commented Feb 7, 2025

This is super helpful, @effigies, thank you! I'm bringing the errors out of the logs here for us (@Arshitha @SamGuay @surchs):

# pheno001

	[ERROR] MISSING_DATASET_DESCRIPTION A dataset_description.json file is required in the root of the dataset
		

	Please visit https://neurostars.org/search?q=MISSING_DATASET_DESCRIPTION for existing conversations about this issue.

	[ERROR] JSON_INVALID Not a valid JSON file.
		/sub-01/anat/sub-01_T1w.json
		/sub-01/anat/sub-01_T1w.json

		2 more files with the same issue

	Please visit https://neurostars.org/search?q=JSON_INVALID for existing conversations about this issue.

# pheno002

	[ERROR] MISSING_DATASET_DESCRIPTION A dataset_description.json file is required in the root of the dataset
		

	Please visit https://neurostars.org/search?q=MISSING_DATASET_DESCRIPTION for existing conversations about this issue.

	[ERROR] TSV_COLUMN_ORDER_INCORRECT Some TSV columns are in the incorrect order
		session_id
		/sessions.tsv - Column 0 (starting from 0) found at index 1.

	Please visit https://neurostars.org/search?q=TSV_COLUMN_ORDER_INCORRECT for existing conversations about this issue.

	[ERROR] TSV_INDEX_VALUE_NOT_UNIQUE An index column(s) was specified for the tsv file and not all of the values for it are unique.
		/sessions.tsv - Row: 4, Value: 01
		/sessions.tsv - Row: 5, Value: 02

	Please visit https://neurostars.org/search?q=TSV_INDEX_VALUE_NOT_UNIQUE for existing conversations about this issue.

	[ERROR] JSON_INVALID Not a valid JSON file.
		/sub-01/ses-01/anat/sub-01_ses-01_T1w.json
		/sub-01/ses-01/anat/sub-01_ses-01_T1w.json

		6 more files with the same issue

	Please visit https://neurostars.org/search?q=JSON_INVALID for existing conversations about this issue.

# pheno003

	[ERROR] MISSING_DATASET_DESCRIPTION A dataset_description.json file is required in the root of the dataset
		

	Please visit https://neurostars.org/search?q=MISSING_DATASET_DESCRIPTION for existing conversations about this issue.

	[ERROR] TSV_COLUMN_ORDER_INCORRECT Some TSV columns are in the incorrect order
		session_id
		/sessions.tsv - Column 0 (starting from 0) found at index 1.

	Please visit https://neurostars.org/search?q=TSV_COLUMN_ORDER_INCORRECT for existing conversations about this issue.

	[ERROR] TSV_INDEX_VALUE_NOT_UNIQUE An index column(s) was specified for the tsv file and not all of the values for it are unique.
		/sessions.tsv - Row: 4, Value: baseline

	Please visit https://neurostars.org/search?q=TSV_INDEX_VALUE_NOT_UNIQUE for existing conversations about this issue.

	[ERROR] JSON_INVALID Not a valid JSON file.
		/sub-01/ses-baseline/anat/sub-01_ses-baseline_T1w.json
		/sub-01/ses-baseline/anat/sub-01_ses-baseline_T1w.json

		2 more files with the same issue

	Please visit https://neurostars.org/search?q=JSON_INVALID for existing conversations about this issue.
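Two of these error types can be reproduced locally before pushing. The sketch below is a rough approximation, not the bids-validator's implementation: it flags a missing root dataset_description.json and any .json file that fails to parse, which includes 0-byte sidecars like the empty JSON files mentioned later in this thread. The find_problems and write_minimal_description helpers are hypothetical, and the BIDSVersion string in the example is only a placeholder; Name and BIDSVersion are the two REQUIRED fields of dataset_description.json.

```python
import json
import tempfile
from pathlib import Path


def find_problems(root):
    """Roughly approximate two validator checks (not the validator's own logic).

    MISSING_DATASET_DESCRIPTION: no dataset_description.json at the root.
    JSON_INVALID: a *.json file that does not parse (0-byte files included).
    """
    root = Path(root)
    problems = []
    if not (root / "dataset_description.json").is_file():
        problems.append(("MISSING_DATASET_DESCRIPTION", "/dataset_description.json"))
    for path in sorted(root.rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except ValueError:  # JSONDecodeError and UnicodeDecodeError subclass it
            problems.append(("JSON_INVALID", "/" + path.relative_to(root).as_posix()))
    return problems


def write_minimal_description(root, name, bids_version):
    """Write dataset_description.json with the REQUIRED Name and BIDSVersion."""
    path = Path(root) / "dataset_description.json"
    description = {"Name": name, "BIDSVersion": bids_version}
    path.write_text(json.dumps(description, indent=4) + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    anat = Path(tmp, "sub-01", "anat")
    anat.mkdir(parents=True)
    (anat / "sub-01_T1w.json").touch()  # 0-byte sidecar
    print(find_problems(tmp))
    write_minimal_description(tmp, "pheno001", "1.10.0")  # placeholder version
    (anat / "sub-01_T1w.json").write_text("{}\n", encoding="utf-8")
    print(find_problems(tmp))
```

An empty object is the smallest valid sidecar; the real validator may still warn about missing recommended metadata, which this sketch does not check.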

@Arshitha (Author) commented Apr 3, 2025

@ericearl I fixed some of the BIDS validation errors, but I'm not sure how to fix the following:

~/Desktop/Projects/bep036/bids-examples -> master
(datasci) Thu Apr  3 18:44:02 2025 ❯ bids-validator-deno pheno002 --ignoreWarnings
	[ERROR] TSV_COLUMN_ORDER_INCORRECT Some TSV columns are in the incorrect order
		session_id
		/sessions.tsv - Column 0 (starting from 0) found at index 1.

	Please visit https://neurostars.org/search?q=TSV_COLUMN_ORDER_INCORRECT for existing conversations about this issue.

	[ERROR] TSV_INDEX_VALUE_NOT_UNIQUE An index column(s) was specified for the tsv file and not all of the values for it are unique.
		/sessions.tsv - Row: 4, Value: 01
		/sessions.tsv - Row: 5, Value: 02

	Please visit https://neurostars.org/search?q=TSV_INDEX_VALUE_NOT_UNIQUE for existing conversations about this issue.


          Summary:                         Available Tasks:        Available Modalities:
          18 Files, 41.2 MB                                        MRI
          2 - Subjects 2 - Sessions

	If you have any questions, please post on https://neurostars.org/tags/bids.

I checked the TSV files: they are valid TSV files, and I see no apparent problem with the column order that one of the errors complains about. @effigies, could this be related to the validator issues you mentioned earlier?
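One possible reading of these two errors, not confirmed in this thread, is that the schema in use applied the rule for subject-level sessions files (session_id as the first column and a unique index on its own) to the new root-level sessions.tsv, which starts with participant_id and repeats each session label once per participant. The sketch below uses a hypothetical two-participant, two-session file and a hypothetical check_index_columns helper to mimic both checks, and shows the same file passing once participant_id and session_id together form the index:

```python
import csv
import io

# Hypothetical root-level sessions.tsv: participant_id first, one row per
# participant-session, so each session label appears once per participant.
SESSIONS_TSV = (
    "participant_id\tsession_id\n"
    "sub-01\tses-01\n"
    "sub-01\tses-02\n"
    "sub-02\tses-01\n"
    "sub-02\tses-02\n"
)


def check_index_columns(text, index_columns):
    """Mimic the two checks: index columns come first and are jointly unique."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    errors = []
    for expected, column in enumerate(index_columns):
        found = header.index(column)
        if found != expected:
            errors.append(f"TSV_COLUMN_ORDER_INCORRECT: {column} at index {found}")
    positions = [header.index(column) for column in index_columns]
    seen = set()
    # Row numbers here count the header as row 1.
    for row_number, row in enumerate(reader, start=2):
        key = tuple(row[i] for i in positions)
        if key in seen:
            errors.append(
                f"TSV_INDEX_VALUE_NOT_UNIQUE: row {row_number}, value {'/'.join(key)}"
            )
        seen.add(key)
    return errors


# Assumed subject-level rule: session_id alone is the index.
for error in check_index_columns(SESSIONS_TSV, ["session_id"]):
    print(error)
# Composite index for a root-level file: no errors.
print(check_index_columns(SESSIONS_TSV, ["participant_id", "session_id"]))
```

With this layout the duplicates fall on rows 4 and 5 (counting the header as row 1), the same rows the pheno002 log reports. That is consistent with this explanation but does not prove it.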

@Arshitha (Author) commented Apr 3, 2025

@ericearl, for pheno004 I added a dataset_description.json and replaced the empty NIfTI and JSON files with non-empty ones from an OpenNeuro dataset. Let me know if you'd like me to undo those changes.

@Arshitha requested a review from ericearl April 3, 2025 13:23

```
"Description": "Age of the participant.",
"Units": "years"
},
"sex": {
```
Contributor

I'm conflicted on whether sex at birth should go in the participants.tsv or the demographics.tsv. I know OpenNeuro crawls participants.tsv files better right now to improve its search functionality, which may be justification enough to move this. Anyway, this is a point of discussion and I am open to other ideas.

Author

@ericearl Based on our Slack discussion, I think we can leave the demographics info in the demographics file and have participants.tsv be just a list of unique participant IDs?
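A minimal sketch of that split, assuming a hypothetical combined table with age and sex columns: participants.tsv keeps only the unique participant_id values, and the demographic columns move to phenotype/demographics.tsv with participant_id kept as the join key. The split_participants helper and the data are illustrative only:

```python
import csv
import io

# Hypothetical combined participants table (illustrative values).
COMBINED_TSV = (
    "participant_id\tage\tsex\n"
    "sub-01\t34\tF\n"
    "sub-02\t29\tM\n"
)


def split_participants(text):
    """Return (participants.tsv, phenotype/demographics.tsv) contents.

    participants.tsv keeps only the unique participant IDs; every column,
    including participant_id as the join key, goes to the demographics file.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    ids = [row["participant_id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("participant_id values must be unique")
    participants = "participant_id\n" + "".join(f"{pid}\n" for pid in ids)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return participants, out.getvalue()


participants, demographics = split_participants(COMBINED_TSV)
print(participants, end="")
print(demographics, end="")
```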

Contributor

The NIfTIs should be 0 Byte empty files.

Author

@ericearl The validator was complaining about this, so I replaced the empty files with actual NIfTIs (from OpenNeuro) in the other examples and in this new example.

Contributor

The NIfTIs should be 0 Byte empty files.

Author

See the above comment.
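If the bids-examples convention of 0-byte imaging files is kept, as requested in the review, the NIfTIs could be emptied again while the now non-empty JSON sidecars stay as they are and remain valid JSON. A sketch of that cleanup with a hypothetical empty_nifti_files helper; whether the current validator accepts 0-byte NIfTIs is not settled in this thread, so treat that as an assumption to check:

```python
import tempfile
from pathlib import Path


def empty_nifti_files(root):
    """Truncate non-empty .nii/.nii.gz files under root to 0 bytes.

    JSON sidecars and other files are untouched. Returns the relative paths
    that were changed, so a second run changes nothing.
    """
    root = Path(root)
    changed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name.endswith((".nii", ".nii.gz")):
            if path.stat().st_size > 0:
                path.write_bytes(b"")
                changed.append(path.relative_to(root).as_posix())
    return changed


with tempfile.TemporaryDirectory() as tmp:
    anat = Path(tmp, "sub-01", "anat")
    anat.mkdir(parents=True)
    (anat / "sub-01_T1w.nii.gz").write_bytes(b"stand-in image bytes")
    (anat / "sub-01_T1w.json").write_text("{}\n", encoding="utf-8")
    print(empty_nifti_files(tmp))
    print(empty_nifti_files(tmp))
```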

@Arshitha (Author)

@christinerogers pheno005 is now available for review as the multi-run representative example!


4 participants