Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically assign sex if unknown #148

Merged
merged 8 commits into from
May 17, 2024

Conversation

fellen31
Copy link
Collaborator

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@fellen31 fellen31 linked an issue May 15, 2024 that may be closed by this pull request
Copy link

github-actions bot commented May 15, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 2a50981

+| ✅ 165 tests passed       |+
#| ❔  17 tests were ignored |#
!| ❗  15 tests had warnings |!

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
  • pipeline_todos - TODO string in README.md: Add full-sized test dataset and amend the paragraph below if applicable
  • pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • schema_description - No description provided in schema for parameter: hificnv_xy
  • schema_description - No description provided in schema for parameter: hificnv_xx

❔ Tests ignored:

  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: assets/nf-core-nallo_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-nallo_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-nallo_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: conf/modules.config
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: assets/nf-core-nallo_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-nallo_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-nallo_logo_dark.png
  • modules_config - modules_config

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-05-15 13:16:53

@fellen31 fellen31 marked this pull request as ready for review May 15, 2024 08:50
@fellen31 fellen31 requested a review from a team as a code owner May 15, 2024 08:50
Copy link
Contributor

@rannick rannick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just 2 small comments

@@ -1,3 +1,3 @@
sample,file,family_id,paternal_id,maternal_id,sex,phenotype
sample_1,/path/to/fastq_or_bam/files/sample_1.fastq.gz,FAM,PAT,MAT,1,1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have 2 tests samplesheet, one with and one without the sex assigned to test all logical gates? Maybe just a stub run. Just throwing ideas

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! And I have, but you can't see it here.

I removed the test data and samplesheets from this repository, and put it in the nallo branch of the GMS fork of test-datasets to make it more similar to how it's handled in for example raredisease and rnafusion. I made a PR to update the test data and samplesheets used (the multisample test now includes one sample where sex is set to 0).

But maybe that is problematic, because:

  • The changes made are not recorded here
  • Updating the test data might break the master test profile

Do you have a suggestion for the best solution here? @jemten?

  • I could change the links in master & dev to point to specific commits for each file
  • I could move the samplesheets back to this repo, but will it work smoothly when merging dev to master?

I'm sure I will have to continue to refine the test data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is a good idea to hav the specific commit hash in the url. As you say that way we could have a different sample sheet in master and dev.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make a patch to master then.

"--secondary=no",
"-Y",
"-R @RG\\\\tID:${meta.id}\\\\tSM:${meta.id}"
].join(' ') }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this does not seem to relate the to sex check, so I would add a line to the changelog about it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding RG was necessary for somalier, but I see your point as the change is broad. Will add to changelog!

Copy link
Member

@peterpru peterpru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be an idea to do replace ASSIGN with INFER or ESTIMATE? To me assign sounds like the pipeline is choosing a default value, but you are actively inferring or estimating the sex.

@fellen31
Copy link
Collaborator Author

Would it be an idea to do replace ASSIGN with INFER or ESTIMATE? To me assign sounds like the pipeline is choosing a default value, but you are actively inferring or estimating the sex.

Sure, I could use INFER since that is the flag supplied to somalier relate.

Copy link
Collaborator

@jemten jemten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to add a sex check to the pipeline. A couple of notes

"somalier_sites": {
"type": "string",
"description": "A VCF of known polymorphic sites",
"format": "file-path"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to add some more constrains here. Like:

"pattern": "^\\S+\\.vcf(\\.gz)?$"

family_id : meta.family_id,
paternal_id : meta.paternal_id,
maternal_id : meta.maternal_id,
sex : meta2.infered_sex.toInteger(), // Use sex from somalier, only updated if sex is unknown
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this always update sex with the inferred sex from somalier?

.map { it ->
new_meta = [
sample_id: it.sample_id,
infered_sex: it.sex,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
infered_sex: it.sex,
inferred_sex: it.sex,

family_id : meta.family_id,
paternal_id : meta.paternal_id,
maternal_id : meta.maternal_id,
sex : meta2.infered_sex.toInteger(), // Use sex from somalier, only updated if sex is unknown
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sex : meta2.infered_sex.toInteger(), // Use sex from somalier, only updated if sex is unknown
sex : meta2.inferred_sex.toInteger(), // Use sex from somalier, only updated if sex is unknown

Copy link
Collaborator

@jemten jemten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@fellen31 fellen31 merged commit 55b319f into genomic-medicine-sweden:dev May 17, 2024
6 checks passed
@fellen31 fellen31 deleted the add-sex-check branch May 17, 2024 08:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Automatically assign sex if unknown
4 participants