-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically assign sex if unknown #148
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just 2 small comments
@@ -1,3 +1,3 @@ | |||
sample,file,family_id,paternal_id,maternal_id,sex,phenotype | |||
sample_1,/path/to/fastq_or_bam/files/sample_1.fastq.gz,FAM,PAT,MAT,1,1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to have 2 tests samplesheet, one with and one without the sex assigned to test all logical gates? Maybe just a stub run. Just throwing ideas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely! And I have, but you can't see it here.
I removed the test data and samplesheets from this repository, and put it in the nallo branch of the GMS fork of test-datasets to make it more similar to how it's handled in for example raredisease and rnafusion. I made a PR to update the test data and samplesheets used (the multisample test now includes one sample where sex is set to 0).
But maybe that is problematic, because:
- The changes made are not recorded here
- Updating the test data might break the master test profile
Do you have a suggestion for the best solution here? @jemten?
- I could change the links in master & dev to point to specific commits for each file
- I could move the samplesheets back to this repo, but will it work smoothly when merging dev to master?
I'm sure I will have to continue to refine the test data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it is a good idea to hav the specific commit hash in the url. As you say that way we could have a different sample sheet in master and dev.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might make a patch to master then.
"--secondary=no", | ||
"-Y", | ||
"-R @RG\\\\tID:${meta.id}\\\\tSM:${meta.id}" | ||
].join(' ') } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this does not seem to relate the to sex check, so I would add a line to the changelog about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding RG was necessary for somalier, but I see your point as the change is broad. Will add to changelog!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be an idea to do replace ASSIGN with INFER or ESTIMATE? To me assign sounds like the pipeline is choosing a default value, but you are actively inferring or estimating the sex.
Sure, I could use INFER since that is the flag supplied to somalier relate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to add a sex check to the pipeline. A couple of notes
nextflow_schema.json
Outdated
"somalier_sites": { | ||
"type": "string", | ||
"description": "A VCF of known polymorphic sites", | ||
"format": "file-path" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to add some more constrains here. Like:
"pattern": "^\\S+\\.vcf(\\.gz)?$"
subworkflows/local/bam_assign_sex.nf
Outdated
family_id : meta.family_id, | ||
paternal_id : meta.paternal_id, | ||
maternal_id : meta.maternal_id, | ||
sex : meta2.infered_sex.toInteger(), // Use sex from somalier, only updated if sex is unknown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this always update sex with the inferred sex from somalier?
subworkflows/local/bam_assign_sex.nf
Outdated
.map { it -> | ||
new_meta = [ | ||
sample_id: it.sample_id, | ||
infered_sex: it.sex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
infered_sex: it.sex, | |
inferred_sex: it.sex, |
subworkflows/local/bam_assign_sex.nf
Outdated
family_id : meta.family_id, | ||
paternal_id : meta.paternal_id, | ||
maternal_id : meta.maternal_id, | ||
sex : meta2.infered_sex.toInteger(), // Use sex from somalier, only updated if sex is unknown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sex : meta2.infered_sex.toInteger(), // Use sex from somalier, only updated if sex is unknown | |
sex : meta2.inferred_sex.toInteger(), // Use sex from somalier, only updated if sex is unknown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).