Use project name instead of multisample #264
Force-pushed from 0bfe39b to f6ffa88.
Very nice to replace the hardcoded 'multisample' with the dynamic project_id! What would happen in the case where project_id is different from family_id? I assume that genmod needs to be run per family.
@@ -33,10 +33,10 @@ process {
     withName: '.*:STRUCTURAL_VARIANT_CALLING:SNIFFLES_MULTISAMPLE' {
-        ext.prefix = 'multisample_sniffles'
+        ext.prefix = { "${meta.id}_sniffles" }
👍
// Check that there's no more than one project
// TODO: Try to do this in nf-schema
👍
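The "no more than one project" check mentioned in the diff comment above could look roughly like the following. This is a minimal Python sketch for illustration only, not the pipeline's Nextflow implementation. The `project` and `family_id` column names come from this PR; the `sample` column and the sample names are hypothetical.

```python
import csv
import io


def check_single_project(samplesheet_text):
    """Return the project name, or raise if the sheet mixes several projects."""
    rows = csv.DictReader(io.StringIO(samplesheet_text))
    projects = sorted({row["project"] for row in rows})
    if len(projects) > 1:
        raise ValueError(
            f"Only one project per run is supported, found {len(projects)}: {projects}"
        )
    # An empty sheet has no project at all; report that as None.
    return projects[0] if projects else None


# Hypothetical samplesheet: one family whose family_id equals the project.
example = (
    "sample,family_id,project\n"
    "sample_a,slowsaluki,slowsaluki\n"
    "sample_b,slowsaluki,slowsaluki\n"
)
print(check_single_project(example))
```

Doing the check on the parsed rows keeps it independent of schema validation, which is presumably why the TODO suggests moving it into nf-schema later.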
Genmod seems to work fine with multiple families (Clinical-Genomics/genmod@e1c3981) from what I can see, as long as there is at least one affected individual. The main issue seems to be memory usage when VCFs become too big. If this is an issue, we could think about splitting the VCF and ranking per family when there are multiple families per project (#276).
Force-pushed from 9527cb0 to d5758f0.
This PR adds a project column to the samplesheet, on which multisample output files are grouped.

This would have the benefit of being able to run a case (at CG) with one family, where family_id is equal to project, e.g.:

which would output one VCF file containing one family, named slowsaluki_*.vcf.gz, together with single-sample VCFs. project would imitate case in raredisease, but is named project to avoid confusion.

At the same time you would be able to run a bigger project, where multiple families are included within the same project, e.g.:

which would output one VCF file containing multiple families, named bigproject_*.vcf.gz, together with single-sample VCFs.

After discussing with Ram, an alternative solution would be to add a --project_mode parameter that could change the grouping of the output from project to family_id, with project as an optional column in the samplesheet. But in my opinion that might be more complicated than the proposed solution. A third option would be to just add a --multisample_output_name parameter that decides the output name, but I think we both agreed that this is better done in the samplesheet.

Additionally, this PR also moves the PED creation into a process (like in raredisease) and closes #158, removes the --extra_snfs parameter, and #48 would be closed in favour of this.

PR checklist
Code lints (nf-core lint).
Test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
No unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).
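
To make the grouping in the PR description concrete, here is a small Python sketch of how samples might be collected per project and how the multisample output prefix follows from the project name. It mirrors the `"${meta.id}_sniffles"` prefix from the diff, on the assumption that `meta.id` holds the project. The `sample` column, the sample and family names, and both helper functions are hypothetical; the pipeline itself does this in Nextflow.

```python
import csv
import io
from collections import defaultdict


def group_by_project(samplesheet_text):
    """Map each project to its families, and each family to its samples."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        groups[row["project"]][row["family_id"]].append(row["sample"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {project: dict(families) for project, families in groups.items()}


def multisample_prefix(project, tool="sniffles"):
    """Build the multisample output prefix from the project name."""
    return f"{project}_{tool}"


# Hypothetical samplesheet: two families inside one larger project.
big_project_sheet = (
    "sample,family_id,project\n"
    "sample_a,family_1,bigproject\n"
    "sample_b,family_1,bigproject\n"
    "sample_c,family_2,bigproject\n"
)

for project, families in group_by_project(big_project_sheet).items():
    print(multisample_prefix(project), families)
```

With family_id equal to project, the same code yields a single group containing one family, which corresponds to the single-family case described above.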