Use project name instead of multisample #264

Merged
fellen31 merged 1 commit into genomic-medicine-sweden:dev from the project_id2 branch on Aug 9, 2024

Conversation

Collaborator

@fellen31 fellen31 commented Jul 24, 2024

This PR adds a project column to the samplesheet. Multisample output files are grouped on this column.

The benefit is that a case (at CG) with a single family can be run with family_id equal to project, e.g.:

project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
slowsaluki,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
slowsaluki,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
slowsaluki,HG004,/path/to/HG004.bam,slowsaluki,0,0,1,1

which would output one VCF file containing one family, named slowsaluki_*.vcf.gz, together with single-sample VCFs. The project column mirrors case in raredisease, but is named project to avoid confusion.

At the same time, you would be able to run a bigger project that includes multiple families within the same project, e.g.:

project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
bigproject,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
bigproject,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
bigproject,HG004,/path/to/HG004.bam,slowsaluki,0,0,1,1
bigproject,HG005,/path/to/HG005.bam,fastsnail,0,0,1,1
bigproject,HG006,/path/to/HG006.bam,lazysnake,0,0,1,1

which would output one VCF file containing multiple families, named bigproject_*.vcf.gz, together with single-sample VCFs.
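
To illustrate the grouping described above, here is a minimal Python sketch. It is not pipeline code (the pipeline is written in Nextflow), and the helper name families_per_project is hypothetical. It only shows how rows sharing a project value end up in one group that can hold several families, with the project name used as the output prefix.

```python
import csv
import io
from collections import defaultdict

# Samplesheet copied from the multi-family example above.
SAMPLESHEET = """\
project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
bigproject,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
bigproject,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
bigproject,HG004,/path/to/HG004.bam,slowsaluki,0,0,1,1
bigproject,HG005,/path/to/HG005.bam,fastsnail,0,0,1,1
bigproject,HG006,/path/to/HG006.bam,lazysnake,0,0,1,1
"""


def families_per_project(samplesheet_text):
    """Map each project to its families, and each family to its samples."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        grouped[row["project"]][row["family_id"]].append(row["sample"])
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {project: dict(families) for project, families in grouped.items()}


groups = families_per_project(SAMPLESHEET)
for project, families in groups.items():
    # The "<project>_*.vcf.gz" pattern follows the naming described in this PR.
    print(f"{project}_*.vcf.gz would contain families: {sorted(families)}")
```

With the single-family samplesheet, the same function would return one project holding one family, which matches the slowsaluki case.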

After discussion with Ram, one alternative would be a --project_mode parameter that switches the output grouping from project to family_id, with project as an optional samplesheet column. In my opinion, that would be more complicated than the proposed solution. A third option would be a --multisample_output_name parameter that sets the output name, but I think we both agreed that this is better done in the samplesheet.


Additionally, this PR moves PED creation into a process (as in raredisease), which closes #158, and removes the --extra_snfs parameter. #48 would be closed in favour of this PR.
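
For readers unfamiliar with PED files, the sketch below shows how the pedigree columns of the samplesheet could map onto the six standard PED columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype). This is an illustrative Python sketch, not the process added in this PR. The function name samplesheet_to_ped and the choice to use sample as the individual ID are assumptions.

```python
import csv
import io

# Samplesheet copied from the single-family example in this PR.
SAMPLESHEET = """\
project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
slowsaluki,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
slowsaluki,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
slowsaluki,HG004,/path/to/HG004.bam,slowsaluki,0,0,1,1
"""

# Samplesheet columns in PED column order (assumed mapping: sample -> individual ID).
PED_COLUMNS = ["family_id", "sample", "paternal_id", "maternal_id", "sex", "phenotype"]


def samplesheet_to_ped(samplesheet_text):
    """Return tab-separated PED text with one line per samplesheet row."""
    lines = []
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        lines.append("\t".join(row[column] for column in PED_COLUMNS))
    return "\n".join(lines) + "\n"


ped = samplesheet_to_ped(SAMPLESHEET)
print(ped, end="")
```

The project and file columns are dropped here because they have no PED counterpart.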


PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Jul 24, 2024

nf-core lint overall result: Passed ✅

Posted for pipeline commit 9b5d763

✅ 160 tests passed
❔ 17 tests were ignored

❔ Tests ignored:

  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: assets/nf-core-nallo_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-nallo_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-nallo_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: conf/modules.config
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: assets/nf-core-nallo_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-nallo_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-nallo_logo_dark.png
  • modules_config - modules_config

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-08-09 13:45:43

@fellen31 fellen31 force-pushed the project_id2 branch 3 times, most recently from 0bfe39b to f6ffa88 Compare July 24, 2024 09:32
@fellen31 fellen31 marked this pull request as ready for review July 24, 2024 09:47
@fellen31 fellen31 requested a review from a team as a code owner July 24, 2024 09:47
Collaborator

@jemten jemten left a comment

Very nice to replace the hardcoded 'multisample' with the dynamic project_id! What would happen in the case where project_id is different from family_id? I assume that genmod needs to be run per family.

@@ -33,10 +33,10 @@ process {

     withName: '.*:STRUCTURAL_VARIANT_CALLING:SNIFFLES_MULTISAMPLE' {
-        ext.prefix = 'multisample_sniffles'
+        ext.prefix = { "${meta.id}_sniffles" }
Collaborator

👍

Comment on lines +200 to +204
// Check that there's no more than one project
// TODO: Try to do this in nf-schema
Collaborator

👍
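
The check referred to in the comment above (no more than one project per samplesheet) can be sketched in Python as follows. The actual check lives in the pipeline's Nextflow code and is not reproduced here. The function name check_single_project, the error message, and the project name otherproject are hypothetical.

```python
import csv
import io


def check_single_project(samplesheet_text):
    """Return the single project name, or raise if several are present."""
    rows = csv.DictReader(io.StringIO(samplesheet_text))
    projects = sorted({row["project"] for row in rows})
    if len(projects) > 1:
        raise ValueError(
            "Only one project may be specified per run, found: " + ", ".join(projects)
        )
    # An empty samplesheet has no project at all.
    return projects[0] if projects else None


HEADER = "project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype\n"
ONE_PROJECT = HEADER + "bigproject,HG005,/path/to/HG005.bam,fastsnail,0,0,1,1\n"
# "otherproject" is a made-up second project used to trigger the error.
TWO_PROJECTS = ONE_PROJECT + "otherproject,HG006,/path/to/HG006.bam,lazysnake,0,0,1,1\n"

print(check_single_project(ONE_PROJECT))
try:
    check_single_project(TWO_PROJECTS)
except ValueError as err:
    print(err)
```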

Collaborator Author

fellen31 commented Aug 5, 2024

> Very nice to replace the hardcoded 'multisample' with the dynamic project_id! What would happen in the case where project_id is different from family_id? I assume that genmod needs to be run per family.

Genmod seems to work fine with multiple families (Clinical-Genomics/genmod@e1c3981) from what I can see, as long as there is at least one affected individual. The main issue seems to be memory usage when VCFs become too big. If this turns out to be a problem, we could consider splitting the VCF and ranking per family when there are multiple families per project (#276).
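
If ranking were ever split per family as suggested, each family would need to be looked at on its own. The Python sketch below reports which families lack an affected individual. It assumes the usual PED coding where phenotype 2 means affected, and it assumes the "at least one affected individual" condition would then apply per family, which is not confirmed in this thread. The function name families_without_affected is hypothetical.

```python
from collections import defaultdict

AFFECTED = "2"  # PED phenotype code for an affected individual

# (family_id, phenotype) pairs taken from the bigproject example in this PR.
SAMPLES = [
    {"sample": "HG002", "family_id": "slowsaluki", "phenotype": "2"},
    {"sample": "HG003", "family_id": "slowsaluki", "phenotype": "1"},
    {"sample": "HG004", "family_id": "slowsaluki", "phenotype": "1"},
    {"sample": "HG005", "family_id": "fastsnail", "phenotype": "1"},
    {"sample": "HG006", "family_id": "lazysnake", "phenotype": "1"},
]


def families_without_affected(rows):
    """Return the sorted family IDs that contain no affected individual."""
    has_affected = defaultdict(bool)
    for row in rows:
        # Once a family has one affected member, the flag stays True.
        has_affected[row["family_id"]] |= row["phenotype"] == AFFECTED
    return sorted(family for family, flag in has_affected.items() if not flag)


print(families_without_affected(SAMPLES))
```

In the bigproject example only slowsaluki has an affected individual, so a per-family split would need to decide how to handle the other two families.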

@fellen31 fellen31 force-pushed the project_id2 branch 3 times, most recently from 9527cb0 to d5758f0 Compare August 9, 2024 12:34
@fellen31 fellen31 merged commit 7f064db into genomic-medicine-sweden:dev Aug 9, 2024
14 checks passed
@fellen31 fellen31 deleted the project_id2 branch August 9, 2024 14:10
fellen31 added a commit to fellen31/skierfe that referenced this pull request Aug 9, 2024

Successfully merging this pull request may close these issues.

Move PED-file generation to process
2 participants