Use project name instead of multisample #264

fellen31 · 2024-07-24T08:00:46Z

This PR adds a project column to the samplesheet; multisample output files are grouped on this column.

This would make it possible to run a case (at CG) with one family, where family_id is equal to project, e.g.:

project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
slowsaluki,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
slowsaluki,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
slowsaluki,HG004,/path/to/HG004.bam,slowsaluki,0,0,1,1

This would output one VCF file containing one family, named slowsaluki_*.vcf.gz, together with the single-sample VCFs. The project column would play the role of case in raredisease, but is named project to avoid confusion.

At the same time, you would be able to run a bigger project that includes multiple families, e.g.:
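A minimal Python sketch of the naming rule described above, assuming the samplesheet is plain CSV with the columns shown: multisample outputs take the project value as their prefix, while single-sample outputs keep the sample value. `output_prefixes` is a hypothetical helper for illustration, not the pipeline's Nextflow implementation.

```python
import csv
import io


def output_prefixes(samplesheet_csv):
    """Return (multisample prefixes, single-sample prefixes) for a samplesheet."""
    rows = list(csv.DictReader(io.StringIO(samplesheet_csv)))
    # One multisample prefix per distinct project, in order of first appearance.
    projects = list(dict.fromkeys(row["project"] for row in rows))
    # Single-sample outputs are still named after each sample.
    samples = [row["sample"] for row in rows]
    return projects, samples


SHEET = """project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
slowsaluki,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
slowsaluki,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
slowsaluki,HG004,/path/to/HG004.bam,slowsaluki,0,0,1,1
"""

projects, samples = output_prefixes(SHEET)
print([f"{p}_*.vcf.gz" for p in projects])  # ['slowsaluki_*.vcf.gz']
print(samples)  # ['HG002', 'HG003', 'HG004']
```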

project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
bigproject,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
bigproject,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
bigproject,HG004,/path/to/HG004.bam,slowsaluki,0,0,1,1
bigproject,HG005,/path/to/HG005.bam,fastsnail,0,0,1,1
bigproject,HG006,/path/to/HG006.bam,lazysnake,0,0,1,1

This would output one VCF file containing multiple families, named bigproject_*.vcf.gz, together with the single-sample VCFs.
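To see how project and family_id relate in this case, a small Python sketch (assuming the same CSV layout; `families_per_project` is a hypothetical helper) maps each project to the families it contains. All three families end up in the single bigproject multisample output.

```python
import csv
import io
from collections import defaultdict


def families_per_project(samplesheet_csv):
    """Map each project to the sorted family IDs it contains."""
    families = defaultdict(set)
    for row in csv.DictReader(io.StringIO(samplesheet_csv)):
        families[row["project"]].add(row["family_id"])
    return {project: sorted(ids) for project, ids in families.items()}


BIG_SHEET = """project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
bigproject,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
bigproject,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
bigproject,HG004,/path/to/HG004.bam,slowsaluki,0,0,1,1
bigproject,HG005,/path/to/HG005.bam,fastsnail,0,0,1,1
bigproject,HG006,/path/to/HG006.bam,lazysnake,0,0,1,1
"""

print(families_per_project(BIG_SHEET))
# {'bigproject': ['fastsnail', 'lazysnake', 'slowsaluki']}
```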

After discussing with Ram, an alternative would be to add a --project_mode parameter that could switch the grouping of the output from project to family_id, with project as an optional column in the samplesheet. In my opinion, though, that might be more complicated than the proposed solution. A third option would be to simply add a --multisample_output_name parameter that sets the output name, but I think we both agreed that this is better done in the samplesheet.
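To make the trade-off concrete, a Python sketch of the alternative: a hypothetical `mode` argument stands in for the proposed (not implemented) --project_mode parameter and chooses whether multisample outputs are grouped on project or on family_id. The function name and argument are illustrative only.

```python
import csv
import io


def group_samples(samplesheet_csv, mode="project"):
    """Group sample IDs by the column chosen as the multisample grouping key."""
    if mode not in ("project", "family_id"):
        raise ValueError(f"unsupported grouping mode: {mode!r}")
    groups = {}
    for row in csv.DictReader(io.StringIO(samplesheet_csv)):
        # Dicts keep insertion order, so groups appear in samplesheet order.
        groups.setdefault(row[mode], []).append(row["sample"])
    return groups


SHEET = """project,sample,family_id
bigproject,HG002,slowsaluki
bigproject,HG003,slowsaluki
bigproject,HG005,fastsnail
"""

print(group_samples(SHEET))
# {'bigproject': ['HG002', 'HG003', 'HG005']}
print(group_samples(SHEET, mode="family_id"))
# {'slowsaluki': ['HG002', 'HG003'], 'fastsnail': ['HG005']}
```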

Additionally, this PR moves PED creation into a process (as in raredisease), closes #158, and removes the --extra_snfs parameter; #48 would be closed in favour of this PR.
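A PED file uses six pedigree columns: family ID, individual ID, paternal ID, maternal ID, sex and phenotype. A minimal Python sketch of deriving it from the samplesheet, assuming the samplesheet columns map one-to-one onto those fields and that the output is tab-separated; `samplesheet_to_ped` is hypothetical and is not the process added in this PR.

```python
import csv
import io

# Samplesheet columns in PED order: family, individual, father, mother, sex, phenotype.
PED_COLUMNS = ("family_id", "sample", "paternal_id", "maternal_id", "sex", "phenotype")


def samplesheet_to_ped(samplesheet_csv):
    """Render one tab-separated PED line per samplesheet row."""
    rows = csv.DictReader(io.StringIO(samplesheet_csv))
    return "".join("\t".join(row[col] for col in PED_COLUMNS) + "\n" for row in rows)


SHEET = """project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
slowsaluki,HG002,/path/to/HG002.bam,slowsaluki,HG003,HG004,0,2
slowsaluki,HG003,/path/to/HG003.bam,slowsaluki,0,0,1,1
"""

print(samplesheet_to_ped(SHEET), end="")
```

The project and file columns are dropped because PED only describes the pedigree.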

PR checklist

github-actions · 2024-07-24T08:02:52Z

`nf-core lint` overall result: Passed ✅

Posted for pipeline commit 9b5d763

✅ 160 tests passed
❔ 17 tests were ignored

❔ Tests ignored:

files_exist - File is ignored: CODE_OF_CONDUCT.md
files_exist - File is ignored: assets/nf-core-nallo_logo_light.png
files_exist - File is ignored: docs/images/nf-core-nallo_logo_light.png
files_exist - File is ignored: docs/images/nf-core-nallo_logo_dark.png
files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/awsfulltest.yml
files_exist - File is ignored: conf/modules.config
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: assets/nf-core-nallo_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-nallo_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-nallo_logo_dark.png
modules_config - modules_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-nallo_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowNallo.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 0.3.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.igenomes_ignore= true
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.preset= revio
nextflow_config - Config default value correct: params.variant_caller= deepvariant
nextflow_config - Config default value correct: params.phaser= whatshap
nextflow_config - Config default value correct: params.hifiasm_mode= hifi-only
nextflow_config - Config default value correct: params.split_fastq= 0
nextflow_config - Config default value correct: params.parallel_snv= 13
nextflow_config - Config default value correct: params.vep_cache_version= 110
nextflow_config - Config default value correct: params.deepvariant_model_type= PACBIO
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/genomic-medicine-sweden/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (482 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: ci_master.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - HIFIASM found in conf/base.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-08-09 13:45:43

jemten

Very nice to replace the hardcoded 'multisample' with the dynamic project_id! What would happen in the case where project_id is different from family_id? I assume that genmod needs to be run per family.

jemten · 2024-08-05T07:47:20Z

conf/modules/structural_variant_calling.config

@@ -33,10 +33,10 @@ process {

    withName: '.*:STRUCTURAL_VARIANT_CALLING:SNIFFLES_MULTISAMPLE' {

-        ext.prefix = 'multisample_sniffles'
+        ext.prefix = { "${meta.id}_sniffles" }
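The new prefix is written as a closure rather than a plain string so that it is evaluated per task, when `meta` is available. A Python analogue of the before/after behaviour, assuming (as this PR implies) that `meta.id` for the multisample Sniffles call holds the project name; the names below are illustrative.

```python
# Before: a fixed prefix shared by every run.
OLD_PREFIX = "multisample_sniffles"


def new_prefix(meta):
    """Analogue of the Nextflow closure { "${meta.id}_sniffles" }: built from the task's meta map."""
    return f"{meta['id']}_sniffles"


print(OLD_PREFIX)                        # multisample_sniffles
print(new_prefix({"id": "slowsaluki"}))  # slowsaluki_sniffles
print(new_prefix({"id": "bigproject"}))  # bigproject_sniffles
```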


jemten · 2024-08-05T07:51:19Z

subworkflows/local/utils_nfcore_nallo_pipeline/main.nf

+        // Check that there's no more than one project
+        // TODO: Try to do this in nf-schema
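Until this can be expressed in nf-schema, the check amounts to counting distinct values in the project column. A minimal Python sketch of that rule; `check_single_project` is hypothetical, and the pipeline's own check is written in Nextflow in the main.nf shown above.

```python
import csv
import io


def check_single_project(samplesheet_csv):
    """Return the samplesheet's project name, or raise if it has more than one."""
    projects = sorted({row["project"] for row in csv.DictReader(io.StringIO(samplesheet_csv))})
    if len(projects) > 1:
        raise ValueError(f"Expected one project per samplesheet, found {len(projects)}: {projects}")
    if not projects:
        raise ValueError("Samplesheet contains no samples")
    return projects[0]


print(check_single_project("project,sample\nslowsaluki,HG002\nslowsaluki,HG003\n"))  # slowsaluki

try:
    check_single_project("project,sample\nslowsaluki,HG002\nbigproject,HG005\n")
except ValueError as err:
    print(err)  # Expected one project per samplesheet, found 2: ['bigproject', 'slowsaluki']
```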


fellen31 · 2024-08-05T09:21:29Z

> Very nice to replace the hardcoded 'multisample' with the dynamic project_id! What would happen in the case where project_id is different from family_id? I assume that genmod needs to be run per family.

Genmod seems to work fine with multiple families (Clinical-Genomics/genmod@e1c3981) from what I can see, as long as there is at least one affected individual. The main issue seems to be memory usage when VCFs become too big. If that becomes a problem, we could split the VCF and rank per family when a project contains multiple families (#276).
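A Python sketch for checking that precondition before ranking a multi-family VCF. It lists families with no affected member, assuming PED phenotype coding (2 = affected) and assuming the requirement applies per family rather than per VCF; `families_without_affected` is hypothetical and not part of genmod or the pipeline.

```python
import csv
import io


def families_without_affected(samplesheet_csv):
    """Return sorted family IDs in which no sample has phenotype 2 (affected)."""
    affected = {}
    for row in csv.DictReader(io.StringIO(samplesheet_csv)):
        family = row["family_id"]
        # A family counts as covered once any of its members is affected.
        affected[family] = affected.get(family, False) or row["phenotype"] == "2"
    return sorted(family for family, has_affected in affected.items() if not has_affected)


SHEET = """project,sample,family_id,phenotype
bigproject,HG002,slowsaluki,2
bigproject,HG003,slowsaluki,1
bigproject,HG005,fastsnail,1
bigproject,HG006,lazysnake,1
"""

print(families_without_affected(SHEET))  # ['fastsnail', 'lazysnake']
```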

fellen31 force-pushed the project_id2 branch from 0fde800 to 496ded7 July 24, 2024 08:02

fellen31 force-pushed the project_id2 branch 3 times, most recently from 0bfe39b to f6ffa88 July 24, 2024 09:32

fellen31 marked this pull request as ready for review July 24, 2024 09:47

fellen31 requested a review from a team as a code owner July 24, 2024 09:47

jemten approved these changes Aug 5, 2024

fellen31 force-pushed the project_id2 branch 3 times, most recently from 9527cb0 to d5758f0 August 9, 2024 12:34

Use project name instead of multisample (9b5d763)

fellen31 force-pushed the project_id2 branch from d5758f0 to 9b5d763 August 9, 2024 13:44

fellen31 merged commit 7f064db into genomic-medicine-sweden:dev Aug 9, 2024
14 checks passed

fellen31 deleted the project_id2 branch August 9, 2024 14:10

fellen31 added a commit to fellen31/skierfe that referenced this pull request Aug 9, 2024

Use project name instead of multisample (genomic-medicine-sweden#264)

709dc45
