Raredisease add GC and AT dropout quality check #3838

Open: wants to merge 3 commits into base: master
2 changes: 2 additions & 0 deletions cg/constants/nf_analysis.py
@@ -23,6 +23,8 @@ class NfTowerStatus(StrEnum):
"MEDIAN_TARGET_COVERAGE": {"norm": "gt", "threshold": 25},
"PCT_TARGET_BASES_10X": {"norm": "gt", "threshold": 0.95},
"PCT_EXC_ADAPTER": {"norm": "lt", "threshold": 0.0005},
"AT_DROPOUT": {"norm": "lt", "threshold": 10},
"GC_DROPOUT": {"norm": "lt", "threshold": 10},
RAREDISEASE_PREDICTED_SEX_METRIC: {"norm": "eq", "threshold": None},
"gender": {"norm": "eq", "threshold": None},
}
11 changes: 11 additions & 0 deletions cg/meta/workflow/raredisease.py
@@ -23,6 +23,7 @@
RAREDISEASE_PARENT_PEDDY_METRIC_CONDITION,
)
from cg.constants.scout import RAREDISEASE_CASE_TAGS, ScoutExportFileName
from cg.constants.sequencing import SeqLibraryPrepCategory
from cg.constants.subject import PlinkPhenotypeStatus, PlinkSex
from cg.constants.tb import AnalysisType
from cg.meta.workflow.nf_analysis import NfAnalysisAPI
@@ -172,6 +173,7 @@ def get_workflow_metrics(self, sample_id: str) -> dict:
if "-" not in sample_id:
metric_conditions: dict[str, dict[str, Any]] = RAREDISEASE_METRIC_CONDITIONS.copy()
self.set_order_sex_for_sample(sample, metric_conditions)
self.set_dropout_cutoff_by_analysis_type(sample, metric_conditions)
else:
metric_conditions = RAREDISEASE_PARENT_PEDDY_METRIC_CONDITION.copy()
return metric_conditions
@@ -230,6 +232,15 @@ def set_order_sex_for_sample(sample: Sample, metric_conditions: dict) -> None:
metric_conditions["predicted_sex_sex_check"]["threshold"] = sample.sex
metric_conditions["gender"]["threshold"] = sample.sex

@staticmethod
def set_dropout_cutoff_by_analysis_type(sample: Sample, metric_conditions: dict) -> None:
if (
sample.application_version.application.analysis_type
== SeqLibraryPrepCategory.WHOLE_GENOME_SEQUENCING
):
metric_conditions["AT_DROPOUT"]["threshold"] = 5
metric_conditions["GC_DROPOUT"]["threshold"] = 5
Comment on lines +235 to +242

@ChrOertlin (Contributor), Jan 16, 2025

This logic sets the threshold for AT and GC dropout to 5 for WGS samples. I think it would be better to pre-define a full set of rare-disease WGS metric conditions and fetch the whole set when required.

I fear this logic will be lost later on, especially if it stays undocumented, whereas two separate collections of QC thresholds would be clearer. Also, if other metrics later need different thresholds, we can modify the collection rather than writing more functions with if statements.

My suggestion:

  1. Create `RAREDISEASE_WGS_METRICS_CONDITIONS`
  2. Create a function `fetch_raredisease_metrics_conditions(prep_category)`
  3. Use the set of fetched metrics
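
A minimal sketch of what the three steps above could look like. Everything here is an assumption made for illustration: the base dictionary is a trimmed stand-in for the one in `cg/constants/nf_analysis.py`, `WGS_PREP_CATEGORY` is a hypothetical placeholder for `SeqLibraryPrepCategory.WHOLE_GENOME_SEQUENCING` (its actual value is not shown in this PR), and the fallback to the default collection for non-WGS categories is one possible choice, not something the PR specifies.

```python
from copy import deepcopy
from typing import Any

# Trimmed stand-in for RAREDISEASE_METRIC_CONDITIONS; only a few entries are shown.
RAREDISEASE_METRIC_CONDITIONS: dict[str, dict[str, Any]] = {
    "PCT_EXC_ADAPTER": {"norm": "lt", "threshold": 0.0005},
    "AT_DROPOUT": {"norm": "lt", "threshold": 10},
    "GC_DROPOUT": {"norm": "lt", "threshold": 10},
}

# Step 1: a full WGS collection that reuses the defaults and overrides only the
# two dropout thresholds.
RAREDISEASE_WGS_METRICS_CONDITIONS: dict[str, dict[str, Any]] = {
    **RAREDISEASE_METRIC_CONDITIONS,
    "AT_DROPOUT": {"norm": "lt", "threshold": 5},
    "GC_DROPOUT": {"norm": "lt", "threshold": 5},
}

# Hypothetical placeholder for SeqLibraryPrepCategory.WHOLE_GENOME_SEQUENCING.
WGS_PREP_CATEGORY = "wgs"


# Step 2: select the collection by prep category.
def fetch_raredisease_metrics_conditions(prep_category: str) -> dict[str, dict[str, Any]]:
    """Return an independent copy of the metric conditions for a prep category."""
    if prep_category == WGS_PREP_CATEGORY:
        return deepcopy(RAREDISEASE_WGS_METRICS_CONDITIONS)
    # Assumed fallback: every other category uses the default collection.
    return deepcopy(RAREDISEASE_METRIC_CONDITIONS)


# Step 3: use the fetched collection in place of the copy-then-patch logic.
metric_conditions = fetch_raredisease_metrics_conditions(WGS_PREP_CATEGORY)
```

The sketch returns a `deepcopy` because `dict.copy()` is shallow: assigning to a nested `["threshold"]` on a shallow copy would also change the module-level constant, which matters if the sex thresholds are still set per sample afterwards.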


def get_sample_coverage_file_path(self, bundle_name: str, sample_id: str) -> str | None:
"""Return the Raredisease d4 coverage file path."""
coverage_file_tags: list[str] = RAREDISEASE_COVERAGE_FILE_TAGS + [sample_id]
@@ -479,14 +479,18 @@ metrics:
name: PCT_TARGET_BASES_100000X
step: multiqc
value: 0.0
- condition: null
- condition:
norm: lt
threshold: 5.0
header: null
id: ADM1
input: multiqc_data.json
name: AT_DROPOUT
step: multiqc
value: 0.061814
- condition: null
- condition:
norm: lt
threshold: 5.0
header: null
id: ADM1
input: multiqc_data.json
@@ -1599,14 +1603,18 @@ metrics:
name: PCT_TARGET_BASES_100000X
step: multiqc
value: 0.0
- condition: null
- condition:
norm: lt
threshold: 5.0
header: null
id: ADM2
input: multiqc_data.json
name: AT_DROPOUT
step: multiqc
value: 0.061814
- condition: null
- condition:
norm: lt
threshold: 5.0
header: null
id: ADM2
input: multiqc_data.json