Add relatedness test workflow #29
base: main

Conversation
Pull Request Overview
This PR adds a comprehensive relatedness test workflow that integrates the Somalier, Picard, Conpair, and Peddy tools for quality-control analysis. The workflow can process both BAM and VCF files to assess sample relationships and generate consolidated reports.
- Implements a multi-tool relatedness QC pipeline with Somalier, Picard CrosscheckFingerprints, Conpair, and optional Peddy
- Creates consolidated reporting script that evaluates expected relationships against configurable thresholds
- Provides environment specifications and example configuration for the complete workflow (a hypothetical config sketch follows below)
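For orientation, a minimal sketch of what such a configuration might contain. Every key and value below is an illustrative assumption, not the actual schema of `config/relatedness.yaml`:

```yaml
# Hypothetical example only; the real keys live in config/relatedness.yaml.
reference: resources/GRCh38.fa            # reference FASTA (assumed key)
somalier_sites: resources/sites.vcf.gz    # Somalier sites VCF (assumed key)
samples:
  tumorA: data/tumorA.bam
  normalA: data/normalA.bam
pairs:                                    # expected tumor/normal relationships
  - [tumorA, normalA]
thresholds:
  tn_relatedness: 0.8                     # illustrative value
  picard_mismatch_lod: -5.0               # value referenced in the review below
```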
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| workflow/scripts/relatedness_report.py | Main reporting script that processes outputs from all tools and generates TSV/HTML reports |
| workflow/rules/relatedness_test_day.smk | Snakemake workflow rules defining the complete relatedness testing pipeline |
| workflow/envs/*.yaml | Conda environment specifications for each tool (Somalier, Picard, Conpair, Peddy, reporting) |
| config/relatedness.yaml | Example configuration file showing required parameters and sample definitions |
```snakemake
    wildcards:
        t=[p[0] for p in tn_pairs()],
        n=[p[1] for p in tn_pairs()]


rule conpair_compare_all:
    input:
        expand(
            "results/conpair/{t}__{n}/concordance.tsv",
            t=[p[0] for p in tn_pairs()],
            n=[p[1] for p in tn_pairs()]
        ),
        expand(
            "results/conpair/{t}__{n}/summary.txt",
            t=[p[0] for p in tn_pairs()],
            n=[p[1] for p in tn_pairs()]
        )

use rule conpair_parse as conpair_parse_all with:
    wildcards:
        t=[p[0] for p in tn_pairs()],
        n=[p[1] for p in tn_pairs()]


use rule conpair_compare as conpair_compare_all with:
    wildcards:
        t=[p[0] for p in tn_pairs()],
        n=[p[1] for p in tn_pairs()]
```
Copilot AI commented on Sep 9, 2025:
The `use rule ... with:` syntax should override `input:` rather than `wildcards:` when expanding over multiple wildcard combinations; `wildcards:` is not a valid rule directive, so the current syntax will likely cause Snakemake execution errors.
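A minimal sketch of that fix, assuming `tn_pairs()` yields (tumor, normal) tuples; passing `zip` to `expand` keeps each tumor matched to its own normal instead of taking the cross product of the two lists:

```snakemake
# Hypothetical aggregate rule: collect per-pair Conpair outputs via input:.
rule conpair_parse_all:
    input:
        expand(
            "results/conpair/{t}__{n}/concordance.tsv",
            zip,
            t=[p[0] for p in tn_pairs()],
            n=[p[1] for p in tn_pairs()],
        )
```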
```python
except Exception:
    # Some versions write CSV-like; be lenient
    df = pd.read_csv(f)
df.columns = [c.lower() for c in df.columns]
```
Copilot AI commented on Sep 9, 2025:
Lowercasing all column names unconditionally discards the original spellings; if Conpair outputs use case-sensitive column names that downstream code references, those lookups will break.
```python
df.columns = [c.lower() for c in df.columns]
```
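One way to keep the originals recoverable, as a sketch (the rename-map approach is an assumption, not part of this PR):

```python
# Normalize for internal use, but retain a map from lowercased
# names back to the originals for any case-sensitive consumer.
norm_to_orig = {c.lower(): c for c in df.columns}
df = df.rename(columns={c: c.lower() for c in df.columns})
# e.g. recover the original spelling of a column when needed:
# orig_name = norm_to_orig["concordance"]
```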
| df["sample_a"] = pair[0] | ||
| df["sample_b"] = pair[1] |
Copilot AI commented on Sep 9, 2025:
This code assumes the directory name always splits into exactly two elements on `__`, but doesn't handle cases where the split yields fewer than two, which would cause an `IndexError`.
| df["sample_a"] = pair[0] | |
| df["sample_b"] = pair[1] | |
| if len(pair) >= 2: | |
| df["sample_a"] = pair[0] | |
| df["sample_b"] = pair[1] | |
| else: | |
| # Fallback: assign the whole name to sample_a, empty string to sample_b | |
| df["sample_a"] = pair[0] | |
| df["sample_b"] = "" |
```python
elif rel == "tumor_normal":
    if som_rel is not None and som_rel < THRESH["tn_relatedness"]:
        status = "FAIL"; fail_reasons.append(f"Somalier.relatedness={som_rel:.3f} < {THRESH['tn_relatedness']}")
    if pic is not None and pic["lod"] < 0 and pic["lod"] <= THRESH["picard_mismatch_lod"]:
```
Copilot AI commented on Sep 9, 2025:
The condition `pic["lod"] < 0 and pic["lod"] <= THRESH["picard_mismatch_lod"]` is redundant since `THRESH["picard_mismatch_lod"]` is -5.0: the first check `< 0` is always true whenever the second check `<= -5.0` is true. Simplify to just `pic["lod"] <= THRESH["picard_mismatch_lod"]`.
Suggested change:

```diff
-if pic is not None and pic["lod"] < 0 and pic["lod"] <= THRESH["picard_mismatch_lod"]:
+if pic is not None and pic["lod"] <= THRESH["picard_mismatch_lod"]:
```
Summary
Testing
- `python -m py_compile workflow/scripts/relatedness_report.py`
- `pytest -q`

https://chatgpt.com/codex/tasks/task_e_68ad9dc07cfc83318a7219f999bbe46b