Conversation

@iamh2o iamh2o commented Sep 9, 2025

Summary

  • add Somalier/Picard/Conpair/Peddy relatedness QC workflow
  • include environment specs and example configuration
  • generate consolidated relatedness report script

Testing

  • python -m py_compile workflow/scripts/relatedness_report.py
  • pytest -q

https://chatgpt.com/codex/tasks/task_e_68ad9dc07cfc83318a7219f999bbe46b

@Copilot Copilot AI review requested due to automatic review settings September 9, 2025 04:19
Copilot AI left a comment


Pull Request Overview

This PR adds a comprehensive relatedness test workflow that integrates Somalier, Picard, Conpair, and Peddy tools for quality control analysis. The workflow can process both BAM and VCF files to assess sample relationships and generate consolidated reports.

  • Implements a multi-tool relatedness QC pipeline with Somalier, Picard CrosscheckFingerprints, Conpair, and optional Peddy
  • Creates consolidated reporting script that evaluates expected relationships against configurable thresholds
  • Provides environment specifications and example configuration for the complete workflow

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

File Description

  • workflow/scripts/relatedness_report.py: Main reporting script that processes outputs from all tools and generates TSV/HTML reports
  • workflow/rules/relatedness_test_day.smk: Snakemake workflow rules defining the complete relatedness testing pipeline
  • workflow/envs/*.yaml: Conda environment specifications for each tool (Somalier, Picard, Conpair, Peddy, reporting)
  • config/relatedness.yaml: Example configuration file showing required parameters and sample definitions
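The example configuration itself is not reproduced in this overview. A minimal sketch of what a config of the shape described above might contain — every key, path, sample name, and value here is hypothetical and not copied from the PR; the real config/relatedness.yaml defines the authoritative schema:

```yaml
# Hypothetical illustration only; see config/relatedness.yaml in the PR
# for the actual keys the workflow expects.
reference: /refs/GRCh38/genome.fa
somalier_sites: /refs/somalier/sites.hg38.vcf.gz
samples:
  T01:
    bam: data/T01.bam
    expected: tumor_normal   # expected relationship to the paired sample
    pair: N01
  N01:
    bam: data/N01.bam
thresholds:
  tn_relatedness: 0.8        # value here is a guess; the PR's default may differ
  picard_mismatch_lod: -5.0  # matches the threshold quoted in the review below
```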


Comment on lines +207 to +219
    wildcards:
        t=[p[0] for p in tn_pairs()],
        n=[p[1] for p in tn_pairs()]

rule conpair_compare_all:
    input:
        expand(
            "results/conpair/{t}__{n}/concordance.tsv",
            t=[p[0] for p in tn_pairs()],
            n=[p[1] for p in tn_pairs()]
        ),
        expand(
            "results/conpair/{t}__{n}/summary.txt",
            t=[p[0] for p in tn_pairs()],
            n=[p[1] for p in tn_pairs()]
        )

use rule conpair_parse as conpair_parse_all with:
    wildcards:
        t=[p[0] for p in tn_pairs()],
        n=[p[1] for p in tn_pairs()]

use rule conpair_compare as conpair_compare_all with:
    wildcards:
        t=[p[0] for p in tn_pairs()],
        n=[p[1] for p in tn_pairs()]
Copilot AI Sep 9, 2025
The use rule ... with: syntax should use input: instead of wildcards: for expanding over multiple wildcard combinations. The current syntax is incorrect and will likely cause Snakemake execution errors.
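A sketch of the aggregate pattern this comment points toward: instead of redefining the rule with a wildcards: key (which use rule ... with: does not accept), an aggregate rule lists the per-pair outputs as input:, and expand with zip keeps the tumor/normal indices paired instead of taking their cross product. This is a Snakemake fragment for illustration, reusing the tn_pairs() helper and paths from the diff, not the PR's actual code:

```python
# Aggregate over all tumor/normal pairs by listing their outputs as inputs.
# `zip` pairs t[i] with n[i]; without it, expand() would cross-product them.
rule conpair_compare_all:
    input:
        expand(
            "results/conpair/{t}__{n}/concordance.tsv",
            zip,
            t=[p[0] for p in tn_pairs()],
            n=[p[1] for p in tn_pairs()],
        ),
        expand(
            "results/conpair/{t}__{n}/summary.txt",
            zip,
            t=[p[0] for p in tn_pairs()],
            n=[p[1] for p in tn_pairs()],
        )
```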

except Exception:
    # Some versions write CSV-like; be lenient
    df = pd.read_csv(f)
df.columns = [c.lower() for c in df.columns]
Copilot AI Sep 9, 2025

Lowercasing all column names unconditionally may cause issues if the original column names are needed elsewhere. This could break column references if Conpair outputs use specific case-sensitive column names that are expected downstream.

Suggested change
df.columns = [c.lower() for c in df.columns]
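One hedged alternative to dropping the lowercasing outright is to keep the original headers intact and build a lowercase-to-original lookup on the side, so code can match columns case-insensitively without destroying the case-sensitive names downstream consumers may expect. A minimal sketch in plain Python (the column names below are made up, not Conpair's actual headers):

```python
def lowercase_map(columns):
    """Map lowercased column names back to their original spelling.

    Lets callers resolve a column case-insensitively while leaving the
    DataFrame's real (case-sensitive) headers untouched.
    """
    return {c.lower(): c for c in columns}

# Hypothetical headers for illustration:
cols = ["Sample_A", "Concordance", "Marker_Count"]
by_lower = lowercase_map(cols)
original = by_lower["concordance"]  # resolves to "Concordance"
```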

Comment on lines +67 to +68
df["sample_a"] = pair[0]
df["sample_b"] = pair[1]
Copilot AI Sep 9, 2025

This code assumes the directory structure always contains exactly two elements when split on '__', but doesn't handle cases where the split results in fewer than 2 elements, which would cause an IndexError.

Suggested change
df["sample_a"] = pair[0]
df["sample_b"] = pair[1]
if len(pair) >= 2:
    df["sample_a"] = pair[0]
    df["sample_b"] = pair[1]
else:
    # Fallback: assign the whole name to sample_a, empty string to sample_b
    df["sample_a"] = pair[0]
    df["sample_b"] = ""
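The defensive split the suggestion asks for could also live in a single helper so every caller gets the same fallback behavior. A sketch, assuming the tumor__normal directory naming convention visible in the diff (the helper name is hypothetical):

```python
def split_pair(name, sep="__"):
    """Split a 'tumor__normal' directory name into (sample_a, sample_b).

    Falls back to (name, "") when the separator is absent, instead of
    raising IndexError on the missing second element.
    """
    parts = name.split(sep, 1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return parts[0], ""

print(split_pair("T01__N01"))  # ('T01', 'N01')
print(split_pair("unpaired"))  # ('unpaired', '')
```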

elif rel == "tumor_normal":
    if som_rel is not None and som_rel < THRESH["tn_relatedness"]:
        status = "FAIL"; fail_reasons.append(f"Somalier.relatedness={som_rel:.3f} < {THRESH['tn_relatedness']}")
    if pic is not None and pic["lod"] < 0 and pic["lod"] <= THRESH["picard_mismatch_lod"]:
Copilot AI Sep 9, 2025

The condition pic["lod"] < 0 and pic["lod"] <= THRESH["picard_mismatch_lod"] is redundant since THRESH["picard_mismatch_lod"] is -5.0: the first check (< 0) is always true whenever the second (<= -5.0) holds. Simplify to just pic["lod"] <= THRESH["picard_mismatch_lod"].

Suggested change
if pic is not None and pic["lod"] < 0 and pic["lod"] <= THRESH["picard_mismatch_lod"]:
if pic is not None and pic["lod"] <= THRESH["picard_mismatch_lod"]:
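The simplification holds because any LOD at or below -5.0 is necessarily below 0, so the extra clause can never change the outcome. A minimal sketch of the reduced check in isolation (the threshold value is the one quoted in the review; the function and constant names are hypothetical, not from the PR):

```python
PICARD_MISMATCH_LOD = -5.0  # threshold quoted in the review comment


def picard_mismatch(lod):
    """Flag a Picard CrosscheckFingerprints LOD score as a mismatch.

    lod <= -5.0 already implies lod < 0, so one comparison suffices.
    """
    return lod <= PICARD_MISMATCH_LOD

print(picard_mismatch(-7.2))  # True
print(picard_mismatch(-3.0))  # False
```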
