GTF comparison script #151

Merged
khajoue2 merged 22 commits into develop from fk-compare-gtfs
Dec 18, 2024

Conversation

khajoue2
Collaborator

@khajoue2 khajoue2 commented Dec 6, 2024

Add GTF comparison script for annotation analysis

Description

Added a new Python script that performs comprehensive comparison analysis between two GTF (Gene Transfer Format) files. This tool helps identify structural and content differences between GTF annotations, which is particularly useful when validating changes in annotation pipelines or comparing different versions of genome annotations.

Features

  • Detailed comparison of all GTF fields (columns 1-8) with sample differences
  • Attribute-level comparison (column 9) with statistics on differences
  • Gene-level analysis including:
    • Total gene count comparison
    • Identification of unique genes in each file
    • Special analysis of mitochondrial (MT-) genes
  • Generates three detailed report files:
    • *_structural_diff.txt: Analysis of basic GTF structure
    • *_attribute_diff.txt: Detailed attribute differences
    • *_gene_diff.txt: Gene-level comparisons and MT gene analysis
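
The attribute-level and gene-level comparisons listed above can be sketched roughly as follows. This is a minimal illustration, not the code in compare_gtfs.py: the function names `parse_attributes`, `gene_names`, and `compare_gene_sets` are hypothetical, and the sketch assumes genes are identified by a `gene_name` attribute (falling back to `gene_id`) and that mitochondrial genes carry an `MT-` prefix.

```python
def parse_attributes(column9):
    """Parse GTF column 9 into a dict.

    Assumes `key "value";` pairs with no semicolons inside quoted values;
    for repeated keys the last value wins.
    """
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def gene_names(gtf_lines):
    """Collect gene names from `gene` records, falling back to gene_id."""
    names = set()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(columns) < 9 or columns[2] != "gene":
            continue
        attrs = parse_attributes(columns[8])
        name = attrs.get("gene_name") or attrs.get("gene_id")
        if name:
            names.add(name)
    return names


def compare_gene_sets(old_lines, new_lines, mt_prefix="MT-"):
    """Summarize gene counts, genes unique to each input, and MT genes."""
    old_genes = gene_names(old_lines)
    new_genes = gene_names(new_lines)
    return {
        "old_count": len(old_genes),
        "new_count": len(new_genes),
        "only_in_old": sorted(old_genes - new_genes),
        "only_in_new": sorted(new_genes - old_genes),
        "mt_old": sorted(g for g in old_genes if g.startswith(mt_prefix)),
        "mt_new": sorted(g for g in new_genes if g.startswith(mt_prefix)),
    }
```

Because both helpers only iterate over lines, open file handles can be passed directly, for example `compare_gene_sets(open("old.gtf"), open("new.gtf"))`.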

Usage

python3 compare_gtfs.py old.gtf new.gtf --output-prefix comparison_result
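
For readers who want to see how a command line of this shape is typically wired up, here is a hedged sketch using `argparse`. Only the two positional GTF paths, the `--output-prefix` option, and the three report suffixes come from this PR description; the argument names `old_gtf` and `new_gtf`, the default prefix, and the helpers `build_parser` and `report_paths` are assumptions made for illustration and may not match the actual script.

```python
import argparse

# Report suffixes as listed in the PR description.
REPORT_SUFFIXES = ("structural_diff", "attribute_diff", "gene_diff")


def build_parser():
    """Build a parser mirroring the documented invocation (names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Compare two GTF files.")
    parser.add_argument("old_gtf", help="Path to the first (old) GTF file")
    parser.add_argument("new_gtf", help="Path to the second (new) GTF file")
    parser.add_argument(
        "--output-prefix",
        default="gtf_comparison",  # assumed default, not taken from the script
        help="Prefix for the generated report files",
    )
    return parser


def report_paths(prefix):
    """Return the three report file names derived from the prefix."""
    return [f"{prefix}_{suffix}.txt" for suffix in REPORT_SUFFIXES]
```

Under these assumptions, parsing `["old.gtf", "new.gtf", "--output-prefix", "comparison_result"]` and passing `args.output_prefix` to `report_paths` gives `comparison_result_structural_diff.txt`, `comparison_result_attribute_diff.txt`, and `comparison_result_gene_diff.txt`.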

Testing

Added an automated testing framework for the GTF comparison script:

  1. Created test infrastructure:

    • create_test_gtfs.py: Generates test GTF files with known differences
    • test_gtf_comparison.py: Unit tests to verify comparison functionality
    • GitHub Actions workflow in .github/workflows/gtf_tests.yml
  2. Test Coverage:

    • Validates structural differences in GTF fields
    • Checks attribute differences in the 9th field
    • Verifies gene-level comparisons
    • Tests mitochondrial gene handling
    • Confirms proper output file generation
  3. CI/CD Integration:

    • Tests run automatically on pushes and PRs that affect the build-indices directory
    • Test results are uploaded as artifacts in GitHub Actions
    • Working directory is set to 3rd-party-tools/build-indices
  4. How to Run Tests Locally:

python -m unittest test_gtf_comparison.py -v
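
The testing approach described above (generate GTF files with known differences, then assert that the comparison finds them) might look roughly like the sketch below. It is not the content of test_gtf_comparison.py: the helper `differing_fields` is a hypothetical stand-in for the real comparison code, whose API is not shown in this PR, and the two GTF records are invented test data.

```python
import os
import tempfile
import unittest

OLD_GTF = (
    'chr1\ttest\tgene\t100\t500\t.\t+\t.\tgene_id "G1"; gene_name "GENE1";\n'
    'chrM\ttest\tgene\t10\t90\t.\t+\t.\tgene_id "G2"; gene_name "MT-ND1";\n'
)
# Known difference: the end coordinate of GENE1 changes from 500 to 600.
NEW_GTF = (
    'chr1\ttest\tgene\t100\t600\t.\t+\t.\tgene_id "G1"; gene_name "GENE1";\n'
    'chrM\ttest\tgene\t10\t90\t.\t+\t.\tgene_id "G2"; gene_name "MT-ND1";\n'
)


def differing_fields(old_path, new_path):
    """Stand-in comparison: count, per column 1-8, the line pairs that differ."""
    counts = [0] * 8
    with open(old_path) as old, open(new_path) as new:
        for old_line, new_line in zip(old, new):
            old_cols = old_line.rstrip("\n").split("\t")
            new_cols = new_line.rstrip("\n").split("\t")
            for i in range(8):
                if old_cols[i] != new_cols[i]:
                    counts[i] += 1
    return counts


class TestGtfComparison(unittest.TestCase):
    def setUp(self):
        # Write both test GTFs into a temporary directory removed after each test.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.old_path = os.path.join(tmp.name, "old.gtf")
        self.new_path = os.path.join(tmp.name, "new.gtf")
        with open(self.old_path, "w") as handle:
            handle.write(OLD_GTF)
        with open(self.new_path, "w") as handle:
            handle.write(NEW_GTF)

    def test_end_coordinate_difference_is_detected(self):
        counts = differing_fields(self.old_path, self.new_path)
        self.assertEqual(counts[4], 1)   # column 5 (end) differs once
        self.assertEqual(sum(counts), 1)  # no other structural differences

    def test_identical_files_have_no_differences(self):
        counts = differing_fields(self.old_path, self.old_path)
        self.assertEqual(sum(counts), 0)
```

Saved as a module, a file like this runs with the same `python -m unittest` invocation shown above.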

Files Added:

  • test_gtf_comparison.py
  • create_test_gtfs.py
  • .github/workflows/gtf_tests.yml

Path: /warp-tools/3rd-party-tools/build-indices/

Member

@jessicaway jessicaway left a comment

This looks great, thanks @khajoue2!

@khajoue2 khajoue2 merged commit 47693d9 into develop Dec 18, 2024
22 checks passed
@khajoue2 khajoue2 deleted the fk-compare-gtfs branch December 18, 2024 16:57

2 participants