Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ancestral): make --genes optional when reconstructing translations, if omitted, translate all genes in annotation #1668

Draft
wants to merge 8 commits into
base: nextclade-compat
Choose a base branch
from

Conversation

corneliusroemer
Copy link
Member

Description of proposed changes

The most common use case (in my experience) of augur ancestral's translation reconstruction is to translate all the genes in the genome annotation and then reconstruct all of them.

Currently, one needs to explicitly list out all the genes, for no real reason. This PR makes it so that if no --genes argument is provided, all genes are translated by default.

This is not a breaking change, as currently --genes is always required when an annotation is.

Checklist

  • Automated checks pass
  • Check if you need to add a changelog message
  • Check if you need to add tests
  • Check if you need to update docs

Until this PR, it was hard to use augur ancestral with a genome annotation gff3 from a Nextclade dataset for multiple reasons:
- The GFF reader in augur looks for "gene" features only, while Nextclade 3 uses primarily CDS (gene only for backwards compatibility)
- Augur's GFF reader doesn't support compound features
- It takes the "gene"/"feature" name in the order "gene" > "gene_name" > "locus_tag", which differs from Nextclade's order

This meant that users had to create a Genbank file, just for the purpose of running augur ancestral.
That was doable but tedious.

This PR solves the problem by adding a new GFF reader mode that is designed to be as compatible with Nextclade as possible. That means:
- It looks for CDS features (for simplicity it ignores Nextclade's gene backwards compatibility)
- It supports compound features
- It takes the feature name in the exact same order as Nextclade

The new mode is activated by passing the `--nextclade-compatibility` flag to augur ancestral.

It does no harm to anyone who does not want to use it.

It results in some code duplication, but I think it's totally worth it for the sake of usability.
I have already started using it in some of my private builds and it works like a charm.
Copy link

codecov bot commented Nov 10, 2024

Codecov Report

Attention: Patch coverage is 78.72340% with 20 lines in your changes missing coverage. Please review.

Project coverage is 72.30%. Comparing base (29188d4) to head (014314e).
Report is 10 commits behind head on nextclade-compat.

Files with missing lines Patch % Lines
augur/utils.py 75.60% 9 Missing and 11 partials ⚠️
Additional details and impacted files
@@                 Coverage Diff                  @@
##           nextclade-compat    #1668      +/-   ##
====================================================
+ Coverage             72.26%   72.30%   +0.03%     
====================================================
  Files                    79       79              
  Lines                  8272     8318      +46     
  Branches               1691     1708      +17     
====================================================
+ Hits                   5978     6014      +36     
- Misses                 2009     2012       +3     
- Partials                285      292       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just skimmed through the code of these 2 commits, looks good. Instead of the first commit (bugfix) you could fix #1664 while it's still unmerged

@corneliusroemer
Copy link
Member Author

corneliusroemer commented Nov 10, 2024

Just skimmed through the code of these 2 commits, looks good. Instead of the first commit (bugfix) you could fix #1664 while it's still unmerged

Done, thanks!

I won't merge this yet as I think it's best to keep it chunked. If we don't do the nextclade mode, I might rebase later but for now I care most about the nextclade compatibility mode stuff :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants