feat(ancestral): make --genes optional when reconstructing translations, if omitted, translate all genes in annotation #1668

corneliusroemer · 2024-11-10T00:03:36Z

Description of proposed changes

The most common use case (in my experience) of augur ancestral's translation reconstruction is to translate all the genes in the genome annotation and then reconstruct all of them.

Currently, one needs to explicitly list out all the genes, for no real reason. This PR makes it so that if no --genes argument is provided, all genes are translated by default.

This is not a breaking change, as currently --genes is always required when an annotation is.

Checklist

Automated checks pass
Check if you need to add a changelog message
Check if you need to add tests
Check if you need to update docs

Until this PR, it was hard to use augur ancestral with a genome annotation gff3 from a Nextclade dataset for multiple reasons: - The GFF reader in augur looks for "gene" features only, while Nextclade 3 uses primarily CDS (gene only for backwards compatibility) - Augur's GFF reader doesn't support compound features - It takes the "gene"/"feature" name in the order "gene" > "gene_name" > "locus_tag", which differs from Nextclade's order This meant that users had to create a Genbank file, just for the purpose of running augur ancestral. That was doable but tedious. This PR solves the problem by adding a new GFF reader mode that is designed to be as compatible with Nextclade as possible. That means: - It looks for CDS features (for simplicity it ignores Nextclade's gene backwards compatibility) - It supports compound features - It takes the feature name in the exact same order as Nextclade The new mode is activated by passing the `--nextclade-compatibility` flag to augur ancestral. It does no harm to anyone who does not want to use it. It results in some code duplication, but I think it's totally worth it for the sake of usability. I have already started using it in some of my private builds and it works like a charm.

codecov · 2024-11-10T00:17:46Z

Codecov Report

Attention: Patch coverage is 78.72340% with 20 lines in your changes missing coverage. Please review.

Project coverage is 72.30%. Comparing base (29188d4) to head (014314e).
Report is 10 commits behind head on nextclade-compat.

Files with missing lines	Patch %	Lines
augur/utils.py	75.60%	9 Missing and 11 partials ⚠️

Additional details and impacted files

@@                 Coverage Diff                  @@
##           nextclade-compat    #1668      +/-   ##
====================================================
+ Coverage             72.26%   72.30%   +0.03%     
====================================================
  Files                    79       79              
  Lines                  8272     8318      +46     
  Branches               1691     1708      +17     
====================================================
+ Hits                   5978     6014      +36     
- Misses                 2009     2012       +3     
- Partials                285      292       +7

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jameshadfield

Just skimmed through the code of these 2 commits, looks good. Instead of the first commit (bugfix) you could fix #1664 while it's still unmerged

…ns, if ommitted, translate all genes in annotation

corneliusroemer · 2024-11-10T19:38:27Z

Just skimmed through the code of these 2 commits, looks good. Instead of the first commit (bugfix) you could fix #1664 while it's still unmerged

Done, thanks!

I won't merge this yet as I think it's best to keep it chunked. If we don't do the nextclade mode, I might rebase later but for now I care most about the nextclade compatibility mode stuff :)

corneliusroemer added 6 commits November 9, 2024 03:33

fix test

71fb4a0

Propagate valid types

9b3e2ed

Add cram test for Nextclade compat mode

61af409

Fix test, rename argument to --use-nextclade-gff-parsing

6839588

Fix mypy errors by reducing typing information :/

b4e99a1

corneliusroemer mentioned this pull request Nov 10, 2024

feat(ancestral): Add optional Nextclade GFF3 compatibility mode #1664

Open

4 tasks

jameshadfield approved these changes Nov 10, 2024

View reviewed changes

corneliusroemer added 2 commits November 10, 2024 20:35

Update augur/utils.py

ec82215

feat(ancestral): make --genes optional when reconstructing translatio…

014314e

…ns, if ommitted, translate all genes in annotation

corneliusroemer force-pushed the all-genes branch from 8311506 to 014314e Compare November 10, 2024 19:38

corneliusroemer force-pushed the nextclade-compat branch from ec82215 to 566150e Compare November 10, 2024 19:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ancestral): make --genes optional when reconstructing translations, if omitted, translate all genes in annotation #1668

feat(ancestral): make --genes optional when reconstructing translations, if omitted, translate all genes in annotation #1668

corneliusroemer commented Nov 10, 2024

codecov bot commented Nov 10, 2024 •

edited

Loading

jameshadfield left a comment

corneliusroemer commented Nov 10, 2024 •

edited

Loading

feat(ancestral): make --genes optional when reconstructing translations, if omitted, translate all genes in annotation #1668

Are you sure you want to change the base?

feat(ancestral): make --genes optional when reconstructing translations, if omitted, translate all genes in annotation #1668

Conversation

corneliusroemer commented Nov 10, 2024

Description of proposed changes

Checklist

codecov bot commented Nov 10, 2024 • edited Loading

Codecov Report

jameshadfield left a comment

Choose a reason for hiding this comment

corneliusroemer commented Nov 10, 2024 • edited Loading

codecov bot commented Nov 10, 2024 •

edited

Loading

corneliusroemer commented Nov 10, 2024 •

edited

Loading