-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ancestral): make --genes optional when reconstructing translations, if omitted, translate all genes in annotation #1668
base: nextclade-compat
Are you sure you want to change the base?
Conversation
Until this PR, it was hard to use augur ancestral with a genome annotation gff3 from a Nextclade dataset for multiple reasons: - The GFF reader in augur looks for "gene" features only, while Nextclade 3 uses primarily CDS (gene only for backwards compatibility) - Augur's GFF reader doesn't support compound features - It takes the "gene"/"feature" name in the order "gene" > "gene_name" > "locus_tag", which differs from Nextclade's order This meant that users had to create a Genbank file, just for the purpose of running augur ancestral. That was doable but tedious. This PR solves the problem by adding a new GFF reader mode that is designed to be as compatible with Nextclade as possible. That means: - It looks for CDS features (for simplicity it ignores Nextclade's gene backwards compatibility) - It supports compound features - It takes the feature name in the exact same order as Nextclade The new mode is activated by passing the `--nextclade-compatibility` flag to augur ancestral. It does no harm to anyone who does not want to use it. It results in some code duplication, but I think it's totally worth it for the sake of usability. I have already started using it in some of my private builds and it works like a charm.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## nextclade-compat #1668 +/- ##
====================================================
+ Coverage 72.26% 72.30% +0.03%
====================================================
Files 79 79
Lines 8272 8318 +46
Branches 1691 1708 +17
====================================================
+ Hits 5978 6014 +36
- Misses 2009 2012 +3
- Partials 285 292 +7 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just skimmed through the code of these 2 commits, looks good. Instead of the first commit (bugfix) you could fix #1664 while it's still unmerged
…ns, if ommitted, translate all genes in annotation
8311506
to
014314e
Compare
Done, thanks! I won't merge this yet as I think it's best to keep it chunked. If we don't do the nextclade mode, I might rebase later but for now I care most about the nextclade compatibility mode stuff :) |
ec82215
to
566150e
Compare
Description of proposed changes
The most common use case (in my experience) of augur ancestral's translation reconstruction is to translate all the genes in the genome annotation and then reconstruct all of them.
Currently, one needs to explicitly list out all the genes, for no real reason. This PR makes it so that if no
--genes
argument is provided, all genes are translated by default.This is not a breaking change, as currently
--genes
is always required when an annotation is.Checklist