representation of a nuc conversion do not point a conversion on the genome #1107

Open
huzuner opened this issue May 7, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@huzuner

huzuner commented May 7, 2024

Hello,

For our research, I have been using resources from Nextstrain and I encountered an ambiguity that I wanted to ask about.

I am currently working with the tabular file of nucleotide (nuc) changes that define Nextstrain clades (https://github.com/nextstrain/ncov/blob/master/defaults/clades.tsv). I have been assuming that each entry in clades.tsv describes a nucleotide change relative to the reference genome.
However, I noticed that one of the entries listed there, 22F nuc 15461 A, specifies A at a position where the reference genome (NC_045512v2) already has A. Is this intentional, or am I misunderstanding the concept?

Thank you in advance for your response.
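
A minimal sketch of the kind of check described above: scan the `nuc` rows of a clades table and report every entry whose listed allele equals the reference base at that 1-based position. It assumes a tab-separated layout with the columns `clade`, `gene`, `site`, `alt` and `#` comment lines. The function name `find_reference_matches` and the toy data are hypothetical; to run it on real data, load the actual clades.tsv text and the NC_045512v2 sequence instead.

```python
import csv

def find_reference_matches(clades_tsv_text, reference):
    """Return (clade, site, alt) for 'nuc' rows whose allele equals the reference base.

    Assumes columns named clade, gene, site, alt (an assumption about the file layout).
    """
    # Drop blank lines and '#' comment lines before parsing.
    lines = [ln for ln in clades_tsv_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    matches = []
    for row in reader:
        if row["gene"] != "nuc":
            continue  # skip amino-acid rows and any non-nucleotide entries
        site = int(row["site"])
        ref_base = reference[site - 1].upper()  # sites are treated as 1-based
        if ref_base == row["alt"].upper():
            matches.append((row["clade"], site, row["alt"]))
    return matches

# Toy data only; this is not the real genome or the real clade table.
toy_reference = "ACGTACGTAC"
toy_tsv = (
    "clade\tgene\tsite\talt\n"
    "toyA\tnuc\t2\tT\n"   # reference has C here, so this is a real change
    "toyA\tnuc\t5\tA\n"   # reference already has A here, so this is flagged
    "toyB\tS\t3\tK\n"     # not a nuc row, ignored
)
print(find_reference_matches(toy_tsv, toy_reference))
```

With the real inputs, an entry such as 22F nuc 15461 A would be flagged if the reference indeed carries A at position 15461.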

huzuner added the bug label May 7, 2024
@jameshadfield
Member

Yes, it looks like there are no (or very few) mutations observed at this position. It may have been a typo. Using all 4 defining positions we can see how they identify 22F.

Note that these definitions are the ones we use for augur clades to define clades on a tree. They're not intended to be used for individual sequence classification. We recommend using nextclade for classification of sequences.

@huzuner
Author

huzuner commented Jul 9, 2024

Thanks for your answer!

What is the difference between augur clades and Nextstrain clades? Are they not the same, or at least similar?

In my case, I need either full FASTA sequences or the nucleotide changes for each Nextstrain/Nextclade clade so that I can create FASTA sequences myself. I did not find either in the 'Nextclade datasets'. The only relevant resource is 'tree.json', but we identified inconsistencies for some clades when comparing it with covariants.org.
That is how I ended up with 'clades.tsv', assuming that it provides the actual nucleotide differences belonging to each clade.
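
For illustration, building a FASTA record from a list of nucleotide changes could look roughly like the sketch below. It handles substitutions only (no insertions or deletions), and the helper names `apply_substitutions` and `to_fasta` and the toy data are hypothetical. As noted earlier in the thread, the clades.tsv entries are clade definitions for augur clades, so a sequence built from them alone would carry only those defining sites, not every mutation typical of a clade.

```python
def apply_substitutions(reference, substitutions):
    """Return a copy of `reference` with each (site, alt) substitution applied.

    Sites are 1-based; only single-nucleotide substitutions are supported.
    """
    seq = list(reference)
    for site, alt in substitutions:
        if not 1 <= site <= len(seq):
            raise ValueError(f"site {site} is outside the reference (length {len(seq)})")
        seq[site - 1] = alt
    return "".join(seq)

def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record wrapped at `width` characters."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"

# Toy data only; replace with the real reference and a real list of changes.
toy_reference = "ACGTACGTAC"
toy_changes = [(2, "T"), (7, "A")]
record = to_fasta("toy_clade", apply_substitutions(toy_reference, toy_changes))
print(record)
```

A list is used for the intermediate sequence because Python strings are immutable; the reference itself is left unchanged.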

@jameshadfield
Member

cc @corneliusroemer - I think you have a fasta of representative sequences for each lineage?

@huzuner
Author

huzuner commented Jul 12, 2024

> cc @corneliusroemer - I think you have a fasta of representative sequences for each lineage?

Would be great to hear if this exists.
