Bug when dealing with insertions and deletions in assemblies #324

Open
pirale opened this issue May 25, 2022 · 0 comments

Hi there,

Hope you're well.

I noticed a bug in ariba that occurs when the length of an insertion or deletion is a multiple of three but the indel does not start at the first position of a codon. The bug seems to originate from the way insertions and deletions are handled in the source file assembly_variants.py. Such an indel affects not only the reference codon in which it starts but also the following codon in the reference.
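To make the codon arithmetic concrete, here is a minimal sketch for the deletion case. The function name `affected_codons` and the 0-based coordinates are my own assumptions for illustration; this is not code taken from assembly_variants.py.

```python
def affected_codons(start: int, length: int) -> list[int]:
    """Return the 0-based indices of reference codons overlapped by a deletion.

    start  -- 0-based nucleotide position of the first deleted base
    length -- number of deleted bases
    """
    first = start // 3                 # codon containing the first deleted base
    last = (start + length - 1) // 3   # codon containing the last deleted base
    return list(range(first, last + 1))


# A 3-base deletion starting at the first position of a codon touches one codon.
print(affected_codons(6, 3))  # [2]

# The same deletion shifted to the third codon position touches two codons.
print(affected_codons(8, 3))  # [2, 3]
```

So a 3-base deletion is in frame wherever it starts, but it only lines up with a single reference codon when it starts at the first codon position.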

I noticed the bug with a single amino-acid deletion that ariba incorrectly marked as a truncation. You can reproduce it by downloading the read set SRR850776 from NCBI SRA (corresponding assembly NZ_KI973283.1) and searching for the gene in the FASTA below.
>Enterobacter-NL68__wzy
atgaatgataagagtttaaaaaataaccacttcaaaataagtgcgcatttagcgtttata
tatttcctgcttacttcttctttattattgatttttttaacggaatcggcaagtgctaca
ctttatggaactgtagaggatatttttgcggttttttgtgccattatattgtttggtgag
atgatttacttctatatgcatagagtgaagtttatctcgttgcaattaatgtttgctttt
gtattttctttaattataggtattccttctttttatttgtatttctttaaaaaagcttct
gatggctttgaattgacttgtatatggggtatgttaataaatatcatactctatcttaca
gctatcaaaaatgttcataggcaacaagcaaaaagtataaataatctatttaagattata
ttttccattgttggtgtttgtcagttaattaaaattgttttttatctgaaatttatttta
tcatcaggcttagggcatttagctatttatactgatagtgaagaattactttcaagtatt
ccttttgctgtccgtgctattagtggcttttcttctataatggctttggcagtcttttat
tataaatcatcgaaaaaatataagatgctagcatttattttgctcgcatctgaccttgtt
attgggataagaaataaattcttttttgcttttatatgcattattattctctcgttatat
tcaaatagaaagaaaataatagcaatattcgctagaatatccaaagtacactatttatta
attggcttcgtcggtttttcaatgatttcatatcttcgtgaaggatatgaaatcaatttt
attaattatcttggcgttgtacttgactctctgtcgtctacgcttgcaggtttacaagat
ttatactatttgcccgatgaaaatggttgggcgttactaaacccccttacgatattatcg
caagtgttgccgctcagtggttttggcttcataagcgatgcacaaattgctcatgaatat
tcaacaattgtgcttggcagcgtgtctaatgggatagcgttgtcatcttctggtcttctt
gaagcaagtataataagtttgcatttcaatttatttatttatcttgcctatctgttaatt
atgatctcgataattcaaaaaggtttgaatagtaattatgttatttttaacttttttgcc
ctggctatgatgactggtttcttctattctgttcgtggagaattaattttgccatttgct
tatgttttaaaatcgtttccaataataataattgcaaatctattgactcaacaaaaaagt
agaaattga
The mutation TAA1289. starts at the third position of a codon and deletes a single amino acid (in this specific case without changing the amino-acid sequence further downstream). Ariba instead erroneously translates TAA as a stop codon.
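The effect can be reproduced on a toy example. The sketch below uses a made-up 15-base sequence (not the wzy gene above) and a hand-built standard codon table. The "naive" step is only my guess at how such a call could arise, not a reading of the ariba source.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate a nucleotide sequence in frame 0, ignoring trailing bases."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical reference: ATG AAT AAT GGT TGA -> M N N G *
ref = "ATGAATAATGGTTGA"

# Delete 3 bases starting at the third position of the second codon (0-based index 5).
start, length = 5, 3
deleted = ref[start:start + length]
mutated = ref[:start] + ref[start + length:]

print(deleted)             # TAA
print(translate(deleted))  # *     (naive: deleted bases read as a codon of their own)
print(translate(ref))      # MNNG*
print(translate(mutated))  # MNG*  (in frame: one amino acid lost, no premature stop)
```

Comparing the in-frame translations of the reference and the mutated sequence shows a single amino-acid deletion, whereas translating the deleted bases on their own yields a spurious stop codon.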

Thank you for providing the software. Happy to answer any questions that you may have.
