Question about out_of_frame result in report.tsv #337

Open
btesson-lysarc opened this issue Dec 20, 2024 · 7 comments

@btesson-lysarc

Hello and thanks for the great software tool.
I am looking at some bulk RNA-seq data that I have analyzed with TRUST4, and I am puzzled by some records that are tagged as out_of_frame in the report.tsv file. When I go back to the corresponding contig sequence in the final.out file and analyze it with IgBlast, I get an in-frame, productive result.
Here is an example:

#count frequency CDR3nt CDR3aa V D J C cid cid_full_length
2 2193 0.2823465 TGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG out_of_frame IGHV3-33*03 IGHD3-9*01 IGHJ4*02 . assemble3139 0
The corresponding sequence in final.out is:
>assemble3139 IGHV3-23
ATACTCAAGAACTCATTGTTTCTGCAAATGAACAGCCTGAGACCCGAGGACACGGCTCTTTATTTCTGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGGAGTGCATCCGCCAGAGGT
6 2 12 0 0 0 20 20 0 20 20 0 0 0 20 0 0 0 0 0 0 0 0 0 0 20 20 20 0 0 20 20 0 20 0 0 0 0 0 20 0 20 0 0 0 0 20 10 0 20 0 20 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 20 20 20 0 0 0 0 20 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 20 0 20 0 0 20 0 0 0 0 0 0 0 0 20 0 0 0 20 20 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 20 0 0 0 20 0 0 0 0 20 0 0 0 0 0 0 14 2 14 0 0 0

And when I run this sequence in IgBlast I do see a productive VDJ. Am I missing something about how TRUST4 works?
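
For anyone repeating this check, here is a minimal sketch of pulling one contig out of final.out as a FASTA record to paste into IgBlast. It assumes only what the excerpt above shows: a header line starting with `>` whose first field is the contig ID, followed directly by the sequence on a single line. The helper names `contig_sequence` and `as_fasta` are hypothetical, and any other lines in the file are ignored.

```python
def contig_sequence(lines, contig_id):
    """Return the sequence line that follows the header of contig_id, or None."""
    it = iter(lines)
    for line in it:
        fields = line[1:].split() if line.startswith(">") else []
        if fields and fields[0] == contig_id:
            # Assumption: the sequence sits on the single line after the header.
            return next(it, "").strip() or None
    return None


def as_fasta(contig_id, sequence):
    """Format a contig as a two-line FASTA record."""
    return f">{contig_id}\n{sequence}\n"


# Header and sequence lines copied from the final.out excerpt above.
example_lines = [
    ">assemble3139 IGHV3-23",
    "ATACTCAAGAACTCATTGTTTCTGCAAATGAACAGCCTGAGACCCGAGGACACGGCTCTTTATTTCTGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGGAGTGCATCCGCCAGAGGT",
]

sequence = contig_sequence(example_lines, "assemble3139")
record = as_fasta("assemble3139", sequence)
print(record, end="")
```

With a real file, `contig_sequence(open("final.out"), "assemble3139")` would work the same way, since the function only iterates over lines.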

@mourisl
Collaborator

mourisl commented Dec 20, 2024

Thanks for identifying this issue! I think this is a case where TRUST4 annotates the wrong V gene for the contig and then derives the wrong CDR3 start site. I'll see how to fix this one.
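
As a rough illustration of how a shifted CDR3 start site can turn a productive clone into an out_of_frame call, the sketch below compares the two CDR3nt strings reported in this thread for the same clone. The helper `cdr3_status` is hypothetical: it only mimics the report's convention (translate when the length is a multiple of 3, otherwise report out_of_frame) and is not TRUST4's code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def cdr3_status(cdr3_nt):
    """Translate a CDR3 nucleotide string, or return 'out_of_frame'."""
    if len(cdr3_nt) % 3 != 0:
        return "out_of_frame"
    codons = (cdr3_nt[i:i + 3] for i in range(0, len(cdr3_nt), 3))
    # Unknown codons (e.g. containing N) are shown as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# CDR3nt values copied from the report.tsv rows quoted in this thread.
in_frame = "TGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG"
shifted = "TGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG"

print(len(in_frame), cdr3_status(in_frame))  # 45 CVRNFTSPLPYLDYW
print(len(shifted), cdr3_status(shifted))    # 43 out_of_frame
print(in_frame[2:] == shifted)               # True: same sequence, start moved by 2 nt
```

The contig contains `TGTGTGCG`, so there are two candidate `TGT` start positions 2 nt apart. Choosing the downstream one leaves a 43 nt CDR3, which cannot be read in frame.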

@mourisl
Collaborator

mourisl commented Dec 21, 2024

Thank you for sharing the sequence! I think I've identified the reason. I have uploaded the fix to the "dev" branch of the repository. Could you please check out that branch and give it a try? This branch also fixes other issues in TRUST4's V gene annotation.

@btesson-lysarc
Author

Thank you very much for replying so quickly and proposing a fix. I have just rerun the same sample with the latest dev version. More reads now align to the main IGH clone, which is annotated as productive. However, I still see some consensus assemblies that are tagged as out of frame, even though IgBlast says they are productive (and maps them to the same VDJ as the main clone).
Here is an example:

#count frequency CDR3nt CDR3aa V D J C cid cid_full_length
2 1291 0.1769702 TGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG CVRNFTSPLPYLDYW IGHV3-48*03 IGHD3-9*01 IGHJ4*02 . assemble397 0
3 1272 0.1744215 TGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG out_of_frame IGHV3-33*03 IGHD3-9*01 IGHJ4*02 IGHM assemble0 0
4 1026 0.1406479 TGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG out_of_frame IGHV3-33*03 IGHD3-9*01 IGHJ4*02 . assemble525 0

And here is the assemble0 sequence from final.out:

>assemble0 IGHV3-23+IGHV3-23+IGHM
AGTCTATTAACTTATGGTTGTGGAAGTTATACATTATACGCCGACTCAGTGAAGGGCCGATTCACCGTCTCCAGAGACAACGCCAAGAACTCATTGTTTCTGCAAATGAACAGCCTGAGACCCGAGGACACGGCTCTTTATTTCTGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTGGCCGTTGGCTGCCTCGCACAGGACTTCCTTCCCGACTCCATCACTTTCTCCTGGAAATACAAGAACAACTCTGACATCAGCAGCACCCGGGGCTTCCCATCAGTCCTGAGAGGGGGCAAGTACGCAGCCACCTCACAGGTGCTGCTGCCTTCCAAGGACGTCATGCAGGACACAGACACATT

Surprisingly, even though the CDR3aa sequences seem identical, the V-D-J gene assignments for assemble397 and assemble0 also differ in IgBlast.
Thank you very much again, and happy holidays if you are celebrating.

@mourisl
Collaborator

mourisl commented Dec 23, 2024

Thank you again for providing the sequence! These are valuable for debugging. I think this sequence has a very high SHM rate, so TRUST4 has some trouble in V gene annotation. I'll look into this issue. Happy holidays!

@mourisl
Collaborator

mourisl commented Dec 28, 2024

I have made some updates to the dev branch, so "assemble0" should now get a different V gene annotation and an in-frame CDR3 sequence. Could you update this branch and give it a try? Could you please also share the sequence for assemble525, which may have a different reason for getting a wrong V gene annotation?

@btesson-lysarc
Author

Hello,
First of all, Happy New Year, and thanks for looking into this; it is very much appreciated.
Here is the sequence for the assemble525 from the previous run you asked about:

>assemble525 IGHV3-23
GTCGTGTCTCATTGTTTCTGCAAATGAACAGCCTGAGACCCGAGGACACGGCTCTTTATTTCTGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGGAGTGCATCCGCCAGACA

I just re-ran the sample using the latest dev version of TRUST4, and unfortunately I still get an out-of-frame assemble0. Here are the top IGH clones in report.tsv from the new run of the same sample:

#count frequency CDR3nt CDR3aa V D J C cid cid_full_length
2 1357 0.1860526 TGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG CVRNFTSPLPYLDYW IGHV3-48*03 IGHD3-9*01 IGHJ4*02 . assemble397 0
3 1207 0.1655154 TGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG out_of_frame IGHV3-33*03 IGHD3-9*01 IGHJ4*02 IGHM assemble0 0
4 875 0.1199193 TGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG out_of_frame IGHV3-33*03 IGHD3-9*01 IGHJ4*02 . assemble525 0
5 651 0.0893246 TGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG CVRNFTSPLPYLDYW IGHV3-48*03 IGHD3-9*01 IGHJ4*02 IGHM assemble101 0
6 281 0.0385408 TGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG CVRNFTSPLPYLDYW IGHV3-69-1*02 IGHD3-9*01 IGHJ4*02 . assemble229 0
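
One way to spot such cases in bulk is to look for out_of_frame rows whose CDR3nt is just a truncated copy of an in-frame row's CDR3nt. The sketch below is a hedged illustration, not part of TRUST4: `load_report` and `shifted_duplicates` are hypothetical helpers, the column names come from the header shown above, and the leading number on each row is dropped on the assumption that it is a line number from the viewer rather than a report.tsv column.

```python
import csv
import io


def load_report(handle):
    """Read a tab-separated report into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def shifted_duplicates(rows):
    """Pair out_of_frame rows with in-frame rows whose CDR3nt ends with theirs.

    Returns (out_of_frame cid, in-frame cid, number of missing leading nt).
    """
    in_frame = [r for r in rows if r["CDR3aa"] != "out_of_frame"]
    pairs = []
    for row in rows:
        if row["CDR3aa"] != "out_of_frame":
            continue
        for ref in in_frame:
            if ref["CDR3nt"].endswith(row["CDR3nt"]):
                shift = len(ref["CDR3nt"]) - len(row["CDR3nt"])
                pairs.append((row["cid"], ref["cid"], shift))
    return pairs


# First three data rows from the report above, rebuilt as tab-separated text.
header = ["#count", "frequency", "CDR3nt", "CDR3aa", "V", "D", "J", "C", "cid", "cid_full_length"]
data = [
    ["1357", "0.1860526", "TGTGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG", "CVRNFTSPLPYLDYW",
     "IGHV3-48*03", "IGHD3-9*01", "IGHJ4*02", ".", "assemble397", "0"],
    ["1207", "0.1655154", "TGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG", "out_of_frame",
     "IGHV3-33*03", "IGHD3-9*01", "IGHJ4*02", "IGHM", "assemble0", "0"],
    ["875", "0.1199193", "TGTGCGAAACTTTACCAGTCCGCTCCCCTATTTAGACTATTGG", "out_of_frame",
     "IGHV3-33*03", "IGHD3-9*01", "IGHJ4*02", ".", "assemble525", "0"],
]
text = "\n".join("\t".join(fields) for fields in [header] + data) + "\n"

rows = load_report(io.StringIO(text))
pairs = shifted_duplicates(rows)
for out_cid, ref_cid, shift in pairs:
    print(f"{out_cid} looks like {ref_cid} with the CDR3 start moved by {shift} nt")
```

On a real run, `load_report(open("report.tsv"))` would replace the in-memory example; a shift that is not a multiple of 3 is what makes the truncated copy out of frame.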

Let me know if there is anything I can do to be more helpful on this issue.

@mourisl
Collaborator

mourisl commented Jan 4, 2025

Thanks for sharing the sequence of assemble525. It does have some interesting properties, such as too many SHMs in the CDR3 region of the V gene. I have pushed a fix to the "dev" branch.

For assemble0, the current implementation gives me the V gene annotation IGHV3-21. Since the abundance estimates changed compared with the last run, I guess the underlying contig sequence has changed somehow. Could you please share the current sequence of assemble0? Thanks.
