Context-Aware Amino Acid Embedding Advances Analysis of TCR-Epitope Interactions

catELMo is a bi-directional amino acid embedding model that learns contextualized amino acid representations, treating an amino acid as a word and a sequence as a sentence. It learns patterns in amino acid sequences through a self-supervision signal: predicting the next amino acid token given the previous tokens. It has been trained on 4,173,895 TCR $\beta$ CDR3 sequences (52 million amino acid tokens) from ImmunoSEQ. catELMo yields a real-valued representation vector for a sequence of amino acids, which can be used as input features for various downstream tasks. This is the official implementation of catELMo.
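To make the self-supervision signal concrete, here is a minimal sketch of how a CDR3 sequence could be split into amino acid tokens and turned into (context, target) pairs for a forward and a backward language model. It is an illustration only, not the catELMo training code; the function names `tokenize`, `next_token_pairs`, and `bidirectional_pairs`, as well as the example sequence, are hypothetical.

```python
from typing import List, Tuple


def tokenize(sequence: str) -> List[str]:
    """Treat each amino acid (one letter) as a word of the 'sentence'."""
    return list(sequence.strip().upper())


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Build (previous tokens, next token) pairs for a left-to-right model."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def bidirectional_pairs(sequence: str):
    """Return forward and backward prediction pairs for one sequence.

    Forward pairs read the sequence left to right. Backward pairs read the
    reversed sequence, so each token is predicted from the tokens after it.
    """
    tokens = tokenize(sequence)
    forward = next_token_pairs(tokens)
    backward = next_token_pairs(tokens[::-1])
    return forward, backward


if __name__ == "__main__":
    # "CASSL" is a made-up toy sequence, not taken from the training data.
    fwd, bwd = bidirectional_pairs("CASSL")
    for context, target in fwd:
        print("".join(context), "->", target)
```

A bi-directional model combines both directions, so the representation of each amino acid depends on the residues before and after it.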

Publication

Context-Aware Amino Acid Embedding Advances Analysis of TCR-Epitope Interactions
Pengfei Zhang¹,², Michael Cai¹,², Seojin Bang², Heewook Lee¹,²
¹School of Computing and Augmented Intelligence, Arizona State University, ²Biodesign Institute, Arizona State University
Published in: eLife, 2023.

Paper | Code | Poster | Slides | Presentation (YouTube)

Dependencies

Linux
Python 3.6.13
Keras 2.6.0
TensorFlow 2.6.0

Steps to train a Binding Affinity Prediction model for TCR-epitope pairs.

1. Clone the repository

git clone https://github.com/Lee-CBG/catELMo
cd catELMo/
conda create --name bap python=3.6.13
source activate bap
pip install pandas==1.1.5 tensorflow==2.6.0 keras==2.6.0 scikit-learn==0.24.2 tqdm

2. Prepare TCR-epitope pairs for training and testing

Download the training and testing data from the datasets folder.
Obtain embeddings for TCRs and epitopes by following the instructions in the embedders folder.
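Embedding models of this kind typically produce one vector per amino acid, while a downstream classifier needs a fixed-length feature vector per sequence. The sketch below shows one common way to bridge that gap, averaging the per-residue vectors. This is an assumption made for illustration and does not reproduce the scripts in the embedders folder; the function name `mean_pool` and the toy vectors are hypothetical.

```python
from typing import List, Sequence


def mean_pool(residue_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length sequence vector."""
    if not residue_vectors:
        raise ValueError("need at least one residue vector")
    dim = len(residue_vectors[0])
    if any(len(v) != dim for v in residue_vectors):
        raise ValueError("all residue vectors must have the same dimension")
    n = len(residue_vectors)
    # Average each dimension independently across the residues.
    return [sum(v[j] for v in residue_vectors) / n for j in range(dim)]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real embeddings.
    short_seq = [[1.0, 2.0], [3.0, 4.0]]
    long_seq = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    # Sequences of different lengths map to vectors of the same size.
    print(len(mean_pool(short_seq)), len(mean_pool(long_seq)))
```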

3. Train and test models

An example for the epitope split:

python -W ignore bap.py \
                --embedding catELMo_4_layers_1024 \
                --split epitope \
                --gpu 0 \
                --fraction 1 \
                --seed 42
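The `--split epitope` option suggests that pairs are partitioned by epitope rather than at random. As a rough illustration of what such a split could look like (an assumption about the intent, not the logic in `bap.py`), the sketch below holds out whole epitopes so that no epitope in the test set appears in training. The function name `epitope_split`, the `test_ratio` parameter, and the toy pairs are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str, int]  # (TCR sequence, epitope sequence, binding label)

# Made-up toy pairs for illustration; not taken from the datasets folder.
TOY_PAIRS: List[Pair] = [
    ("CASSAAF", "EPIA", 1),
    ("CASSGGF", "EPIA", 0),
    ("CASSLLF", "EPIC", 1),
    ("CASSTTF", "EPID", 0),
    ("CASSVVF", "EPIE", 1),
    ("CASSYYF", "EPIF", 0),
]


def epitope_split(pairs: List[Pair], test_ratio: float = 0.2, seed: int = 42):
    """Split pairs so that train and test share no epitope."""
    epitopes = sorted({epitope for _, epitope, _ in pairs})
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(epitopes)
    n_test = max(1, int(len(epitopes) * test_ratio))
    test_epitopes = set(epitopes[:n_test])
    train = [p for p in pairs if p[1] not in test_epitopes]
    test = [p for p in pairs if p[1] in test_epitopes]
    return train, test


if __name__ == "__main__":
    train, test = epitope_split(TOY_PAIRS)
    print(len(train), "training pairs,", len(test), "testing pairs")
```

Holding out whole epitopes tests how well a model generalizes to epitopes it has never seen, which is a harder setting than a random split of pairs.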

Citation

If you use this code or catELMo in your research, please cite our paper:

@article {catelmobiorxiv,
	author = {Pengfei Zhang and Seojin Bang and Michael Cai and Heewook Lee},
	title = {Context-Aware Amino Acid Embedding Advances Analysis of TCR-Epitope Interactions},
	elocation-id = {2023.04.12.536635},
	year = {2023},
	doi = {10.1101/2023.04.12.536635},
	publisher = {Cold Spring Harbor Laboratory},
	journal = {bioRxiv}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
