ActiveTCR is a unified framework designed to minimize annotation cost while maximizing the predictive performance of T-cell receptor (TCR)-epitope binding affinity prediction models. It incorporates active learning to iteratively search for the most informative unlabeled TCR-epitope pairs, reducing annotation cost and redundancy. Comparing four query strategies against a random sampling baseline, ActiveTCR achieves substantial cost reduction and improved performance in TCR-epitope binding affinity prediction. To our knowledge, it is the first systematic investigation of data optimization in the context of TCR-epitope binding affinity prediction.
An Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction
Pengfei Zhang1,2, Seojin Bang3, Heewook Lee1,2, *
1 School of Computing and Augmented Intelligence, Arizona State University, 2 Biodesign Institute, Arizona State University, 3 Google DeepMind
Accepted for publication: IEEE BIBM 2023
Paper | Code | Poster | Slides | Presentation (YouTube)
- Use case a: reducing annotation cost by more than 40% for unlabeled TCR-epitope pools.
- Use case b: reducing redundancy by more than 40% among already-annotated TCR-epitope pairs.
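The query strategies score each unlabeled pair by how informative annotating it would be. Below is a minimal sketch of one of them, entropy sampling (used in the examples further down), assuming a Keras-style binary classifier whose `predict` returns binding probabilities; the function name and signature are illustrative, not ActiveTCR's internal API.

```python
import numpy as np

def entropy_sampling(model, unlabeled_pairs, n_query):
    """Return indices of the n_query most uncertain TCR-epitope pairs."""
    # Assumed: model.predict returns P(binding) in [0, 1] for each pair.
    p = model.predict(unlabeled_pairs).reshape(-1)
    p = np.clip(p, 1e-12, 1 - 1e-12)                   # guard against log(0)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return np.argsort(entropy)[::-1][:n_query]         # most uncertain first
```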
- Linux
- Python 3.6.13
- Keras 2.6.0
- TensorFlow 2.6.0
```bash
git clone https://github.com/Lee-CBG/ActiveTCR
cd ActiveTCR/
conda create --name bap python=3.6.13
source activate bap
pip install -r requirements.txt
```
- Download training and testing data from the datasets folder.
- Obtain embeddings for TCRs and epitopes following the instructions of catELMo, or download the embeddings directly from Dropbox.
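The snippet below is a hypothetical illustration of turning downloaded embeddings into model inputs by concatenating each TCR embedding with its epitope embedding. The file path, dictionary keys, and storage format are assumptions, not the actual layout of the downloaded files.

```python
import pickle
import numpy as np

# Hypothetical path and keys -- adjust to however the downloaded files are organized.
with open("embeddings/catelmo_train.pkl", "rb") as f:
    data = pickle.load(f)

# Assumed: one catELMo embedding per TCR and per epitope, plus binary labels.
X = np.concatenate([np.asarray(data["tcr_embeds"]),
                    np.asarray(data["epi_embeds"])], axis=1)
y = np.asarray(data["labels"])
print(X.shape, y.shape)   # e.g. (n_pairs, 2048) and (n_pairs,)
```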
An example of use case a of ActiveTCR: reducing annotation cost for unlabeled TCR-epitope pools.
```bash
python -W ignore main.py \
    --split epi \
    --active_learning True \
    --query_strategy entropy_sampling \
    --train_strategy retrain \
    --query_balanced unbalanced \
    --gpu 0 \
    --run 0
```
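Conceptually, use case a runs a query-annotate-retrain loop. The sketch below is an illustrative outline only (reusing the `entropy_sampling` sketch above); `build_model` and `annotate` are hypothetical placeholders for the binding-affinity model constructor and the human annotation step, not functions from this repository.

```python
import numpy as np

def active_learning_loop(build_model, X_train, y_train, X_pool, annotate,
                         n_rounds=10, n_query=1000):
    """Query-annotate-retrain loop; annotate() stands in for human labeling."""
    for _ in range(n_rounds):
        model = build_model()                           # retrain from scratch each round
        model.fit(X_train, y_train, verbose=0)
        idx = entropy_sampling(model, X_pool, n_query)  # sketch shown earlier
        X_new, y_new = X_pool[idx], annotate(X_pool[idx])
        X_train = np.concatenate([X_train, X_new])
        y_train = np.concatenate([y_train, y_new])
        X_pool = np.delete(X_pool, idx, axis=0)         # remove queried pairs from the pool
    return model
```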
An example of use case b of ActiveTCR: minimizing redundancy among labeled TCR-epitope pairs.
```bash
python -W ignore main.py \
    --split epi \
    --query_strategy entropy_sampling \
    --train_strategy retrain \
    --query_balanced unbalanced \
    --gpu 1 \
    --run 0
```
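Use case b applies the same informativeness ranking to pairs that are already labeled, so redundant examples can be dropped before training. The sketch below only illustrates that idea (again reusing the `entropy_sampling` sketch above); it is not ActiveTCR's actual procedure, and the keep fraction is an arbitrary example.

```python
def prune_redundant(model, X_labeled, y_labeled, keep_fraction=0.6):
    """Keep only the most informative labeled pairs; drop the redundant rest."""
    n_keep = int(keep_fraction * len(X_labeled))
    idx = entropy_sampling(model, X_labeled, n_keep)   # sketch shown earlier
    return X_labeled[idx], y_labeled[idx]
```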
If you use this code or our catELMo embeddings in your research, please cite our paper:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.