
STAG-LLM

This repository contains the code for STAG-LLM, a novel model for predicting TCR-pMHC binding specificity by integrating sequence information from a pre-trained Large Language Model (ESM-2) with structural insights captured by a Graph Neural Network (GNN).

Executable Notebook

The easiest way to run inference with our model is our Google Colab notebook. Note that you will first need to model your TCR-pHLA complex using TCRmodel2.

Model Architecture

The STAG-LLM model combines sequence embeddings generated by a fine-tuned ESM-2 model with graph representations derived from TCR-pMHC structures; the two modalities are then fused for binding specificity prediction.

STAG-LLM Architecture
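
For intuition, the sketch below shows one way a sequence branch and a structure branch could be fused. This is a simplified illustration, not the published architecture: the GCN layers, the layer sizes, mean pooling, and concatenation-based fusion are all assumptions here; see model.py for the actual implementation.

# Simplified sketch of sequence + structure fusion (illustrative only).
# Assumptions: ESM-2 embeddings are precomputed per complex, the GNN is a
# two-layer GCN, and fusion is concatenation followed by an MLP head.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class FusionSketch(nn.Module):
    def __init__(self, esm_dim=1280, node_dim=20, hidden=128):
        super().__init__()
        self.conv1 = GCNConv(node_dim, hidden)        # structure branch
        self.conv2 = GCNConv(hidden, hidden)
        self.seq_proj = nn.Linear(esm_dim, hidden)    # sequence branch
        self.classifier = nn.Sequential(              # fused prediction head
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x, edge_index, batch, esm_embedding):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        graph_repr = global_mean_pool(h, batch)       # one vector per complex
        seq_repr = self.seq_proj(esm_embedding).relu()
        fused = torch.cat([graph_repr, seq_repr], dim=-1)
        return self.classifier(fused)                 # binding logit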

Data Preparation

IMPORTANT: Before running any code, you must download the data folder, which contains the raw input data for the project. Please download it from data and place it in the root directory of this project.

After downloading the data folder, you need to unzip and preprocess the PDB files to convert them into graph representations that can be used by the model. Run the pdbs_to_graphs.py script:

python pdbs_to_graphs.py
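
Conceptually, the conversion treats residues as graph nodes and links residues that are close in 3D space. The snippet below is a minimal sketch of that idea using Biopython and PyTorch Geometric; the Cα representation, the 10 Å cutoff, and the placeholder node features are illustrative assumptions, not necessarily what pdbs_to_graphs.py does.

# Illustrative residue-contact graph from a PDB file (not the exact logic
# of pdbs_to_graphs.py). Nodes are residues; edges connect C-alpha atoms
# within an assumed 10 Angstrom cutoff.
import torch
from Bio.PDB import PDBParser
from torch_geometric.data import Data

def pdb_to_graph(pdb_path, cutoff=10.0):
    structure = PDBParser(QUIET=True).get_structure("complex", pdb_path)
    # Collect C-alpha coordinates for every residue that has one.
    ca_coords = [
        torch.tensor(res["CA"].coord, dtype=torch.float)
        for res in structure.get_residues()
        if "CA" in res
    ]
    coords = torch.stack(ca_coords)
    # Connect residue pairs within the distance cutoff (no self-loops).
    dist = torch.cdist(coords, coords)
    src, dst = torch.where((dist < cutoff) & (dist > 0))
    edge_index = torch.stack([src, dst])
    # Placeholder node features; real features could encode residue identity.
    x = torch.ones(coords.size(0), 1)
    return Data(x=x, edge_index=edge_index, pos=coords)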

Training the Model

To replicate the experiments from the paper and train the STAG-LLM model from scratch:

python train.py

Training progress and evaluation metrics will be logged in the test directory (or the directory configured in train.py).

Using Pretrained Models for Evaluation

Pretrained models are provided in the pretrained_models directory. Please download it from pretrained_models and place it in the root directory of this project. You can use these models to score individual input PDB files or evaluate on a test set.

  1. Place your input PDB files in a designated directory. (PDB files must contain chains D and E for the TCR and chains A and C for the pMHC; we recommend modeling structures with TCRmodel2. A minimal chain check is sketched after these steps.)

  2. Run the evaluate.py script:

    python evaluate.py --model_path path/to/your/pretrained_model.pt --pdb_file path/to/your/input.pdb
    
    
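Because the model expects this fixed chain layout, it can help to verify chain IDs before scoring. Below is a minimal sanity check using Biopython; the check_chains helper is our own illustration, not part of this repository.

# Minimal sanity check that an input PDB has the expected chain layout:
# chains D and E for the TCR, chains A and C for the pMHC.
from Bio.PDB import PDBParser

def check_chains(pdb_path, required=("A", "C", "D", "E")):
    structure = PDBParser(QUIET=True).get_structure("complex", pdb_path)
    chain_ids = {chain.id for chain in structure.get_chains()}
    missing = set(required) - chain_ids
    if missing:
        raise ValueError(f"{pdb_path} is missing chains: {sorted(missing)}")
    print(f"{pdb_path}: found chains {sorted(chain_ids)}")

check_chains("path/to/your/input.pdb")
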

Project Structure

.
├── data/
│   ├── full_seq_df_new.csv
│   ├── final_dataset_modeled.csv
│   └── top_structures.zip (pdb data files)
├── hetero_edge_graphs/ (generated by pdbs_to_graphs.py)
├── pretrained_models/
│   └── ... (pretrained model checkpoints)
├── requirements.txt
├── train.py
├── model.py
├── data_handling.py
├── utils.py
├── pdbs_to_graphs.py
├── evaluate.py
├── README.md
└── STAG_LLM_image.png (image asset)

Comparison to Existing Models

We compared our approach to five models from the literature:

  • For the STAG model, please visit STAG
  • For the NetTCR 2.2 model, please visit NetTCR 2.2
  • For the TCR-ESM model, please visit TCR-ESM
  • For the ERGO II models (AE and LSTM), please visit ERGO-II

Citation

Jared K. Slone, Minying Zhang, Peixin Jiang, Amanda Montoya, Emily Bontekoe, Barbara Nassif Rausseo, Alexandre Reuben, Lydia E. Kavraki, STAG-LLM: Predicting TCR-pHLA binding with protein language models and computationally generated 3D structures, Computational and Structural Biotechnology Journal, Volume 27, 2025, Pages 3885-3896, ISSN 2001-0370, https://doi.org/10.1016/j.csbj.2025.09.004. (https://www.sciencedirect.com/science/article/pii/S2001037025003642)
