Skip to content

A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR molecules

License

Notifications You must be signed in to change notification settings

hliulab/UnifyImmun

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

UnifyImmun: a unified cross-attention model for prediction of antigen binding specificity

DOI image

UnifyImmun is an advanced computational model that predicts the binding specificity of antigens to both HLA and TCR molecules. By employing a unified cross-attention mechanism, UnifyImmun provides a comprehensive evaluation of antigen immunogenicity, which is crucial for the development of effective immunotherapies.

Key features

  • Unified model: Simultaneously predicts peptide bindings to both HLA and TCR molecules.
  • Cross-attention mechanism: Integrates the features of peptides and HLA/TCR molecules for model interpretability.
  • Progressive training strategy: Utilizes a two-phase progressive training to improve feature extraction and model generalizability.
  • Virtual adversarial training: Enhances model robustness by training on perturbed data.
  • Superior performance: Outperforms existing methods on both pHLA and pTCR prediction tasks on multiple datasets.

For inquiries or collaborations, please contact: hliu@njtech.edu.cn

System requirements

  • Linux version: 4.18.0-193 (Centos confirmed)
  • GPU: NVIDIA GeForce RTX 4090 (or compatible GPUs)
  • CUDA Version: 12.4
  • Python: 3.10
  • PyTorch: 2.2.1 (model implementation)

Installation guide

  1. Clone the UnifyImmun repository

git clone https://github.com/hliulab/unifyimmun.git

  1. Enter UnifyImmun project folder

cd unifyimmun/

  1. Set up the Python environment and install the required packages

pip install -r requirements.txt

Instructions for use

The training data for pHLA and pTCR bindings is stored in the data folder. The source code of UnifyImmun model, as well as the training and testing scripts, are included in the source folder. The trained models are stored in the trained_model folder.

Input data format

The input data should be a CSV file with three columns named tcr, peptide, and HLA, representing the TCR CDR3 sequence, peptide sequence, and HLA sequence, respectively.

Model training

For the convenience of sequentially running all the training steps, you can use the provided Python script named run_all_phases.py. After ensuring that the required environment and dependencies are installed, execute the following code:

cd source

python run_all_phases.py

Model testing

Given the fine-tuned model or our trained model (saved in trained_model folder), you can evaluate it on our demo test sets using the following scripts.

Predict HLA binding specificity using pHLA test set

cd source

python HLA_test.py

Evaluate TCR binding specificity using pTCR test set

cd source

python TCR_test.py

Output predicted scores

Given the fine-tuned model or our trained model (saved in trained_model folder), you can output predicted scores for the demo test sets using the following scripts.

Output predicted scores for HLA binding specificity using pHLA test set

cd source

python HLA_output_score.py

Output prediction scores for TCR binding specificity using pTCR test set

cd source

python TCR_output_score.py

In our practice, the time overhead required to run these two demos above is about 2 minutes when batch_size=8192.

Hyperparameter adjustment

If transfer the model using your custom dataset, you may need to adjust the hyperparameters within the Python scripts. Hyperparameters include learning rate, batch size, number of epochs, and other model-specific parameters.

Note: Ensure that the file paths and script names provided in the commands match those in your project directory. The source/ directory and script names like HLA_test.py and TCR_test.py are placeholders and should be replaced with the actual paths and filenames used in your implementation.

Customizing output

To customize the output results, users can modify the parameters within each script. Detailed comments within the code provide descriptions and guidance for parameter adjustments.

Support

For further assistance, bug reports, or to request new features, please contact us at hliu@njtech.edu.cn or open an issue on the GitHub repository page.


Please replace the placeholder links and information with actual data when the repository is available. Ensure that the instructions are clear and that the repository contains the requirements.txt file with all necessary dependencies listed.

About

A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR molecules

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages