UnifyImmun is an advanced computational model that predicts the binding specificity of antigens to both HLA and TCR molecules. By employing a unified cross-attention mechanism, UnifyImmun provides a comprehensive evaluation of antigen immunogenicity, which is crucial for the development of effective immunotherapies.
Web Server: http://hliulab.tech/unifyimmun/
- Unified model: Simultaneously predicts peptide bindings to both HLA and TCR molecules.
- Cross-attention mechanism: Integrates the features of peptides and HLA/TCR molecules and improves model interpretability (see the sketch after this list).
- Progressive training strategy: Utilizes a two-phase progressive training to improve feature extraction and model generalizability.
- Virtual adversarial training: Enhances model robustness by training on perturbed data.
- Superior performance: Outperforms existing methods on both pHLA and pTCR prediction tasks on multiple datasets.
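To make the cross-attention idea concrete, here is a minimal, illustrative sketch. It is not the authors' implementation: the embedding dimensions are made up, and PyTorch's built-in `nn.MultiheadAttention` serves as a stand-in for the model's actual attention blocks.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Minimal sketch: peptide positions attend over receptor positions."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, peptide, receptor):
        # peptide:  (batch, pep_len, dim)  embedded peptide sequence
        # receptor: (batch, rec_len, dim)  embedded HLA or TCR sequence
        # The returned attention weights show which peptide/receptor
        # residue pairs drive the prediction, aiding interpretability.
        fused, weights = self.attn(query=peptide, key=receptor, value=receptor)
        return fused, weights

# Toy usage with random embeddings (all dimensions are illustrative)
block = CrossAttentionBlock()
pep = torch.randn(2, 9, 64)    # e.g. two 9-mer peptides
rec = torch.randn(2, 34, 64)   # e.g. HLA pseudo-sequences of length 34
fused, attn = block(pep, rec)
print(fused.shape, attn.shape)  # torch.Size([2, 9, 64]) torch.Size([2, 9, 34])
```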
For inquiries or collaborations, please contact: hliu@njtech.edu.cn
- Linux version: 4.18.0-193 (verified on CentOS)
- GPU: NVIDIA GeForce RTX 4090 (or compatible GPUs)
- CUDA Version: 12.4
- Python: 3.10
- PyTorch: 2.2.1 (model implementation)
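A quick way to check that your environment matches these requirements, using only standard Python and PyTorch calls:

```python
import sys
import torch

print("Python :", sys.version.split()[0])   # expect 3.10.x
print("PyTorch:", torch.__version__)        # expect 2.2.1
print("CUDA   :", torch.version.cuda)       # expect 12.x
print("GPU    :", torch.cuda.get_device_name(0)
      if torch.cuda.is_available() else "no GPU detected")
```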
- Clone the UnifyImmun repository
git clone https://github.com/hliulab/unifyimmun.git
- Enter UnifyImmun project folder
cd unifyimmun/
- Set up the Python environment and install the required packages
pip install -r requirements.txt
The training data for pHLA and pTCR binding is stored in the data folder. The source code of the UnifyImmun model, together with the training and testing scripts, is included in the source folder. The trained models are stored in the trained_model folder.
The input data should be a CSV file with three columns named tcr, peptide, and HLA, representing the TCR CDR3 sequence, peptide sequence, and HLA sequence, respectively.
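For illustration, a minimal input file could be created as follows. The sequences are made-up placeholders that only demonstrate the expected layout, and the snippet assumes pandas is installed:

```python
import pandas as pd

# Made-up example rows; replace with real TCR CDR3, peptide,
# and HLA sequences from your own data.
demo = pd.DataFrame({
    "tcr":     ["CASSLGQAYEQYF", "CASSPDRGGYEQYF"],
    "peptide": ["GILGFVFTL", "NLVPMVATV"],
    "HLA":     ["YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY",
                "YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY"],
})
demo.to_csv("demo_input.csv", index=False)
```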
To run all the training steps sequentially, you can use the provided Python script run_all_phases.py. After ensuring that the required environment and dependencies are installed, execute the following commands:
cd source
python run_all_phases.py
Given your fine-tuned model or our trained model (saved in the trained_model folder), you can evaluate it on our demo test sets using the following scripts.
Predict HLA binding specificity using pHLA test set
cd source
python HLA_test.py
Evaluate TCR binding specificity using pTCR test set
cd source
python TCR_test.py
Given your fine-tuned model or our trained model (saved in the trained_model folder), you can output predicted scores for the demo test sets using the following scripts.
Output predicted scores for HLA binding specificity using pHLA test set
cd source
python HLA_output_score.py
Output prediction scores for TCR binding specificity using pTCR test set
cd source
python TCR_output_score.py
In our experience, running the two demo scripts above takes about 2 minutes with batch_size=8192.
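If you want to compute metrics from the exported scores yourself, a sketch like the one below works, assuming scikit-learn is available and the output is a CSV containing a binary label column and a score column. The file and column names here are hypothetical; check what HLA_output_score.py and TCR_output_score.py actually write and adjust accordingly.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical file/column names; adapt to the scripts' real output.
df = pd.read_csv("hla_predicted_scores.csv")
print("ROC-AUC:", roc_auc_score(df["label"], df["score"]))
print("PR-AUC :", average_precision_score(df["label"], df["score"]))
```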
If you transfer the model to your custom dataset, you may need to adjust the hyperparameters within the Python scripts, including the learning rate, batch size, number of epochs, and other model-specific parameters.
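As a purely hypothetical illustration, such a hyperparameter block near the top of a training script might look like this; the names and values below are not taken from the UnifyImmun source:

```python
# Hypothetical hyperparameters; the real scripts may use different
# names, defaults, or a command-line interface.
LEARNING_RATE = 1e-4   # consider lowering for fine-tuning
BATCH_SIZE = 8192      # reduce on GPUs with less memory than an RTX 4090
NUM_EPOCHS = 30        # fewer epochs often suffice for transfer learning
```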
Note: Ensure that the file paths and script names in the commands above (e.g., the source/ directory, HLA_test.py, TCR_test.py) match those in your local copy of the project.
To customize the output results, users can modify the parameters within each script. Detailed comments within the code provide descriptions and guidance for parameter adjustments.
For further assistance, bug reports, or to request new features, please contact us at hliu@njtech.edu.cn or open an issue on the GitHub repository page.