Author: Talip Ucar (ucabtuc@gmail.com)
The official implementation of Improving Antibody Humanness Prediction using Patent Data
- Model
- Environment
- Configuration
- Training and Evaluation
- Structure of the repo
- Results
- Experiment tracking
- Citing the paper
- Citing this repo
Pre-training | Fine-tuning |
---|---|
We used Python 3.7 for our experiments. The environment can be set up by following three steps:
pip install pipenv # To install pipenv if you don't have it already
pipenv install --skip-lock # To install required packages.
pipenv shell # To activate virtual env
If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. "pip install package_name".
There are two types of configuration files:
1. pad.yaml # Defines parameters and options for pre-training
2. humanness.yaml # Defines parameters and options for fine-tuning
You can train and evaluate the model by using:
python selfpad_pretrain.py # For pre-training
python selfpad_finetune.py # For fine-tuning it for humanness
python selfpad_eval.py -ev test # To compute humanness scores for a custom dataset, in this case test.csv. The CSV file should have "VH", "VL" and/or "Label" columns
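Before running the evaluation on a custom CSV, it can help to check which of the expected columns are present. The following is a minimal sketch using only the standard library. The helper name `find_expected_columns` and the sample row are hypothetical and not part of this repo, and the sequences are placeholders rather than real antibodies.

```python
import csv
import io

# Column names taken from the usage note above.
EXPECTED_COLUMNS = ("VH", "VL", "Label")


def find_expected_columns(handle):
    """Return the expected column names that appear in the CSV header."""
    reader = csv.reader(handle)
    header = next(reader, [])  # empty list if the file has no rows
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name in present]


# Hypothetical in-memory CSV; replace with open("data/test.csv", newline="").
sample = io.StringIO("VH,VL,Label\nEVQLVESGG,DIQMTQSPS,1\n")
found = find_expected_columns(sample)
print(found)
```

Any column missing from the returned list is absent from the file header, so the check can be run before passing the file to selfpad_eval.py.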
- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py
- src
    |-selfpad.py
    |-selfpad_humanness.py
- config
    |-pad.yaml
    |-humanness.yaml
- utils_common
    |-arguments.py
    |-utils.py
    |-tokenizer.py
    ...
- utils_pretrain
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
- utils_finetune
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
- data
    |-test.csv
    ...
- results
    |-pretraining
    |-humanness
    ...
Results at the end of training are saved under the ./results
directory. The results directory structure is as follows:
- results
    |-task e.g. humanness, or pretraining
        |-evaluation
            |-clusters (for plotting t-SNE and PCA plots of embeddings)
        |-training
            |-model
            |-plots
            |-loss
You can save the results of evaluations under the "evaluation" folder.
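The results layout described above can be sketched with `pathlib`. This is only an illustration: the helper `make_results_tree` is hypothetical, the exact nesting of the sub-folders is an assumption, and the example writes into a temporary directory rather than the repo's own ./results.

```python
import tempfile
from pathlib import Path

# Sub-folders named in the README; the exact nesting is an assumption.
SUBDIRS = (
    "evaluation/clusters",
    "training/model",
    "training/plots",
    "training/loss",
)


def make_results_tree(root, task):
    """Create results/<task>/... under root and return the task directory."""
    task_dir = Path(root) / "results" / task
    for sub in SUBDIRS:
        (task_dir / sub).mkdir(parents=True, exist_ok=True)
    return task_dir


# Build the tree in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = make_results_tree(tmp, "humanness")
    created = sorted(
        p.relative_to(task_dir).as_posix()
        for p in task_dir.rglob("*")
        if p.is_dir()
    )
    print(created)
```

Passing "pretraining" instead of "humanness" would give the same sub-folders under a separate task directory.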
You can turn on Weights & Biases (W&B) logging in the config file.
@article{ucar2024SelfPAD,
title={Improving Antibody Humanness Prediction using Patent Data},
author={Ucar, Talip and
Ramon, Aubin and
Oglic, Dino and
Croasdale-Wood, Rebecca and
Diethe, Tom and
Sormanni, Pietro},
journal={arXiv preprint arXiv:2110.04361},
year={2024}
}
If you use the SelfPAD framework in your own studies and work, please cite it using the following:
@Misc{talip_ucar_2024_SelfPAD,
author = {Talip Ucar},
title = {{Improving Antibody Humanness Prediction using Patent Data}},
howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
month = January,
year = {since 2024}
}