This project is under active development.
- ./data: Contains all data files.
- ./hpc_scripts: Contains scripts for running on the HPC.
- ./src: Contains all source code.
This project has been verified to work correctly in the following environments:
- Python 3.8.5
- Ubuntu 20.04.2 LTS / Ubuntu 16.04.7 LTS
- CUDA 10.2 / CUDA 11.2
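To compare your own setup against the environments listed above, a small script like the one below can help. It is a sketch that is not part of this repository. It assumes PyTorch may not be installed yet, and it reports PyTorch's CUDA build version only if `torch` can be imported. The name `environment_report` is hypothetical.

```python
import platform
import sys

# Python version listed above as verified for this project.
VERIFIED_PYTHON = (3, 8, 5)


def environment_report():
    """Collect the Python, OS, and (if available) PyTorch/CUDA versions."""
    report = {
        "python": platform.python_version(),
        "python_matches": tuple(sys.version_info[:3]) == VERIFIED_PYTHON,
        "platform": platform.platform(),
    }
    try:
        import torch  # optional: PyTorch may not be installed yet
    except ImportError:
        report["torch"] = None
        report["torch_cuda"] = None
        report["cuda_available"] = False
    else:
        report["torch"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        report["torch_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run it with `python3` once the virtual environment is active to see which interpreter and PyTorch build it picks up.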
mkdir BERTwithKG
git clone https://github.com/IBPA/BERTwithKG.git ./BERTwithKG
The following commands create and activate a virtual environment.
cd ./BERTwithKG
python3 -m venv env
source env/bin/activate
Don't forget to deactivate the virtual environment when you're done.
cd ./BERTwithKG
deactivate
Make sure you're still in the virtual environment.
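One way to confirm that the virtual environment is active is to compare `sys.prefix` with `sys.base_prefix`: inside a venv created with `python3 -m venv` they differ. The snippet below is a minimal sketch of that check; the function name `in_virtualenv` is hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created by `python3 -m venv`."""
    # In a venv, sys.prefix points at the environment directory (e.g. ./env),
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("Not in a virtual environment; run `source env/bin/activate` first.")
```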
For now, we need to install the nightly build of PyTorch due to a bug in the stable build. Installing packages through the requirements.txt file does not seem to support the --pre option, so install the nightly build of PyTorch separately, using the command that matches your version of CUDA:
# For CUDA 11.1
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu111/torch_nightly.html
# For CUDA 10.2
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
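If you script the installation, the choice between the two commands above can be automated. The sketch below rebuilds the same pip3 command from a CUDA version string. The function names and the restriction to the two CUDA versions shown above are illustrative assumptions, not part of this repository.

```python
import shlex

# CUDA versions for which this README gives a nightly install command.
SUPPORTED_CUDA = {"11.1": "cu111", "10.2": "cu102"}


def nightly_find_links_url(cuda_version: str) -> str:
    """Return the PyTorch nightly find-links URL for a CUDA version like '10.2'."""
    try:
        tag = SUPPORTED_CUDA[cuda_version]
    except KeyError:
        raise ValueError(
            f"No install command listed for CUDA {cuda_version}; "
            f"expected one of {sorted(SUPPORTED_CUDA)}"
        ) from None
    return f"https://download.pytorch.org/whl/nightly/{tag}/torch_nightly.html"


def pip_install_command(cuda_version: str) -> str:
    """Build the pip3 command shown above for the given CUDA version."""
    args = [
        "pip3", "install", "--pre", "torch", "torchvision", "torchaudio",
        "-f", nightly_find_links_url(cuda_version),
    ]
    # shlex.join (Python 3.8+) quotes arguments only when the shell needs it.
    return shlex.join(args)


if __name__ == "__main__":
    print(pip_install_command("10.2"))
```

For example, `pip_install_command("10.2")` reproduces the CUDA 10.2 command above. Any other version, such as "11.2", raises a `ValueError` instead of guessing a wheel index, since this README only lists commands for CUDA 11.1 and 10.2.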
Install all other required Python packages.
pip3 install -r requirements.txt
To reduce training time (especially when using multiple GPUs), consider using DeepSpeed. If you're lucky, installing it from PyPI is all that's needed.
pip3 install deepspeed
However, you may have to pre-install the CPUAdam op by setting the DS_BUILD_CPU_ADAM environment variable to 1 when installing DeepSpeed. More information can be found in DeepSpeed's installation documentation. Please also refer to Hugging Face's documentation on its DeepSpeed integration.
DS_BUILD_CPU_ADAM=1 pip3 install deepspeed
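The two install variants above can also be driven from Python, for example from a setup script. The sketch below is not part of this repository: it skips installation if DeepSpeed is already importable and otherwise runs pip with or without DS_BUILD_CPU_ADAM=1. The function names and the `--cpu-adam` command-line flag are hypothetical. DeepSpeed's own `ds_report` command is a good way to check afterwards which ops, including CPU Adam, are installed or compatible.

```python
import importlib.util
import os
import subprocess
import sys


def deepspeed_version():
    """Return the installed DeepSpeed version, or None if it is not installed."""
    if importlib.util.find_spec("deepspeed") is None:
        return None
    import deepspeed  # imported lazily; only reached when the package exists
    return deepspeed.__version__


def deepspeed_install_plan(prebuild_cpu_adam=False):
    """Return (command, env) equivalent to the pip3 commands above.

    With prebuild_cpu_adam=True, DS_BUILD_CPU_ADAM=1 is set so the CPUAdam op
    is built at install time rather than just-in-time on first use.
    """
    env = dict(os.environ)
    if prebuild_cpu_adam:
        env["DS_BUILD_CPU_ADAM"] = "1"
    # `python -m pip` installs into the interpreter running this script,
    # i.e. the active virtual environment.
    command = [sys.executable, "-m", "pip", "install", "deepspeed"]
    return command, env


if __name__ == "__main__":
    version = deepspeed_version()
    if version is not None:
        print(f"DeepSpeed {version} is already installed.")
    else:
        # Hypothetical flag: pass --cpu-adam to pre-install the CPUAdam op.
        command, env = deepspeed_install_plan("--cpu-adam" in sys.argv)
        subprocess.run(command, env=env, check=True)
```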
- For instructions on pre-processing, please refer to its own README file.
- For instructions on pre-training, please refer to its own README file.
- For instructions on fine-tuning, please refer to its own README file.
- Jason Youn @https://github.com/jasonyoun
For any questions, please contact us at tagkopouloslab@ucdavis.edu.
We will update this section once citation information is available.
This project is licensed under the Apache-2.0 License. Please see the LICENSE file for details.