These instructions assume you have CUDA 11.8 installed.
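A quick way to confirm the toolkit and driver versions before proceeding (standard CUDA utilities, assuming they are on your `PATH`):

```bash
# Toolkit version: should report release 11.8.
nvcc --version
# Driver check: the CUDA version shown must be >= 11.8.
nvidia-smi
```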
First, install the PyTorch nightly build for CUDA 11.8:

```bash
pip install --index-url https://download.pytorch.org/whl/nightly/cu118 --pre 'torch>=2.1.0dev'
```
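You can verify that pip picked up the CUDA 11.8 nightly wheel with a one-liner (`torch.version.cuda` is part of PyTorch's public API):

```bash
# Print the torch version and the CUDA version it was built against.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```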
Note: as of 2023/09/02, xformers does not provide pre-built binaries for torch 2.1, so you have to build it from source:
```bash
pip uninstall ninja -y && pip install ninja -U
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```
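Once the build finishes, you can confirm that xformers imports cleanly and see which kernels it picked up via its built-in diagnostic module:

```bash
# Print the xformers version, enabled ops, and the detected CUDA setup.
python -m xformers.info
```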
Next, install flash-attention along with its fused rotary, layer-norm, and cross-entropy kernels:

```bash
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention
python setup.py install
cd csrc/rotary && pip install .
cd ../layer_norm && pip install .
cd ../xentropy && pip install .
cd ../.. && rm -rf flash-attention
```
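To check that the package and the three fused extensions built correctly, try importing them. The extension module names below match the flash-attention `csrc` packages at the time of writing; adjust if your checkout differs:

```bash
# Core package and version.
python -c "import flash_attn; print(flash_attn.__version__)"
# Fused kernels installed from csrc/ (module names may vary across releases).
python -c "import rotary_emb, dropout_layer_norm, xentropy_cuda_lib"
```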
Finally, install the remaining dependencies:

```bash
pip install -r requirements.txt tokenizers sentencepiece
```

Building xformers and flash-attention can take five minutes or more; do not worry if the process appears to stall or the terminal prints many warnings.
Then you are ready to go 🎉!
Download the SlimPajama and Starcoderdata datasets to a directory of your choice:
```bash
cd /path/to/dataset
git lfs install
git clone https://huggingface.co/datasets/cerebras/SlimPajama-627B
git clone https://huggingface.co/datasets/bigcode/starcoderdata
```
The SlimPajama dataset occupies 893 GB of disk space and Starcoderdata takes another 290 GB.
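Counting the ~1.8 TB of tokenized output produced below, plan for roughly 3 TB of free space in total. A quick check before cloning:

```bash
# Show free space on the filesystem backing the dataset directory.
df -h /path/to/dataset
```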
Use the provided scripts to tokenize the datasets and divide them into chunks:

```bash
python scripts/prepare_starcoder.py --source_path /path/to/starcoderdata/ --tokenizer_path data/llama --destination_path data/slim_star_combined --split train --percentage 1.0
python scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama --destination_path data/slim_star_combined --split validation --percentage 1.0
python scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama --destination_path data/slim_star_combined --split train --percentage 1.0
```
The processed data will take about 1.8 TB of storage.
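Since `--percentage` controls what fraction of the source data is tokenized, you can dry-run the pipeline on a small slice before committing to the full pass; `data/slim_star_test` below is just a hypothetical scratch directory:

```bash
# Tokenize 2% of Starcoderdata as an end-to-end smoke test.
python scripts/prepare_starcoder.py --source_path /path/to/starcoderdata/ \
    --tokenizer_path data/llama --destination_path data/slim_star_test \
    --split train --percentage 0.02
```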
If your setup comprises two nodes, each with 8 GPUs, you can initiate pretraining with the following commands:
On node 1:

```bash
lightning run model \
    --node-rank=0 \
    --main-address=172.16.101.5 \
    --accelerator=cuda \
    --devices=8 \
    --num-nodes=2 \
    pretrain/tinyllama.py --devices 8 --train_data_dir data/slim_star --val_data_dir data/slim_star
```
On node 2:

```bash
lightning run model \
    --node-rank=1 \
    --main-address=172.16.101.5 \
    --accelerator=cuda \
    --devices=8 \
    --num-nodes=2 \
    pretrain/tinyllama.py --devices 8 --train_data_dir data/slim_star --val_data_dir data/slim_star
```
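If you only have a single 8-GPU node, the same launcher should work without the multi-node flags. This is a sketch derived from the commands above, dropping `--node-rank`/`--main-address` and setting `--num-nodes=1`:

```bash
lightning run model \
    --accelerator=cuda \
    --devices=8 \
    --num-nodes=1 \
    pretrain/tinyllama.py --devices 8 --train_data_dir data/slim_star --val_data_dir data/slim_star
```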
If you have a Slurm cluster, you can launch the same training through your scheduler by following Lightning's multi-node Slurm instructions.
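As a rough starting point, a submission script might look like the sketch below. Everything here is an assumption to adapt to your cluster: the job directives and environment setup are placeholders, and whether the launcher can derive the node rank and main address from Slurm's environment depends on your Lightning version, so you may need to pass them explicitly as in the commands above:

```bash
#!/bin/bash
# Hypothetical Slurm job: all directives below are placeholders for your cluster.
#SBATCH --job-name=tinyllama
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1   # one launcher per node; it spawns 8 GPU processes
#SBATCH --gres=gpu:8

srun lightning run model \
    --accelerator=cuda \
    --devices=8 \
    --num-nodes=2 \
    pretrain/tinyllama.py --devices 8 --train_data_dir data/slim_star --val_data_dir data/slim_star
```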