This experiment mainly explores domain-specific pretraining in the context of code generation language models.
Model: StableLM Base Alpha 3B

Datasets:
- The Pile (used to pretrain StableLM Base Alpha 3B)
- CodeParrot Clean (used for both models)
- arXiv CS papers and other code-related datasets, without actual unlabelled code (used for custom pretraining; sketched below)
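As a rough illustration, here is a minimal sketch of how such a pretraining mixture could be assembled with the Hugging Face `datasets` library. The papers dataset ID (`my-org/arxiv-cs-papers`), its column name, and the 70/30 mixing ratio are placeholder assumptions, not the exact configuration used here.

```python
# Minimal sketch of assembling a domain-specific pretraining mixture with
# Hugging Face `datasets`. Dataset IDs, column names, and weights below are
# illustrative placeholders, not this project's exact configuration.
from datasets import load_dataset, interleave_datasets

# Code corpus: CodeParrot Clean, streamed to avoid downloading the full dump.
code = load_dataset("codeparrot/codeparrot-clean", split="train", streaming=True)

# Papers corpus: placeholder ID, substitute the actual arXiv CS dataset.
papers = load_dataset("my-org/arxiv-cs-papers", split="train", streaming=True)

# Normalize both sources to a single "text" column so they can be interleaved.
code = code.map(lambda ex: {"text": ex["content"]})       # codeparrot-clean stores code in "content"
papers = papers.map(lambda ex: {"text": ex["abstract"]})  # assumed column name

# Mix roughly 70% code with 30% papers; the seed makes the sampling reproducible.
mixture = interleave_datasets(
    [code.select_columns(["text"]), papers.select_columns(["text"])],
    probabilities=[0.7, 0.3],
    seed=42,
)

# Peek at a few examples from the interleaved stream.
for example in mixture.take(3):
    print(example["text"][:80])
```

Streaming keeps both corpora as on-the-fly iterators rather than full downloads, which matters at pretraining scale, and the `probabilities` argument gives a simple knob for the domain mix.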
Code and benchmarks are coming soon
Thanks to the TPU Research Cloud (TRC) program for providing compute resources
Thanks to the various authors whose code I have used to build this, most notably Lit-GPT by Lightning AI.
Please reach out to me by email if you would like to get in touch.