NOTE: This repository is being archived. All the model training code has been merged into the OMEGA Labs Any-to-Any Bittensor repo. Please use that one instead.
This repo is meant as a companion to the OMEGA Labs Any-to-Any Bittensor repo. It provides a starting point for training any-to-any models, beginning with video-captioning.
To get started with training, just complete the following steps:

0. Make sure to review the requirements (listed below)!
1. Build the docker container and run it:

   `make build-and-run`

   (the following commands are to be run inside the `a2a` container)

2. Log into Huggingface:

   `huggingface-cli login`
   Make sure your account has access to Llama-3-8B on HF; you can request access on the model's Huggingface page.

3. Download the base model and datasets:

   `make download-everything`
4. Start training! Run `make finetune-x1` on a single GPU instance, or `make finetune-xN`, where N is the number of GPUs, e.g. `make finetune-x8` to train on a machine with 8 GPUs.

5. Important: Once you are done training, don't forget to upload the model! From within the container, just run:

   `python upload_ckpt_hf.py --ckpt_dir <checkpoint directory> --epoch <epoch number to upload> --hf_repo_id <repo id to upload to>`
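For reference, here is a minimal sketch of what such a checkpoint upload could look like if done by hand with the `huggingface_hub` library. This is not the repo's `upload_ckpt_hf.py` script (which also handles epoch selection for you); the repo id and checkpoint path below are placeholders.

```python
# Minimal sketch: push a checkpoint folder to the Huggingface Hub by hand.
# Assumes you are already logged in via `huggingface-cli login` (step 2).
# "your-username/your-a2a-checkpoint" and "checkpoints/epoch_3" are placeholders.
from huggingface_hub import HfApi

api = HfApi()

# Create the target repo if it doesn't exist yet.
api.create_repo(repo_id="your-username/your-a2a-checkpoint", exist_ok=True)

# Upload every file in the checkpoint directory to that repo.
api.upload_folder(
    folder_path="checkpoints/epoch_3",
    repo_id="your-username/your-a2a-checkpoint",
    repo_type="model",
    commit_message="Upload fine-tuned any-to-any checkpoint",
)
```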
Requirements:

- GPU with at least 48 GB VRAM
- CPU with at least 40 GB RAM
Some potential ways for miners to train better checkpoints and get an edge in the incentives are:
- Experiment with the `perception_tokens` hyperparameter (this refers to how many text tokens each image/audio/video is mapped to; see the sketch after this list)
- Incorporate new datasets and experiment with data cleaning / filtering techniques
- Tweak the prompt templating to make the model robust towards more generic instructions
- Bring in more multi-modal interleaved datasets (e.g. datasets where images, video, audio, and text all appear in-line)
  - Could try synthetic data generation here
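To make the `perception_tokens` idea above a bit more concrete, below is a minimal sketch of how a modality encoder's output could be compressed into a fixed number of tokens in the LLM's embedding space. This is an illustrative PyTorch module, not the repo's actual implementation; the class name, dimensions, and pooling strategy are assumptions.

```python
# Illustrative sketch (not the repo's code): map a variable-length sequence of
# image/audio/video features to a fixed number of "perception tokens" sized
# for the LLM's hidden dimension.
import torch
import torch.nn as nn


class PerceptionTokenProjector(nn.Module):
    def __init__(self, feature_dim: int, llm_hidden_dim: int, num_perception_tokens: int):
        super().__init__()
        # Pool the time/patch axis down to a fixed length, then project into
        # the LLM's embedding space.
        self.pool = nn.AdaptiveAvgPool1d(num_perception_tokens)
        self.proj = nn.Linear(feature_dim, llm_hidden_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, seq_len, feature_dim) from the modality encoder
        pooled = self.pool(features.transpose(1, 2)).transpose(1, 2)  # (batch, K, feature_dim)
        return self.proj(pooled)                                      # (batch, K, llm_hidden_dim)


# Example: compress 256 frame embeddings into 64 perception tokens.
video_features = torch.randn(2, 256, 1024)
projector = PerceptionTokenProjector(feature_dim=1024, llm_hidden_dim=4096, num_perception_tokens=64)
print(projector(video_features).shape)  # torch.Size([2, 64, 4096])
```

Fewer perception tokens mean each video consumes less of the LLM's context window but loses temporal/spatial detail; more tokens preserve detail at the cost of sequence length and memory during training.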
Miners cannot experiment with the following ideas presently because the validation mechanism is intentionally fairly restrictive to start out, in order to limit the experimentation space. However, these are good directions to be thinking about for future experiments: