NOTE: This repository is being archived. All the model training code has been merged into the OMEGA Labs Any-to-Any Bittensor repo. Please use that one instead.
This repo is meant as a companion to the OMEGA Labs Any-to-Any Bittensor repo. It provides a starting point for training any-to-any models, beginning with video-captioning.
To get started with training, just complete the following steps:

0. Make sure to review the requirements (listed below)!
1. Build the docker container and run it:

   `make build-and-run`

   (the following commands are to be run inside the `a2a` container)

2. Log into Huggingface:

   `huggingface-cli login`
   Make sure your account has access to Llama-3-8B on HF; you can request access on the model's Huggingface page.

3. Download the base model and datasets:

   `make download-everything`
4. Start training! Run `make finetune-x1` on a single GPU instance, or `make finetune-xN`, where N is the number of GPUs, e.g. `make finetune-x8` to train on a machine with 8 GPUs.

5. Important: Once you are done training, don't forget to upload the model! From within the container, just run:

   `python upload_ckpt_hf.py --ckpt_dir <checkpoint directory> --epoch <epoch number to upload> --hf_repo_id <repo id to upload to>`
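For reference, here is a minimal sketch of what such a checkpoint upload could look like if done by hand with the `huggingface_hub` library. This is not the repo's `upload_ckpt_hf.py` script (which also handles epoch selection for you); the repo id and checkpoint path below are placeholders.

```python
# Minimal sketch: push a checkpoint folder to the Huggingface Hub by hand.
# Assumes you are already logged in via `huggingface-cli login` (step 2).
# "your-username/your-a2a-checkpoint" and "checkpoints/epoch_3" are placeholders.
from huggingface_hub import HfApi

api = HfApi()

# Create the target repo if it doesn't exist yet.
api.create_repo(repo_id="your-username/your-a2a-checkpoint", exist_ok=True)

# Upload every file in the checkpoint directory to that repo.
api.upload_folder(
    folder_path="checkpoints/epoch_3",
    repo_id="your-username/your-a2a-checkpoint",
    repo_type="model",
    commit_message="Upload fine-tuned any-to-any checkpoint",
)
```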
Requirements:

- GPU with at least 48 GB VRAM
- CPU with at least 40 GB RAM
Some potential ways for miners to train better checkpoints and get an edge in the incentives are:
- Experiment with the `perception_tokens` hyperparameter (this refers to how many text tokens each image/audio/video is mapped to; see the sketch after this list)
- Incorporate new datasets and experiment with data cleaning / filtering techniques
- Tweak the prompt templating to make the model robust towards more generic instructions
- Bring in more multi-modal interleaved datasets (e.g. datasets where images, video, audio, and text all appear in-line)
  - Could try synthetic data generation here
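To make the `perception_tokens` idea above a bit more concrete, below is a minimal sketch of how a modality encoder's output could be compressed into a fixed number of tokens in the LLM's embedding space. This is an illustrative PyTorch module, not the repo's actual implementation; the class name, dimensions, and pooling strategy are assumptions.

```python
# Illustrative sketch (not the repo's code): map a variable-length sequence of
# image/audio/video features to a fixed number of "perception tokens" sized
# for the LLM's hidden dimension.
import torch
import torch.nn as nn


class PerceptionTokenProjector(nn.Module):
    def __init__(self, feature_dim: int, llm_hidden_dim: int, num_perception_tokens: int):
        super().__init__()
        # Pool the time/patch axis down to a fixed length, then project into
        # the LLM's embedding space.
        self.pool = nn.AdaptiveAvgPool1d(num_perception_tokens)
        self.proj = nn.Linear(feature_dim, llm_hidden_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, seq_len, feature_dim) from the modality encoder
        pooled = self.pool(features.transpose(1, 2)).transpose(1, 2)  # (batch, K, feature_dim)
        return self.proj(pooled)                                      # (batch, K, llm_hidden_dim)


# Example: compress 256 frame embeddings into 64 perception tokens.
video_features = torch.randn(2, 256, 1024)
projector = PerceptionTokenProjector(feature_dim=1024, llm_hidden_dim=4096, num_perception_tokens=64)
print(projector(video_features).shape)  # torch.Size([2, 64, 4096])
```

Fewer perception tokens mean each video consumes less of the LLM's context window but loses temporal/spatial detail; more tokens preserve detail at the cost of sequence length and memory during training.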
Miners cannot experiment with the following ideas presently because the validation mechanism is intentionally fairly restrictive to start out, in order to limit the experimentation space. However, these are good directions to be thinking about for future experiments: