# A minimalistic repo to finetune Wan2.1-1.3B
## Demo Videos

Demo clips from the finetuned model: `unicorn.mp4`, `kitten.mp4`, `ice_cube.mp4`, `clown.mp4`.
This is a minimalistic, hackable repo to finetune Wan2.1-1.3B on some simple effects, courtesy of Hugging Face. The repo includes an implementation of the Wan model and its finetuning in plain PyTorch.

We'd also like to thank the authors of DiffSynth and FastVideo for their great work, which this repo builds on.
The stages of training implemented in this repo are:

- Data preparation
  - Downloading the 3DGS-Dissolve dataset from Hugging Face
  - Captioning the videos
  - Precomputing text embeddings for the captions
  - Precomputing video latents for the videos using the VAE
- Training
- Generation using a newly trained checkpoint
- Clone the repo:

```bash
git clone https://github.com/Bria-AI/Zero-to-Wan
cd Zero-to-Wan
```
- Install `uv` (instructions taken from here). For Linux systems this should be:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```
- Install the dependencies:

```bash
# Sync everything except flash-attn first, then a second sync builds flash-attn
# against the already-installed torch.
uv sync --no-install-package flash-attn
uv sync
```
- Activate your `.venv` and set the `PYTHONPATH`:

```bash
source .venv/bin/activate
export PYTHONPATH=${PYTHONPATH}:${PWD}
```
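Optionally, a quick sanity check that the environment is usable (plain Python, nothing repo-specific):

```python
import torch

# Confirm the venv's PyTorch build imports and a GPU is visible.
print(torch.__version__, torch.cuda.is_available())
```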
- Run the data preparation script:

```bash
chmod +x scripts/run_data_prep.sh
./scripts/run_data_prep.sh
```
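Conceptually, the precomputation stage encodes each caption once with the text encoder and each video once with the VAE, then caches the tensors so training never has to run either model. The sketch below illustrates the idea only; `load_text_encoder`, `load_vae`, `iter_clips`, and the paths are hypothetical placeholders, not the repo's actual API:

```python
import torch

# Hypothetical helpers standing in for the repo's actual modules.
from wan_helpers import load_text_encoder, load_vae, iter_clips

device = "cuda" if torch.cuda.is_available() else "cpu"
text_encoder = load_text_encoder().to(device).eval()
vae = load_vae().to(device).eval()

with torch.no_grad():
    for i, (caption, video) in enumerate(iter_clips("data/3dgs_dissolve")):
        text_emb = text_encoder(caption)        # caption -> embedding tensor
        latents = vae.encode(video.to(device))  # video -> compressed latents
        torch.save(
            {"text_emb": text_emb.cpu(), "latents": latents.cpu()},
            f"cache/sample_{i:05d}.pt",
        )
```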
- Run the training script:

```bash
chmod +x scripts/run_train.sh
./scripts/run_train.sh
```
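For orientation, a single plain-PyTorch training step over the precomputed latents looks roughly like the sketch below. This is a generic flow-matching step; the model's forward signature and the exact loss are assumptions, not the repo's code:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, latents, text_emb):
    """One illustrative flow-matching step; model(noisy, t, text_emb) is assumed."""
    noise = torch.randn_like(latents)
    # Sample a timestep t in [0, 1] per sample and interpolate data -> noise.
    t = torch.rand(latents.shape[0], device=latents.device)
    t_ = t.view(-1, *([1] * (latents.dim() - 1)))
    noisy = (1.0 - t_) * latents + t_ * noise
    target = noise - latents  # velocity target pointing from data to noise
    pred = model(noisy, t, text_emb)
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```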
- Run the generation script. Currently it uses `checkpoint-1000` as the finetuned checkpoint; you can change it to any other checkpoint you want to use for generation.

```bash
chmod +x scripts/run_generate.sh
./scripts/run_generate.sh
```
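Swapping checkpoints amounts to pointing the script at a different saved step. As a hypothetical illustration of loading a checkpoint in plain PyTorch (the path layout and `build_wan_model` are assumptions, not the repo's actual API):

```python
import torch

from wan_helpers import build_wan_model  # hypothetical constructor

CKPT = "checkpoints/checkpoint-1000/model.pt"  # swap for any other saved step

model = build_wan_model()
model.load_state_dict(torch.load(CKPT, map_location="cpu"))
model.to("cuda").eval()
```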