Fork of ostris/ai-toolkit with custom modifications for Shootify's training pipeline.
Deployed at: https://ai-toolkit-01.shootify.io/dashboard
Replaced the original resolution system with explicit `[width, height]` bucket pairs and added 4:5 portrait aspect-ratio buckets optimized for garment/fashion training:
- 8 new 4:5 buckets: 960x1200 through 1792x2240
- New `ResolutionSelector` UI component with per-bucket width/height editing
- TypeScript bucket utility mirroring the Python bucket generation logic
- Migration support for old config formats (single int, flat array)
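The legacy-config migration can be sketched as follows. This is an illustrative assumption of the shape of the logic, not the actual code in `ui/src/app/jobs/new/jobConfig.ts`; the function name and narrowing heuristics are hypothetical:

```typescript
// Sketch of migrating legacy resolution configs to [w, h][] pairs.
// Hypothetical helper; the real migration lives in jobConfig.ts.
type Bucket = [width: number, height: number];

type LegacyResolution = number | number[] | Bucket[];

function migrateResolution(res: LegacyResolution): Bucket[] {
  // Old format 1: a single int meant one square bucket, e.g. 1024.
  if (typeof res === "number") return [[res, res]];
  // Already the new [w, h][] shape: first element is itself an array.
  if (Array.isArray(res[0])) return res as Bucket[];
  // Old format 2: a flat array of square sizes, e.g. [512, 768].
  return (res as number[]).map((s): Bucket => [s, s]);
}
```

Whatever the exact implementation, all three input shapes should normalize to the same `[w, h][]` form so downstream code only handles one case.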
Files changed:

- `toolkit/buckets.py` — new 4:5 portrait buckets
- `toolkit/config_modules.py` — resolution format as `[w, h][]`
- `toolkit/dataloader_mixins.py` — support exact `[w, h]` bucket dimensions
- `ui/src/components/ResolutionSelector.tsx` — new component
- `ui/src/utils/buckets.ts` — TypeScript bucket utility
- `ui/src/app/jobs/new/SimpleJob.tsx` — use ResolutionSelector
- `ui/src/app/jobs/new/jobConfig.ts` — resolution migration
- `ui/src/app/jobs/new/page.tsx` — simplified view
- `ui/src/types.ts` — updated resolution type
hf_transfer (the Rust-based download accelerator) has a bug that stalls downloads at the 2^34-byte boundary (~16 GB). Both it and the Xet storage protocol are disabled so downloads fall back to plain HTTP.
Environment variables added to the job spawn:

- `HF_HUB_ENABLE_HF_TRANSFER=0`
- `HF_HUB_DISABLE_XET=1`
- `HF_HUB_DOWNLOAD_TIMEOUT=300`
- `HF_HOME=/data/.cache/huggingface`
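How these variables reach the training process can be sketched like this. The function name and argument list are hypothetical; the actual spawn logic lives in `ui/cron/actions/startJob.ts`:

```typescript
// Sketch of injecting the HF workarounds into the spawned training
// process (illustrative shape, not the actual startJob.ts code).
import { spawn } from "node:child_process";

// Disable hf_transfer (stalls at the 2^34-byte boundary) and Xet,
// falling back to plain HTTP, and point the model cache at /data.
const hfEnv = {
  HF_HUB_ENABLE_HF_TRANSFER: "0",
  HF_HUB_DISABLE_XET: "1",
  HF_HUB_DOWNLOAD_TIMEOUT: "300",
  HF_HOME: "/data/.cache/huggingface",
};

function spawnTrainingJob(python: string, configPath: string) {
  return spawn(python, ["run.py", configPath], {
    // Merge the workarounds over the inherited environment so the
    // child sees both the host env and the HF overrides.
    env: { ...process.env, ...hfEnv },
    stdio: "inherit",
  });
}
```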
File changed:

- `ui/cron/actions/startJob.ts`
Clones from shootify-io/ai-toolkit fork instead of upstream ostris repo.
File changed:

- `docker/Dockerfile`
- Added `allowedDevOrigins` for `ai-toolkit-01.shootify.io`
- Set dev port to 8675
- Removed turbopack
Files changed:

- `ui/next.config.ts`
- `ui/package.json`
- Host: GCP VM (A100-SXM4-80GB, CUDA 12.4)
- URL: https://ai-toolkit-01.shootify.io/dashboard
- Working directory: `/home/info/shootify-deployment/ai-toolkit/`
The app runs directly on the host via `npm run dev` (Next.js plus the cron worker, run together via `concurrently`) on port 8675.
Start:

```bash
cd /home/info/shootify-deployment/ai-toolkit/ui
nohup npm run dev > /tmp/dev-server.log 2>&1 &
```

Stop:

```bash
pkill -f "next dev --port 8675"
pkill -f "ts-node-dev.*worker"
pkill -f "concurrently.*WORKER"
```

Logs: `/tmp/dev-server.log`
There is also a systemd service at `/etc/systemd/system/ai-toolkit.service`.
`.venv` is a conda environment with Python 3.12 created at the repo root:

```bash
conda create -p /home/info/shootify-deployment/ai-toolkit/.venv python=3.12 -y
.venv/bin/pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
.venv/bin/pip install -r requirements.txt
```

The startJob.ts worker detects `.venv/bin/python` automatically.
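The interpreter auto-detection can be sketched as a simple filesystem probe. The helper name and fallback are assumptions for illustration; the actual check lives in `ui/cron/actions/startJob.ts`:

```typescript
// Sketch of the worker's Python interpreter auto-detection
// (illustrative; not the actual startJob.ts implementation).
import fs from "node:fs";
import path from "node:path";

function resolvePython(repoRoot: string): string {
  // Prefer the conda env created at the repo root...
  const venvPython = path.join(repoRoot, ".venv", "bin", "python");
  if (fs.existsSync(venvPython)) return venvPython;
  // ...and fall back to whatever `python3` is on PATH.
  return "python3";
}
```

A probe like this keeps the worker usable on machines without the conda env while automatically picking up the CUDA-enabled environment on the deployment host.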
| Path | Contents |
|---|---|
| `/home/info/shootify-deployment/ai-toolkit/output/` | Training outputs (logs, checkpoints, samples) |
| `/home/info/shootify-deployment/aitk_db.db` | SQLite database (jobs, settings) |
| `/data/.cache/huggingface/` | HuggingFace model cache |
| `/home/info/shootify-deployment/ai-toolkit/datasets/` | Training datasets |
For general AI Toolkit documentation, features, and supported models, see the original repo: ostris/ai-toolkit