This repository is a fork of the original project. For the full research documentation, results, and updates, please refer to the upstream repository:
- Original repo: https://github.com/Alpha-VLLM/Lumina-DiMOO
- Model on Hugging Face: https://huggingface.co/Alpha-VLLM/Lumina-DiMOO
Below are streamlined, reproducible instructions to install and run the demo locally.
Copy-paste these commands in PowerShell from the repo root.
```powershell
# 1) Install uv (once)
python -m pip install --user uv

# 2) Create the environment from pyproject.toml (non-Torch deps)
uv sync

# 3) Install PyTorch with the correct CUDA wheels (auto-detects CUDA, or add --tag cpu)
uv run --no-sync python scripts/install_torch.py --install-vcredist

# 4) Launch the Gradio UI (defaults to Alpha-VLLM/Lumina-DiMOO; downloads weights on first use)
uv run --no-sync python -m ui.gradio_app
```

Notes:
- If you have CUDA 12.x/11.x and want to force a tag: add
--tag cu121or--tag cu118to the Torch installer. - The UI has a “Preload / Download Models” button; you can click it or just run a task to trigger the first download.
- Some models may require Hugging Face auth. If so, run
huggingface-cli loginonce before launching the UI.
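Before launching the UI, it can help to confirm that the Torch install succeeded. A minimal sketch (the filename `check_torch.py` is just an example; run it with `uv run --no-sync python check_torch.py`):

```python
# check_torch.py - report whether PyTorch is installed and whether CUDA is usable.
# Illustrative sketch, not part of the repository's scripts.

def torch_status() -> str:
    try:
        import torch
    except ImportError:
        return "PyTorch not installed yet - run scripts/install_torch.py first"
    return f"torch {torch.__version__} | CUDA available: {torch.cuda.is_available()}"

if __name__ == "__main__":
    print(torch_status())
```

If CUDA shows as unavailable despite an NVIDIA GPU, rerun the installer with an explicit `--tag`.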
Requirements:

- Windows 10/11 (64-bit)
- Python 3.10 or 3.11
- NVIDIA GPU with recent driver for CUDA usage (optional; CPU is supported but slow)
- Internet access (to download model weights from Hugging Face on first run)
uv provides fast, reproducible environments from pyproject.toml.
- Install uv if not present:

  ```powershell
  python -m pip install --user uv
  ```

- Sync environment (non-Torch deps):

  ```powershell
  uv sync
  ```

- Install PyTorch with CUDA-aware wheels:

  ```powershell
  uv run --no-sync python scripts/install_torch.py
  ```

Notes:

- Auto-detects CUDA. Override with `--tag cu121` (CUDA 12.x), `--tag cu118` (CUDA 11.x), or `--tag cpu` if needed.
- If the MSVC runtime is missing, add `--install-vcredist`.
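Conceptually, the auto-detection boils down to mapping the detected CUDA major version to a wheel tag. A rough sketch of that mapping (an illustrative assumption only; `scripts/install_torch.py` is authoritative, and `--tag` always overrides it):

```python
from typing import Optional

def pick_torch_tag(cuda_version: Optional[str]) -> str:
    """Map a detected CUDA version string (e.g. "12.1") to a Torch wheel tag.

    Hypothetical mapping for illustration; the installer script may differ.
    """
    if not cuda_version:
        return "cpu"  # no CUDA detected -> CPU wheels
    major = cuda_version.split(".")[0]
    return {"12": "cu121", "11": "cu118"}.get(major, "cpu")
```

For example, `pick_torch_tag("12.1")` yields `"cu121"`, while an unrecognized or missing version falls back to `"cpu"`.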
Optional extras (after Torch):

```powershell
uv sync --group torch_ext
```

Launch the UI (uses defaults Alpha-VLLM/Lumina-DiMOO):

```powershell
uv run --no-sync python -m ui.gradio_app
```

In the app:
- Click “Preload / Download Models” to fetch weights from Hugging Face, or
- Just run a task; the first request will download automatically and cache locally.
You can change the checkpoint or VAE path in the text fields before preloading/running.
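Downloads are cached by `huggingface_hub`, so repeat runs reuse local files. To see where weights land (the default cache root is `~/.cache/huggingface`, overridable via the `HF_HOME` environment variable):

```python
import os

def hf_cache_dir() -> str:
    """Return the Hugging Face cache root: HF_HOME if set, else ~/.cache/huggingface."""
    default = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    return os.environ.get("HF_HOME", default)

if __name__ == "__main__":
    print(hf_cache_dir())
```

Deleting directories under this cache frees disk space but forces a re-download on the next run.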
Back-compat entrypoint:

```powershell
uv run --no-sync python scripts/gradio_app.py
```

Text-to-Image example:
```powershell
uv run --no-sync python scripts/inference_t2i.py `
  --checkpoint Alpha-VLLM/Lumina-DiMOO `
  --prompt "A vivid painting of a serene lake under the moonlight" `
  --height 768 --width 1536 --timesteps 64 --cfg_scale 4.0 `
  --vae_ckpt Alpha-VLLM/Lumina-DiMOO `
  --output_dir output/results_text_to_image
```

Image-to-Image example:
```powershell
uv run --no-sync python scripts/inference_i2i.py `
  --checkpoint Alpha-VLLM/Lumina-DiMOO `
  --prompt "Generate a canny edge map according to the image" `
  --image_path examples/example_1.png `
  --edit_type canny_pred --timesteps 64 `
  --cfg_scale 2.5 --cfg_img 4.0 `
  --vae_ckpt Alpha-VLLM/Lumina-DiMOO `
  --output_dir output/results_image_to_image
```

Understanding example:
```powershell
uv run --no-sync python scripts/inference_mmu.py `
  --checkpoint Alpha-VLLM/Lumina-DiMOO `
  --prompt "Please describe this image." `
  --image_path examples/example_6.jpg `
  --steps 128 --gen_length 128 --block_length 32 `
  --vae_ckpt Alpha-VLLM/Lumina-DiMOO `
  --output_dir output/outputs_text_understanding
```

Troubleshooting:

- If CUDA isn’t detected, ensure your NVIDIA driver is installed and recent, then rerun `scripts/install_torch.py`.
- For Windows MSVC runtime issues, run with `--install-vcredist`.
- To force a specific wheel set: `--tag cu121` (CUDA 12.x), `--tag cu118` (CUDA 11.x), `--tag cpu`.
- If a model requires authentication, run `huggingface-cli login` before launching.
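After any of the inference examples above, the quickest sanity check is to open the newest file in the chosen `--output_dir`. A small helper for that (assuming results are written as `.png` files, which may vary by task):

```python
from pathlib import Path
from typing import Optional

def latest_output(out_dir: str = "output/results_text_to_image") -> Optional[Path]:
    """Return the most recently written .png in out_dir, or None if empty/missing."""
    root = Path(out_dir)
    if not root.is_dir():
        return None
    files = sorted(root.glob("*.png"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None
```

Call it with the same `--output_dir` value you passed to the script; `None` means the run produced nothing there.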
This fork inherits the original project’s license. See LICENSE in this repository and the upstream repository for details.