Lumina-DiMOO (Fork)

This repository is a fork of the original project. For the full research documentation, results, and updates, please refer to the upstream repository.

Below are streamlined, reproducible instructions to install and run the demo locally.

Quick Start (Windows, uv)

Copy-paste these commands in PowerShell from the repo root.

# 1) Install uv (once)
python -m pip install --user uv

# 2) Create the environment from pyproject.toml (non-Torch deps)
uv sync

# 3) Install PyTorch with the correct CUDA wheels (auto-detects CUDA, or add --tag cpu)
uv run --no-sync python scripts/install_torch.py --install-vcredist

# 4) Launch the Gradio UI (defaults to Alpha-VLLM/Lumina-DiMOO, downloads on first use)
uv run --no-sync python -m ui.gradio_app

Notes:

  • If you have CUDA 12.x/11.x and want to force a tag: add --tag cu121 or --tag cu118 to the Torch installer.
  • The UI has a “Preload / Download Models” button; you can click it or just run a task to trigger the first download.
  • Some models may require Hugging Face auth. If so, run huggingface-cli login once before launching the UI.
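After step 3, it can be worth confirming which wheel set actually landed before launching the UI. The sketch below only reads Torch's own metadata (`torch.__version__`, `torch.cuda.is_available()`, `torch.version.cuda` are all standard PyTorch attributes) and degrades gracefully if Torch is missing; run it via `uv run --no-sync python sanity_check.py` (the filename is illustrative):

```python
import importlib.util


def torch_status() -> str:
    """Return a one-line diagnostic about the local Torch install."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed (rerun scripts/install_torch.py)"
    import torch

    if torch.cuda.is_available():
        return f"torch {torch.__version__} with CUDA {torch.version.cuda}"
    return f"torch {torch.__version__} (CPU only)"


print(torch_status())
```

If this reports CPU-only on a machine with an NVIDIA GPU, reinstall with an explicit tag (e.g. `--tag cu121`).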

Requirements

  • Windows 10/11 (64-bit)
  • Python 3.10 or 3.11
  • NVIDIA GPU with recent driver for CUDA usage (optional; CPU is supported but slow)
  • Internet access (to download model weights from Hugging Face on first run)

Install with uv (recommended)

uv provides fast, reproducible environments from pyproject.toml.

  1. Install uv if not present
python -m pip install --user uv
  2. Sync environment (non-Torch deps)
uv sync
  3. Install PyTorch with CUDA-aware wheels
uv run --no-sync python scripts/install_torch.py

Notes:

  • Auto-detects CUDA. Override with --tag cu121 (CUDA 12.x), --tag cu118 (CUDA 11.x), or --tag cpu if needed.
  • If MSVC runtime is missing, add --install-vcredist.

Optional extras (after Torch):

uv sync --group torch_ext

Run the Gradio demo

Launch the UI (defaults to the Alpha-VLLM/Lumina-DiMOO checkpoint):

uv run --no-sync python -m ui.gradio_app

In the app:

  • Click “Preload / Download Models” to fetch weights from Hugging Face, or
  • Just run a task; the first request will download automatically and cache locally.

You can change the checkpoint or VAE path in the text fields before preloading/running.
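Downloads are cached under the standard Hugging Face hub layout, so repeated runs reuse the same weights. If you want to see where "cache locally" points on disk, this stdlib-only sketch reproduces huggingface_hub's default `models--{org}--{name}` directory naming (the cache root shown is the default; it can be relocated via the `HF_HOME` environment variable):

```python
from pathlib import Path


def hf_cache_dir(repo_id: str, cache_root: str = "~/.cache/huggingface/hub") -> Path:
    """Default on-disk location huggingface_hub uses to cache a model repo.

    A repo id like "org/name" becomes a directory named "models--org--name".
    """
    return Path(cache_root).expanduser() / ("models--" + repo_id.replace("/", "--"))


print(hf_cache_dir("Alpha-VLLM/Lumina-DiMOO"))
```

Deleting that directory forces a fresh download on the next preload or task run.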

Backwards-compatible entrypoint:

uv run --no-sync python scripts/gradio_app.py

Run from CLI (optional)

Text-to-Image example:

uv run --no-sync python scripts/inference_t2i.py `
  --checkpoint Alpha-VLLM/Lumina-DiMOO `
  --prompt "A vivid painting of a serene lake under the moonlight" `
  --height 768 --width 1536 --timesteps 64 --cfg_scale 4.0 `
  --vae_ckpt Alpha-VLLM/Lumina-DiMOO `
  --output_dir output/results_text_to_image

Image-to-Image example:

uv run --no-sync python scripts/inference_i2i.py `
  --checkpoint Alpha-VLLM/Lumina-DiMOO `
  --prompt "Generate a canny edge map according to the image" `
  --image_path examples/example_1.png `
  --edit_type canny_pred --timesteps 64 `
  --cfg_scale 2.5 --cfg_img 4.0 `
  --vae_ckpt Alpha-VLLM/Lumina-DiMOO `
  --output_dir output/results_image_to_image

Multimodal understanding example:

uv run --no-sync python scripts/inference_mmu.py `
  --checkpoint Alpha-VLLM/Lumina-DiMOO `
  --prompt "Please describe this image." `
  --image_path examples/example_6.jpg `
  --steps 128 --gen_length 128 --block_length 32 `
  --vae_ckpt Alpha-VLLM/Lumina-DiMOO `
  --output_dir output/outputs_text_understanding

Troubleshooting

  • If CUDA isn’t detected, ensure your NVIDIA driver is installed and recent. Then rerun scripts/install_torch.py.
  • For Windows MSVC runtime issues, run with --install-vcredist.
  • To force a specific wheel set: --tag cu121 (CUDA 12.x), --tag cu118 (CUDA 11.x), --tag cpu.
  • If a model requires authentication, run huggingface-cli login before launching.
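For the first bullet above, a quick way to tell a missing driver apart from a wrong wheel set is checking whether `nvidia-smi` is on PATH (a minimal sketch; `nvidia-smi` ships with the NVIDIA driver, so its absence usually means the driver itself is missing):

```python
import shutil


def nvidia_driver_present() -> bool:
    """True if nvidia-smi (installed alongside the NVIDIA driver) is on PATH."""
    return shutil.which("nvidia-smi") is not None


# False here: install or update the NVIDIA driver, then rerun
# scripts/install_torch.py. True here but CUDA still undetected:
# force a wheel set with --tag cu121 or --tag cu118.
print(nvidia_driver_present())
```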

License

This fork inherits the original project’s license. See LICENSE in this repository and the upstream repository for details.

About

Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
