# Canonical Multimodal Workloads Sandbox for Structured Outputs with Daft

Featuring Hugging Face, vLLM, Gemma 3n, and OpenAI.
## Core Deliverable

## Project Content
- **References** contains reference examples from Ray, vLLM, and SGLang on structured outputs, as well as a full suite of `llm_generate` inference calls across the most common structured-output methods (a usage sketch follows this list).
- **Friction** contains the original (giant) "Scaling Multimodal Structured Outputs with Gemma-3, vLLM, and Daft" notebook, as well as notebooks focused on individual pain points, separated for easier review.
- **Workload** contains both a full walkthrough notebook and an atomic Python script for evaluating multimodal model performance on image understanding.
- Integration tests for `openai` and `llm_generate` structured-output usage patterns.
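For orientation, here is a minimal sketch of the `llm_generate` pattern the reference examples exercise. This is not the project's exact code: the `provider` value and the reliance on the `OPENAI_*` environment variables described below are assumptions.

```python
# Minimal sketch, not the project's exact code: structured generation with
# Daft's llm_generate over a DataFrame column. Assumes the local vLLM
# OpenAI-compatible server from this README is running and that connection
# details come from the OPENAI_* environment variables described below.
import daft
from daft.functions import llm_generate

df = daft.from_pydict({"prompt": ["Return a JSON object describing a red bicycle."]})

df = df.with_column(
    "response",
    llm_generate(
        daft.col("prompt"),
        model="google/gemma-3n-e4b-it",  # assumption: matches MODEL_ID below
        provider="openai",               # assumption: routes to the OpenAI-compatible endpoint
    ),
)
df.show()
```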
- Python: 3.12+
- uv: fast Python package/venv manager. Install with `pip install uv`.
Clone this repository and then run:

```bash
cd daft-structured-outputs
uv venv && uv sync
```

- This creates a local `.venv` and syncs dependencies from `pyproject.toml`.
- Prefer running commands with `uv run` rather than activating the venv (a quick smoke check follows below).
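To confirm the environment resolved correctly, a quick smoke check (illustrative; assumes Daft exposes `__version__`):

```bash
uv run python -c "import daft; print(daft.__version__)"
```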
These are read by tests and examples. A `.env.examples` file is provided as a template; an illustrative filled-in version follows this list.

- `OPENAI_API_KEY`: any non-empty value when using a local vLLM server (e.g., `none`).
- `OPENAI_BASE_URL`: defaults to `None`; the vLLM examples default to `localhost:8000`.
- `HF_TOKEN`: Hugging Face token for model pulls. If not set, use `make hf-auth`.
- `MODEL_ID`: model identifier for integration tests and CI.
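An illustrative `.env`, with placeholder values:

```bash
# Illustrative values only; copy .env.examples and adjust for your setup.
OPENAI_API_KEY=none
OPENAI_BASE_URL=http://0.0.0.0:8000/v1
# HF_TOKEN can be omitted if you authenticate via `make hf-auth` instead.
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
MODEL_ID=google/gemma-3n-e4b-it
```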
Defaults are aligned with project notebooks:
```bash
uv run python -m vllm.entrypoints.openai.api_server \
  --model google/gemma-3n-e4b-it \
  --enable-chunked-prefill \
  --guided-decoding-backend guidance \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.85 \
  --host 0.0.0.0 --port 8000
```
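Once the server is up, structured output can be requested through any OpenAI-compatible client. A minimal sketch using vLLM's `guided_json` extension via `extra_body` (the schema, prompt, and connection values are placeholders):

```python
# Minimal sketch: JSON-constrained generation against the local vLLM server.
# The schema and prompt are placeholders; `guided_json` is vLLM's extension
# to the OpenAI API, passed through the client's `extra_body` parameter.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="none")

schema = {
    "type": "object",
    "properties": {"label": {"type": "string"}, "confidence": {"type": "number"}},
    "required": ["label", "confidence"],
}

resp = client.chat.completions.create(
    model="google/gemma-3n-e4b-it",
    messages=[{"role": "user", "content": "Classify: 'The battery died in an hour.'"}],
    extra_body={"guided_json": schema},  # constrain output to the schema
)
print(resp.choices[0].message.content)
```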
You will need to authenticate with Hugging Face to access Gemma-3:

```bash
hf auth login
```
- Python scripts (example):

  ```bash
  uv run python workload/daft_mm_so_gemma3.py
  ```

- Notebooks: open in your IDE or Jupyter and ensure the environment variables above are set in the session (see the loading sketch below).
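One way to set the variables inside a notebook session, sketched here assuming `python-dotenv` is installed:

```python
# Sketch: load the variables from a local .env before running notebook cells.
# Assumes python-dotenv is available in the environment.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
print(os.environ.get("OPENAI_BASE_URL", "http://0.0.0.0:8000/v1"))
```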
Run against a live vLLM server (skips if unreachable):

```bash
uv run pytest -q tests/test_openai_vllm_integration.py
```
Environment variables used by the tests (a sketch of the skip behavior follows this list):

- `OPENAI_BASE_URL` (default `http://0.0.0.0:8000/v1`)
- `OPENAI_API_KEY` (default `none`)
- `MODEL_ID` (default `google/gemma-3n-e4b-it`)
- `TEST_IMAGE_URL` (optional; enables the vision test)
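The skip-if-unreachable behavior amounts to probing the server before the tests run. A sketch of that pattern (not the project's actual test code), using only the standard library plus `pytest`:

```python
# Sketch of a liveness probe like the one the integration tests rely on;
# not the project's actual test code.
import os
import urllib.request

import pytest

BASE_URL = os.environ.get("OPENAI_BASE_URL", "http://0.0.0.0:8000/v1")

def _server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if the OpenAI-compatible /models endpoint responds."""
    try:
        urllib.request.urlopen(f"{url}/models", timeout=timeout)
        return True
    except Exception:
        return False

@pytest.mark.skipif(not _server_reachable(BASE_URL), reason="vLLM server unreachable")
def test_chat_completion_smoke():
    ...  # issue a minimal chat-completion request against BASE_URL
```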
- vLLM server not reachable: ensure `make vllm-serve` is running; confirm `OPENAI_BASE_URL` and `PORT`.
- HF auth required: run `hf auth login` to authenticate if `HF_TOKEN` is not set.
- GPU memory: adjust `GPU_MEM_UTIL` in `make vllm-serve` for your hardware.
- Dependencies: re-run `uv sync` after modifying `pyproject.toml`.