Cosmos Reason 2 (CR2) is the system: one model, multiple safety reasoning tasks on video, delivered as structured JSON plus `<think>` reasoning traces.
This submission is scoped to a forklift-safety demo built from five short warehouse incident clips.
- Python 3.11+
- An OpenAI-compatible chat-completions endpoint serving `nvidia/Cosmos-Reason2-8B`
- This project was built/tested against a Nebius-managed vLLM deployment (see `.env.example`)
```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
copy .env.example .env
```

Edit `.env` with your endpoint URL and API key.
This repo assumes an OpenAI-compatible endpoint that supports:

- `POST /v1/chat/completions`
- Multimodal user messages where `content` is a list containing:
  - `{"type":"video_url","video_url":{"url":"data:video/mp4;base64,..."}}`
  - `{"type":"text","text":"<prompt>"}`
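Such a request body can be assembled in a few lines. A minimal Python sketch (the helper name and the way the file is read are illustrative, not part of this repo's API; the content-part shapes match the bullets above):

```python
import base64
from pathlib import Path


def build_video_request(video_path: str, prompt: str,
                        model: str = "nvidia/Cosmos-Reason2-8B") -> dict:
    """Build an OpenAI-compatible chat-completions payload with an inline base64 video.

    The video part is placed before the text part (media-first ordering).
    """
    video_b64 = base64.b64encode(Path(video_path).read_bytes()).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {   # video first, embedded as a data: URI
                        "type": "video_url",
                        "video_url": {"url": f"data:video/mp4;base64,{video_b64}"},
                    },
                    {"type": "text", "text": prompt},  # prompt text second
                ],
            }
        ],
    }
```

The resulting dict can be POSTed as JSON to `/v1/chat/completions` with any HTTP client.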
We do not fine-tune CR2; everything here is prompt + pipeline orchestration.
Defaults are defined in `src/config.py` (and can be overridden via environment variables):

- `NEBIUS_VLLM_MODEL=nvidia/Cosmos-Reason2-8B`
- `NEBIUS_VLLM_TEMPERATURE=0.6`
- `NEBIUS_VLLM_TOP_P=0.95`
- `NEBIUS_VLLM_TOP_K=20`
- `NEBIUS_VLLM_MAX_TOKENS=1600`
- Multimodal sampling: `NEBIUS_VLLM_MM_FPS=6`, `NEBIUS_VLLM_DO_SAMPLE_FRAMES=true`
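These settings follow the usual environment-with-default pattern. A hypothetical sketch of how they could be resolved (the function name is illustrative; the real logic lives in `src/config.py` and may differ):

```python
import os


def load_sampling_config(env=os.environ) -> dict:
    """Resolve sampling settings from environment overrides, falling back to the defaults above."""
    return {
        "model": env.get("NEBIUS_VLLM_MODEL", "nvidia/Cosmos-Reason2-8B"),
        "temperature": float(env.get("NEBIUS_VLLM_TEMPERATURE", "0.6")),
        "top_p": float(env.get("NEBIUS_VLLM_TOP_P", "0.95")),
        "top_k": int(env.get("NEBIUS_VLLM_TOP_K", "20")),
        "max_tokens": int(env.get("NEBIUS_VLLM_MAX_TOKENS", "1600")),
        "mm_fps": int(env.get("NEBIUS_VLLM_MM_FPS", "6")),
        "do_sample_frames": env.get("NEBIUS_VLLM_DO_SAMPLE_FRAMES", "true").lower() == "true",
    }
```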
Analyze a single video:
```
python -m src.cli analyze --mode forklift --video "data/videos/forklift safety/VID1 A Forklift Accident Near Miss - Kyle Thill (240p, h264).mp4"
```

Modes: `forklift`, `load`, `safety`, `security`, `timeline`, `full`
Run the submission batch manifest:
```
python -m src.cli batch --manifest .\batch\batch_manifest_forklift_vid1_5.yaml --force
```

Render the exact request payload (offline; no Nebius calls):
```
python -m src.cli render --mode forklift --video "data/videos/forklift safety/VID1 A Forklift Accident Near Miss - Kyle Thill (240p, h264).mp4"
```

Parse a saved raw output into JSON + `<think>` (offline):
```
python -m src.cli parse --raw "outputs/forklift_vid1_5/per_stream/VID1 A Forklift Accident Near Miss - Kyle Thill (240p, h264).mp4__forklift.raw.txt"
```

Run evaluation (offline, against the hand-labeled clips included in `data/ground_truth/`):
```
python -m src.cli eval --results .\outputs\forklift_vid1_5\per_stream --ground-truth .\data\ground_truth --out .\outputs\forklift_vid1_5\eval_report.json
```

Generate a human-readable Markdown report (offline):
```
python -m src.cli report --results .\outputs\forklift_vid1_5\per_stream --out .\reports\runs\forklift_demo_report.md
```

Generate per-clip near-miss reports (offline; one report per passing video + a JSON manifest):
```
python -m src.cli near-miss --results .\outputs\forklift_vid1_5\per_stream --out-dir .\reports\runs\near_miss
```

Generate a self-contained HTML dashboard viewer (offline):
```
python -m src.cli dashboard --results .\outputs\forklift_vid1_5\per_stream --videos-dir "data/videos/forklift safety" --out .\reports\runs\ops_center_view.html --title "Cosmos SafetyNet — Forklift Demo"
```

Open the generated `*.html` file in your browser, then use the “Load local video” picker to attach the corresponding clip for click-to-seek timelines.
Run tests:
```
python -m unittest discover -s tests
```

We follow the NVIDIA Cosmos Reason prompt guide: media-first ordering plus the standard reasoning suffix appended to the user prompt.
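The `parse` command shown earlier splits a raw model response into its `<think>` trace and the structured JSON. A minimal sketch of that split, assuming the response is a `<think>...</think>` block followed by a single JSON object (the repo's actual parser may be more robust):

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_raw_output(raw: str) -> tuple[str, dict]:
    """Separate the <think> reasoning trace from the JSON payload in a raw response."""
    match = THINK_RE.search(raw)
    think = match.group(1).strip() if match else ""
    remainder = THINK_RE.sub("", raw, count=1)
    # Treat everything from the first "{" to the last "}" as the JSON candidate.
    start, end = remainder.find("{"), remainder.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in raw output")
    return think, json.loads(remainder[start:end + 1])
```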
Local videos are embedded as base64 `data:` URIs. To avoid silent request-size failures, we refuse to embed very large local files by default.

- Override with `NEBIUS_VLLM_MAX_VIDEO_MB` (default: 25)
- For long clips, create a short excerpt with `ffmpeg` and analyze that instead.
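A sketch of both steps: a size guard keyed to `NEBIUS_VLLM_MAX_VIDEO_MB`, and a stream-copy `ffmpeg` excerpt. The function names are illustrative, and this guard checks on-disk size (base64 encoding inflates the payload by roughly a third on top of that):

```python
import os
import subprocess

MAX_VIDEO_MB = float(os.environ.get("NEBIUS_VLLM_MAX_VIDEO_MB", "25"))


def check_embed_size(path: str, max_mb: float = MAX_VIDEO_MB) -> None:
    """Refuse to embed files whose on-disk size exceeds the configured limit."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        raise ValueError(
            f"{path} is {size_mb:.1f} MB, above the {max_mb:g} MB embed limit; trim it first"
        )


def make_excerpt(src: str, dst: str, start: float = 0.0, duration: float = 20.0) -> None:
    """Cut a short excerpt with ffmpeg using stream copy (no re-encode)."""
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration), "-i", src, "-c", "copy", dst],
        check=True,
    )
```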
- `data/videos/forklift safety/` contains the five submission clips (`VID1` through `VID5`)
- `outputs/forklift_vid1_5/per_stream/` contains the saved JSON, raw, and think artifacts
- `reports/runs/ops_center_view.html` is the self-contained dashboard used for review
- `submission/VIDEO_SOURCES.md` lists the source citations for each included video