Skip to content

SidGoreDev/Cosmos-SafetyNet

Repository files navigation

Cosmos SafetyNet — Physical AI Safety Reasoning

Cosmos Reason 2 (CR2) is the system: one model, multiple safety reasoning tasks on video, delivered as structured JSON plus <think> reasoning traces.

This submission is scoped to a forklift-safety demo built from five short warehouse incident clips.

Requirements

  • Python 3.11+
  • An OpenAI-compatible chat-completions endpoint serving nvidia/Cosmos-Reason2-8B
    • This project was built/tested against a Nebius-managed vLLM deployment (see .env.example)

Setup

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
copy .env.example .env

Edit .env with your endpoint URL and API key.

Endpoint contract (what you need to run this)

This repo assumes an OpenAI-compatible endpoint that supports:

  • POST /v1/chat/completions
  • Multimodal user messages where content is a list containing:
    • {"type":"video_url","video_url":{"url":"data:video/mp4;base64,..."}}
    • {"type":"text","text":"<prompt>"}

We do not fine-tune CR2; everything here is prompt + pipeline orchestration.

Configuration used (inference defaults)

Defaults are defined in src/config.py (and can be overridden via environment variables):

  • NEBIUS_VLLM_MODEL=nvidia/Cosmos-Reason2-8B
  • NEBIUS_VLLM_TEMPERATURE=0.6
  • NEBIUS_VLLM_TOP_P=0.95
  • NEBIUS_VLLM_TOP_K=20
  • NEBIUS_VLLM_MAX_TOKENS=1600
  • Multimodal sampling: NEBIUS_VLLM_MM_FPS=6, NEBIUS_VLLM_DO_SAMPLE_FRAMES=true

CLI

Analyze a single video:

python -m src.cli analyze --mode forklift --video "data/videos/forklift safety/VID1 A Forklift Accident Near Miss - Kyle Thill (240p, h264).mp4"

Modes: forklift, load, safety, security, timeline, full

Run the submission batch manifest:

python -m src.cli batch --manifest .\batch\batch_manifest_forklift_vid1_5.yaml --force

Render the exact request payload (offline; no Nebius calls):

python -m src.cli render --mode forklift --video "data/videos/forklift safety/VID1 A Forklift Accident Near Miss - Kyle Thill (240p, h264).mp4"

Parse a saved raw output into JSON + <think> (offline):

python -m src.cli parse --raw "outputs/forklift_vid1_5/per_stream/VID1 A Forklift Accident Near Miss - Kyle Thill (240p, h264).mp4__forklift.raw.txt"

Run evaluation (offline, against the hand-labeled clips included in data/ground_truth/):

python -m src.cli eval --results .\outputs\forklift_vid1_5\per_stream --ground-truth .\data\ground_truth --out .\outputs\forklift_vid1_5\eval_report.json

Generate a human-readable Markdown report (offline):

python -m src.cli report --results .\outputs\forklift_vid1_5\per_stream --out .\reports\runs\forklift_demo_report.md

Generate per-clip near-miss reports (offline; one report per passing video + a JSON manifest):

python -m src.cli near-miss --results .\outputs\forklift_vid1_5\per_stream --out-dir .\reports\runs\near_miss

Generate a self-contained HTML dashboard viewer (offline):

python -m src.cli dashboard --results .\outputs\forklift_vid1_5\per_stream --videos-dir "data/videos/forklift safety" --out .\reports\runs\ops_center_view.html --title "Cosmos SafetyNet — Forklift Demo"

Open the generated *.html file in your browser, then use the “Load local video” picker to attach the corresponding clip for click-to-seek timelines.

Run tests:

python -m unittest discover -s tests

Prompting Convention

We follow the NVIDIA Cosmos reason prompt guide: media-first ordering plus the standard reasoning suffix appended to the user prompt.

Practical Video Payload Limits (Fail Loudly)

Local videos are embedded as base64 data: URIs. To avoid silent request-size failures, we refuse to embed very large local files by default.

  • Override with NEBIUS_VLLM_MAX_VIDEO_MB (default: 25)
  • For long clips, create a short excerpt with ffmpeg and analyze that instead.

Submission contents

  • data/videos/forklift safety/ contains the five submission clips (VID1 through VID5)
  • outputs/forklift_vid1_5/per_stream/ contains the saved JSON, raw, and think artifacts
  • reports/runs/ops_center_view.html is the self-contained dashboard used for review
  • submission/VIDEO_SOURCES.md lists the source citations for each included video

About

Cosmos SafetyNet: CR2-powered forklift safety reasoning with structured JSON outputs and a review dashboard.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors