palletizing-ai

AI-Powered Optimal Palletizing with Physical Reasoning — powered by NVIDIA Cosmos Reason2 and MuJoCo physics simulation.

Evaluates 10 industrial stacking patterns through physics simulation (shake & tilt tests), then uses a 3-round VLM tournament where Cosmos Reason2 designs progressively harder tests to find the optimal pattern — and explains why.

🇰🇷 한국어 버전 (Korean)


About Doosan Robotics

Doosan Robotics is a global leader in collaborative robotics, operating across 45+ countries. Recognized with the CES 2026 Best of Innovation Award (AI category), Doosan Robotics is an NVIDIA strategic partner with an existing cuMotion collaboration — this project extends that partnership into AI-powered palletizing with Cosmos Reason2.


The Problem

The global palletizing market exceeds $150 billion, yet 23% of warehouse injuries involve falling pallets. Fixed stacking rules can't adapt to diverse box sizes or off-center weight. Without physics validation, failures are discovered on the warehouse floor — not in simulation.


Our Approach — "Think. Verify. Act."

| Phase | What Happens |
|---|---|
| Think | Cosmos Reason2 analyzes box physics across 10 candidate patterns |
| Verify | MuJoCo simulates shake & tilt stress tests with iterative tournament rounds |
| Act | The verified winning pattern is ready for execution with full traceability |

System Architecture

GENERATE  →  TOURNAMENT (3 rounds)  →  EXPLAIN
                 ↕ each round: SIMULATE + Reasoning VLM TUNE
| Stage | Description |
|---|---|
| Generate | Deterministic coordinate generation for 10 stacking patterns (<10ms) |
| Tournament | 3-round elimination: R1 (10→5), R2 (5→2), R3 (2→1). Each round runs MuJoCo shake + tilt tests. From R2, Cosmos Reason2 analyzes score discriminability and proposes updated physics parameters for the next round |
| Explain | Cosmos Reason2 generates an engineering rationale for the winner with transport condition analysis |
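The elimination logic above can be sketched as a short loop. This is an illustrative outline, not the repository's actual API: `score_fn` stands in for the MuJoCo shake + tilt scoring, `tune_fn` for the VLM's parameter proposal, and the initial physics parameters are assumed values.

```python
def run_tournament(patterns, score_fn, tune_fn, cuts=(5, 2, 1)):
    """3-round elimination: keep the top `cut` patterns each round."""
    # Assumed starting physics parameters; the real values live in the repo's config.
    params = {"shake_amplitude": 0.02, "tilt_deg": 10.0}
    survivors = list(patterns)
    for round_idx, cut in enumerate(cuts, start=1):
        # Rank survivors by physics score under the current test parameters.
        ranked = sorted(survivors, key=lambda p: score_fn(p, params), reverse=True)
        survivors = ranked[:cut]
        # From R2 onward, the VLM proposes harder tests for the next round.
        if round_idx >= 2 and len(survivors) > 1:
            params = tune_fn(params)
    return survivors[0]
```

With ten candidates and cuts of (5, 2, 1), three rounds reduce the field 10→5→2→1 exactly as the table describes.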

Scoring

All elimination decisions use physics-only scoring:

| Component | Weight |
|---|---|
| Stability | 50% |
| Space | 50% |

Reasoning VLM does not score patterns — it only tunes physics parameters between rounds (TUNE) and generates the final engineering rationale (EXPLAIN).
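The 50/50 weighting amounts to a simple composite. A minimal sketch, assuming both components are normalized to [0, 1] (the function name and scale are illustrative, not the repo's):

```python
def composite_score(stability: float, space_utilization: float) -> float:
    """Physics-only score: equal-weight blend of stability and space, scaled to 0-100."""
    return 100.0 * (0.5 * stability + 0.5 * space_utilization)
```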


Cosmos Reason2 Integration

  • 4-tier auto-detection: HuggingFace → Custom API → GPU (vLLM) → Ollama fallback
  • No API key required for Ollama backend — runs fully local
  • VLM Coach TUNE: Cosmos doesn't just judge patterns — it designs harder tests each round
  • iso_view renders: Alternating-color isometric renders optimized for VLM visual reasoning
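The 4-tier auto-detection is a priority-ordered fallback chain. A hypothetical sketch of that selection logic (the function and availability-check names are assumptions, not the repository's detection code):

```python
def select_backend(available: dict) -> str:
    """Return the first usable inference backend in priority order."""
    priority = ["huggingface", "custom_api", "vllm_gpu", "ollama"]
    for name in priority:
        if available.get(name):
            return name
    # Ollama is the final fallback: fully local, no API key required.
    return "ollama"
```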

10 Stacking Patterns

Column Stack · Full Interlock · Partial Interlock · Brick · Pinwheel · Hybrid Pinwheel · Split Row · Split Block · Row · Spiral

12 pallet presets (EUR, GMA, Asia, AU, NA standards) → Pallet Standards Reference
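Two of the standard footprints behind those presets, for orientation (dimensions are the published standards; the key names and tuple layout are illustrative, and the repo's preset table may also carry height and load limits):

```python
# Footprints in millimetres: (length, width).
PALLET_PRESETS = {
    "EUR1": (1200, 800),    # EUR/EPAL standard (Europe)
    "GMA":  (1219, 1016),   # 48 x 40 in, North America
}
```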


Results

| Metric | Value |
|---|---|
| Winner | Full Interlock (94.2 confidence) |
| Space Utilization | 87% (vs 71% for Row) |
| Patterns Evaluated | 10 deterministic patterns |
| Tournament Rounds | 3 adaptive rounds |
| Test Suite | 561+ tests, zero mocks |
| Pattern Generation | under 10ms |

MuJoCo Shake Simulation

Left: Column Stack  |  Right: Pinwheel — MuJoCo pallet shake simulation
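The tilt portion of these tests has a simple static intuition: a rigid stack tips once the tilt angle exceeds the angle at which its center of mass projects outside the support base. A simplified sketch of that criterion (this is an assumption-level approximation, not the MuJoCo simulation itself):

```python
import math

def critical_tilt_deg(com_height: float, edge_distance: float) -> float:
    """Tilt angle (degrees) at which the COM projects past the tipping edge."""
    return math.degrees(math.atan2(edge_distance, com_height))

def passes_tilt_test(com_height: float, edge_distance: float, tilt_deg: float) -> bool:
    """True if the stack stays upright at the given tilt angle."""
    return tilt_deg < critical_tilt_deg(com_height, edge_distance)
```

This is why a tall, narrow Column Stack fails a tilt test that a wider interlocked layout survives: raising the COM or narrowing the base shrinks the critical angle.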


Demo Video

🎬 Watch on YouTube — 2:30 min demo of the full pipeline


Quick Start

```shell
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh && source ~/.bashrc

# 2. Clone & install
git clone https://github.com/doosan-robotics/palletizing-ai.git
cd palletizing-ai
uv python install 3.12      # required if your system has Python < 3.12
uv sync                     # base dependencies
uv sync --extra local       # + HuggingFace/PyTorch for local AI inference

# 3. HuggingFace login (Cosmos Reason2 is a gated model)
#    Get token: https://huggingface.co/settings/tokens
#    Request access: https://huggingface.co/nvidia/Cosmos-Reason2-8B
uv run huggingface-cli login --token YOUR_HF_TOKEN

# 4. Configure environment
cp .env.example .env
# Edit .env → set COSMOS_BACKEND=huggingface and HF_TOKEN=your_token

# 5. Launch
uv run python scripts/run_app.py
```

The dashboard runs without AI (simulation-only mode). AI features activate on first use (~16 GB download).

| Tab | Purpose |
|---|---|
| 3D View | Configure box/pallet, preview patterns, run simulation and AI pipeline |
| AI Analysis | Pipeline progress, Reasoning VLM render gallery, tournament results, explanation |
| Comparison | Side-by-side scoring of all 10 patterns |

Key actions: Render (3D preview) · Run Simulation (shake & tilt test) · Run All + AI (full pipeline: Generate → Tournament → Explain)


Development

```shell
uv run pytest tests/ -v                 # all tests
uv run pytest tests/ -v -m "not slow"   # skip MuJoCo-heavy tests
uv run ruff check src/ tests/ scripts/
uv run ruff format src/ tests/ scripts/
```

Troubleshooting

| Problem | Solution |
|---|---|
| `uv: command not found` | `curl -LsSf https://astral.sh/uv/install.sh \| sh && source ~/.bashrc` |
| `requires-python >=3.12` | `uv python install 3.12` (Ubuntu 22.04 default is 3.10) |
| 401 Unauthorized loading model | `uv run huggingface-cli login` + request access at the model page |
| `ModuleNotFoundError: torch` | `uv sync --extra local` |
| CUDA out of memory | Use Cosmos-Reason2-2B or set `COSMOS_BASE_URL` to a remote vLLM server |
| 3D scene is black | Enable hardware acceleration in Chrome/Firefox |
| macOS (Apple Silicon) | Set `PYTORCH_ENABLE_MPS_FALLBACK=1` in `.env`; AI requires Ollama or a remote vLLM server |
| No GPU or HuggingFace access | AI backend auto-falls back to Ollama (localhost:11434). To use a remote server instead, set `COSMOS_BASE_URL` in `.env` |

Team

Team Doosan Umanoide — Doosan Robotics

NVIDIA Cosmos Cookoff 2026


License

Apache 2.0
