
# Spatial-RAG World Model

A Spatial Retrieval-Augmented Generation system for latent world models, designed for embodied spatial intelligence in robotics, autonomous navigation, and embodied AI. It features ROS2 integration, real-time inference at 25Hz, and a complete robot build guide.

*(Screenshot: Spatial-RAG dashboard)*

## 🎯 What is Spatial-RAG?

This project implements a memory-augmented latent world model that:

- **Encodes** observations (RGB, depth, proprioception) into compact latent representations
- **Stores** latent states with spatial metadata in a vector database
- **Retrieves** relevant past experiences using hybrid spatial + latent similarity search
- **Predicts** future states by conditioning on retrieved memory context

**Result:** 15-30% better prediction accuracy and improved sample efficiency for embodied agents.
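
The store/retrieve half of this loop fits in a few lines. Here is a minimal sketch, assuming an existing Qdrant collection named `latents` (creation omitted) and an illustrative `{x, y, yaw}` pose payload; the repository's actual store lives in `src/memory/`:

```python
# Sketch of storing latents with spatial metadata and retrieving neighbors.
# Collection name "latents" and the payload schema are illustrative assumptions.
import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(host="localhost", port=6333)

def store_latent(step_id: int, z: np.ndarray, pose: tuple) -> None:
    """Store a 32-dim latent with spatial metadata (x, y, yaw) as payload."""
    client.upsert(
        collection_name="latents",
        points=[PointStruct(
            id=step_id,
            vector=z.tolist(),
            payload={"x": pose[0], "y": pose[1], "yaw": pose[2]},
        )],
    )

def retrieve_memories(z: np.ndarray, k: int = 8):
    """Nearest-latent retrieval; spatial filtering on the payload would go here."""
    hits = client.search(
        collection_name="latents",
        query_vector=z.tolist(),
        limit=k,
        with_vectors=True,
    )
    return [hit.vector for hit in hits]  # these condition the transition model
```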

📖 **New to Spatial-RAG?** See the Practical Usage Guide for real-world applications and examples.

๐Ÿ—๏ธ Architecture

```
                     SPATIAL-RAG SYSTEM OVERVIEW

  PERCEPTION           WORLD MODEL                    MEMORY
  ──────────           ───────────                    ──────

  📷 Camera  ──────►  ┌─────────┐                 ┌──────────┐
                      │ Encoder │ ──► z[32] ────► │  Qdrant  │
  🎮 Actions ──────►  └────┬────┘        │        │ (Vector  │
                           │             │        │   DB)    │
  📍 Pose    ──────►       ▼             │        └────┬─────┘
                      ┌──────────┐       │             │
                      │Transition│◄──────┴─────────────┘
                      └────┬─────┘     Retrieved
                           │           Memories
                           ▼
                      ┌─────────┐
                      │ Decoder │ ──► 🖼️ Predicted Frame
                      └─────────┘
                           │
                           ▼
                      ┌─────────┐
                      │ Policy  │ ──► 🤖 Motor Commands
                      └─────────┘
```

**Key Components:**

```
Perception → Encoder → Latent z → Memory Bank (Qdrant)
                          ↓              ↓
                   Transition  ←  Retrieval Module
                          ↓
                   Decoder → Reconstruction
                          ↓
                   Policy/Planner → Actions
```
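
To make the shapes concrete, here is an illustrative PyTorch sketch of the transition step conditioned on retrieved memories. Only `Z_DIM=32`, `ACTION_DIM=2`, and `TOPK=8` come from the Configuration section below; the layer sizes and mean-pooling of memories are assumptions, not the repository's actual model:

```python
# Illustrative transition model: z_next = f(z, action, retrieved memories).
import torch
import torch.nn as nn

Z_DIM, ACTION_DIM, TOPK = 32, 2, 8

class Transition(nn.Module):
    """Predict the next latent from (z, action, pooled memory context)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM + ACTION_DIM + Z_DIM, 256),  # context = pooled memories
            nn.ReLU(),
            nn.Linear(256, Z_DIM),
        )

    def forward(self, z, action, memories):
        # memories: (batch, TOPK, Z_DIM) -> mean-pool into one context vector
        context = memories.mean(dim=1)
        return self.net(torch.cat([z, action, context], dim=-1))

# Shape check with dummy tensors
z = torch.randn(1, Z_DIM)
a = torch.randn(1, ACTION_DIM)
mem = torch.randn(1, TOPK, Z_DIM)
print(Transition()(z, a, mem).shape)  # torch.Size([1, 32])
```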

## 🚀 Quick Start

### 1. Installation

```bash
# Clone the repository
git clone <repo-url>
cd Spatial-RAG-Worldmodel

# Install dependencies (optional for local dev)
pip install -r requirements.txt
```

### 2. Docker Setup (Recommended)

```bash
# Build the shared base image
docker compose build base

# Start core services (Qdrant + API)
docker compose --profile core up -d

# Start the UI dashboard
docker compose --profile ui up -d
```

*(Screenshot: Docker services running)*

Once up, the FastAPI Swagger docs are available at http://localhost:8080/docs (FastAPI's default docs route on the API port).

### 3. Generate Data & Train

```bash
# Generate synthetic data
docker compose run --rm generate-data python scripts/simulate_env.py --out data/trajectories --n 500

# Train models
docker compose run --rm train

# Restart the API with the trained model
docker compose restart api
```

### 4. Test in the UI

1. Open http://localhost:3000
2. Click "Generate Random Latent"
3. Click "Start Rollout"
4. Watch predicted frames stream in real time (the same stream is available over HTTP, as sketched below)
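
The rollout stream is also exposed as server-sent events via the `/stream-rollout` endpoint (see API Endpoints below). A hedged sketch of a plain-`requests` consumer; the endpoint's query parameters and event payload format are not documented in this README, so none are assumed:

```python
# Consume the /stream-rollout SSE endpoint with plain requests.
# Assumes the API is on localhost:8080; each data line is simply printed.
import requests

with requests.get("http://localhost:8080/stream-rollout", stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip())  # one event per predicted step
```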

## 🤖 Real-World Applications

| Application | Use Case |
| --- | --- |
| 🚁 Autonomous Drones | Navigate cities using past flight memories |
| 🚗 Self-Driving Cars | Predict pedestrian behavior at intersections |
| 📦 Warehouse Robots | Remember item locations for faster picking |
| 🏠 Home Assistants | Learn house layouts and remember where things are |
| 👓 AR Navigation | Predictive overlays based on spatial memory |

📖 See the Practical Usage Guide for detailed examples.

๐Ÿณ Docker Services

All services share optimized images (~25GB total):

| Service | Purpose |
| --- | --- |
| api | FastAPI inference server |
| ui | Next.js dashboard |
| qdrant | Vector database |
| train | Model training |
| ros2 | ROS2 robotics node |
| generate-data | Synthetic data generation |
| collect | Data collection |

### Common Commands

```bash
# Start services
docker compose --profile core up -d           # API + Qdrant
docker compose --profile ui up -d             # Dashboard
docker compose --profile ros2 up -d           # ROS2 node

# Run tasks
docker compose run --rm train                 # Train model
docker compose run --rm experiment            # Run experiments
docker compose run --rm reports               # Generate reports

# Manage
docker compose logs -f api                    # View logs
docker compose ps                             # Check status
docker compose down                           # Stop all
```

## 🤖 ROS2 Integration

Real-time robotics integration with ROS2 Humble. ✅ Fully tested and working!

```
                       ROS2 DATA FLOW (@25Hz)

  📷 Camera ───► webcam_bridge.py ───► FastAPI ───► ROS2 Bridge
                        │                 │              │
                     Capture           Encode         Publish
                     Frames           to z[32]        Topics
                                          │              │
                                          ▼              ▼
                                     ┌─────────┐   ┌──────────┐
                                     │ z_next  │   │ /latent  │
                                     │predicted│   │ /latent_ │
                                     └─────────┘   │   next   │
                                                   └────┬─────┘
                                                        │
  🎮 /actions ◄─────────────────────────────────────────┘
       │
       ▼
  🤖 Motors
```

### Quick Start

```bash
# Start all ROS2 services
docker compose --profile ros2 up -d

# Stream webcam with predictions to ROS2 (on the Windows host)
python scripts/webcam_bridge.py --mode api --camera 0 --fps 5 --display --ros2-bridge http://localhost:8082
```

### Topics

| Topic | Direction | Rate | Description |
| --- | --- | --- | --- |
| /latent | Publish | ~25Hz | Current 32-dim latent state |
| /latent_next | Publish | ~25Hz | Predicted next latent |
| /actions | Subscribe | 5Hz+ | Action commands [x, y] |
| /camera/image_raw | Subscribe | - | Camera images |
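
A minimal `rclpy` sketch of a node that consumes `/latent` and publishes `/actions`. The `Float32MultiArray` type for `/actions` matches the `ros2 topic pub` command below; assuming the same message type for `/latent`:

```python
# Minimal ROS2 node: subscribe to /latent, publish a fixed /actions command.
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float32MultiArray

class LatentListener(Node):
    def __init__(self):
        super().__init__("latent_listener")
        self.create_subscription(Float32MultiArray, "/latent", self.on_latent, 10)
        self.pub = self.create_publisher(Float32MultiArray, "/actions", 10)

    def on_latent(self, msg: Float32MultiArray) -> None:
        # Log the first few latent dimensions, then reply with a forward action
        self.get_logger().info(f"latent[:3] = {list(msg.data)[:3]}")
        self.pub.publish(Float32MultiArray(data=[1.0, 0.0]))  # [x, y]

def main() -> None:
    rclpy.init()
    rclpy.spin(LatentListener())

if __name__ == "__main__":
    main()
```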

### Send Action Commands

```bash
# Publish actions (forward motion)
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && \
  ros2 topic pub /actions std_msgs/Float32MultiArray '{data: [1.0, 0.5]}' --rate 5"
```

### Monitor Topics

```bash
# Echo latent vector
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic echo /latent --once"

# Echo prediction
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic echo /latent_next --once"

# Check rates (~25Hz)
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic hz /latent"
```

## 📷 Webcam Streaming (Windows)

Stream your laptop webcam to the API for real-time encoding:

```bash
# Install on Windows (NOT in Docker)
pip install opencv-python requests pillow

# List cameras
python scripts/webcam_bridge.py --mode list

# Stream with live preview
python scripts/webcam_bridge.py --mode api --camera 0 --display
```

Output:

```
Streaming to http://localhost:8080 at 5 FPS
Frame 100: latent mean=-0.0575, FPS=4.8
```
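
Under the hood, the bridge does roughly the following per frame. This is a sketch assuming `/webcam/encode` accepts a base64-encoded JPEG in a JSON body (`{"image": ...}`); the exact request schema lives in the API code:

```python
# Capture one frame, base64-encode it, and POST it to /webcam/encode.
import base64
import cv2
import requests

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if ok:
    ok, jpg = cv2.imencode(".jpg", frame)
    payload = {"image": base64.b64encode(jpg.tobytes()).decode("ascii")}
    resp = requests.post("http://localhost:8080/webcam/encode", json=payload, timeout=5)
    print(resp.json())  # expect a 32-dim latent in the response
```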

📖 See the Webcam Streaming Guide for details.

## 📊 API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check |
| /encode | POST | Image → latent |
| /webcam/encode | POST | Base64 image → latent (for webcam) |
| /predict | POST | One-step prediction |
| /rollout | POST | Multi-step rollout |
| /stream-rollout | GET | SSE streaming |
| /retrieve | POST | Memory search |
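
A hedged usage sketch for the two most common calls; the JSON field names (`z`, `action`) are assumptions about the request schema, not confirmed by this README:

```python
# Health check, then a one-step prediction from a latent + action.
import requests

BASE = "http://localhost:8080"

print(requests.get(f"{BASE}/health", timeout=5).json())

# Z_DIM=32 latent and ACTION_DIM=2 action (see Configuration below)
body = {"z": [0.0] * 32, "action": [1.0, 0.0]}
print(requests.post(f"{BASE}/predict", json=body, timeout=5).json())
```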

## 📈 Results

| Model | Latent MSE | Improvement |
| --- | --- | --- |
| Baseline | 0.0234 | - |
| Spatial-RAG | 0.0198 | 15.4% |

## 🚀 Roadmap: Full Autonomy

| Feature | Status | Description |
| --- | --- | --- |
| ✅ Latent Encoding | Done | Real-time camera → 32-dim latent @ 25Hz |
| ✅ Next-State Prediction | Done | /latent_next predictions |
| ✅ Memory Retrieval | Done | Qdrant-based spatial memory |
| ✅ ROS2 Integration | Done | /latent, /actions topics |
| 🔜 Policy Network | Planned | Neural net: latent → motor commands |
| 🔜 Robot Training Data | Planned | Collect from YOUR robot |
| 🔜 Path Planning | Planned | A*/RRT goal navigation |

📖 See the Robot Integration Guide for full autonomy details.

### Recommended Hardware

| Option | Price (PKR) | Inference | Best For |
| --- | --- | --- | --- |
| Pi 4 (4GB) | ~Rs 18,000 | ~50ms | Budget robots |
| Pi 5 (8GB) | ~Rs 28,000 | ~30ms | Faster autonomy |
| Jetson Orin | ~Rs 60,000+ | ~5ms | Production |

โŒ Pi Zero not recommended (too slow for real-time inference)

## 📚 Documentation

| Document | Description |
| --- | --- |
| 📖 Practical Guide | Real-world applications and examples |
| 🤖 Robot Integration | End-to-end robot setup guide |
| 🛠️ Build Guide | Shopping list + assembly instructions |
| 📷 Webcam Streaming | Stream a laptop camera to the API |
| 🏗️ Design | Architecture and system design |
| 🚀 Deployment | Production deployment guide |
| 📷 Data Collection | Collecting robot data |
| 📋 Quick Reference | Command cheat sheet |

๐Ÿ“ Project Structure

Spatial-RAG-Worldmodel/
โ”œโ”€โ”€ api/                    # FastAPI server
โ”œโ”€โ”€ ui/                     # Next.js dashboard
โ”œโ”€โ”€ ros2_ws/                # ROS2 integration
โ”œโ”€โ”€ src/                    # Core library
โ”‚   โ”œโ”€โ”€ models/             # Encoder, Transition, Decoder
โ”‚   โ”œโ”€โ”€ memory/             # Qdrant, Faiss stores
โ”‚   โ””โ”€โ”€ datasets/           # Data loading
โ”œโ”€โ”€ scripts/                # Training, export, collection
โ”œโ”€โ”€ docs/                   # Documentation
โ””โ”€โ”€ docker-compose.yml      # Service orchestration

โš™๏ธ Configuration

Variable Default Description
Z_DIM 32 Latent dimension
ACTION_DIM 2 Action dimension
TOPK 8 Retrieved memories
QDRANT_HOST localhost Qdrant host
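
These are plain environment variables. A sketch of how they might be read at startup, with names and defaults taken from the table above (the actual loading code in the repository may differ):

```python
# Read configuration from the environment, falling back to the documented defaults.
import os

Z_DIM = int(os.environ.get("Z_DIM", 32))
ACTION_DIM = int(os.environ.get("ACTION_DIM", 2))
TOPK = int(os.environ.get("TOPK", 8))
QDRANT_HOST = os.environ.get("QDRANT_HOST", "localhost")
```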

## 📄 License

MIT License

๐Ÿ“ Citation

@software{spatial_rag_worldmodel,
  title={Spatial-RAG World Model for Embodied Spatial Intelligence},
  author={Adnan Sattar},
  year={2025},
  url={https://github.com/adnansattar/Spatial-RAG-Worldmodel}
}
