A Spatial Retrieval-Augmented Generation system for latent world models, designed for embodied spatial intelligence in robotics, autonomous navigation, and embodied AI.
This project implements a memory-augmented latent world model that:
- Encodes observations (RGB, depth, proprioception) into compact latent representations
- Stores latent states with spatial metadata in a vector database
- Retrieves relevant past experiences using hybrid spatial + latent similarity search
- Predicts future states by conditioning on retrieved memory context
Result: Improved prediction accuracy (15-30%) and sample efficiency for embodied agents.
New to Spatial-RAG? See the Practical Usage Guide for real-world applications and examples.
```
┌────────────────────────────────────────────────────────────────┐
│                  SPATIAL-RAG SYSTEM OVERVIEW                   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  PERCEPTION          WORLD MODEL                MEMORY         │
│  ──────────          ───────────                ──────         │
│                                                                │
│  Camera  ─────▶  ┌──────────┐              ┌────────────┐      │
│                  │ Encoder  │ ─▶ z[32] ──▶ │   Qdrant   │      │
│  Actions ─────▶  └────┬─────┘              │  (Vector   │      │
│                       │                    │    DB)     │      │
│  Pose    ─────▶       ▼                    └─────┬──────┘      │
│                 ┌────────────┐                   │             │
│                 │ Transition │ ◀─────────────────┘             │
│                 └─────┬──────┘     Retrieved                   │
│                       │            Memories                    │
│                       ▼                                        │
│                 ┌──────────┐                                   │
│                 │ Decoder  │ ──▶ Predicted Frame               │
│                 └──────────┘                                   │
│                       │                                        │
│                       ▼                                        │
│                 ┌──────────┐                                   │
│                 │  Policy  │ ──▶ Motor Commands                │
│                 └──────────┘                                   │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
Key Components:

```
Perception ──▶ Encoder ──▶ Latent z ──▶ Memory Bank (Qdrant)
                  │                            │
                  ▼                            ▼
              Transition ◀───────────── Retrieval Module
                  │
                  ▼
               Decoder ──▶ Reconstruction
                  │
                  ▼
            Policy/Planner ──▶ Actions
```
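In Python terms, one step of the system boils down to the loop sketched below. All names here (`encoder`, `memory.retrieve`, ...) are illustrative placeholders, not the actual interfaces under `src/`:

```python
# One step of the memory-augmented world model.
# Every name below is an illustrative placeholder -- see src/models/
# and src/memory/ for the actual classes and signatures.
def world_model_step(encoder, transition, decoder, memory, obs, action, pose, topk=8):
    z = encoder(obs)                           # observation -> 32-dim latent
    memory.store(z, pose)                      # index latent with spatial metadata
    context = memory.retrieve(z, pose, topk)   # hybrid spatial + latent search
    z_next = transition(z, action, context)    # memory-conditioned prediction
    frame = decoder(z_next)                    # reconstruct the predicted frame
    return z_next, frame
```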
```bash
# Clone the repository
git clone <repo-url>
cd Spatial-RAG-Worldmodel

# Install dependencies (optional for local dev)
pip install -r requirements.txt
```

```bash
# Build the shared base image
docker compose build base

# Start core services (Qdrant + API)
docker compose --profile core up -d

# Start UI dashboard
docker compose --profile ui up -d
```

Access:
- API: http://localhost:8080
- API Docs: http://localhost:8080/docs
- UI Dashboard: http://localhost:3000
- Qdrant: http://localhost:6333
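A quick smoke test from Python once the containers are up (the `/health` route is listed in the API table below; `GET /collections` is Qdrant's stock REST endpoint, and response shapes depend on your versions):

```python
# Verify the API and Qdrant are reachable after `docker compose up`.
import requests

print(requests.get("http://localhost:8080/health").json())        # FastAPI health check
print(requests.get("http://localhost:6333/collections").json())   # Qdrant REST: list collections
```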
```bash
# Generate synthetic data
docker compose run --rm generate-data python scripts/simulate_env.py --out data/trajectories --n 500

# Train models
docker compose run --rm train

# Restart API with the trained model
docker compose restart api
```

- Open http://localhost:3000
- Click "Generate Random Latent"
- Click "Start Rollout"
- Watch predicted frames stream in real time!
| Application | Use Case |
|---|---|
| Autonomous Drones | Navigate cities using past flight memories |
| Self-Driving Cars | Predict pedestrian behavior at intersections |
| Warehouse Robots | Remember item locations for faster picking |
| Home Assistants | Learn house layout, remember where things are |
| AR Navigation | Predictive overlays based on spatial memory |
See the Practical Usage Guide for detailed examples.
All services share optimized images (~25GB total):
| Service | Purpose |
|---|---|
| `api` | FastAPI inference server |
| `ui` | Next.js dashboard |
| `qdrant` | Vector database |
| `train` | Model training |
| `ros2` | ROS2 robotics node |
| `generate-data` | Synthetic data |
| `collect` | Data collection |
```bash
# Start services
docker compose --profile core up -d    # API + Qdrant
docker compose --profile ui up -d      # Dashboard
docker compose --profile ros2 up -d    # ROS2 node

# Run tasks
docker compose run --rm train          # Train model
docker compose run --rm experiment     # Run experiments
docker compose run --rm reports        # Generate reports

# Manage
docker compose logs -f api             # View logs
docker compose ps                      # Check status
docker compose down                    # Stop all
```

Real-time robotics integration with ROS2 Humble. Fully tested and working!
```
┌────────────────────────────────────────────────────────────────┐
│                     ROS2 DATA FLOW (@25Hz)                     │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  Camera ──▶ webcam_bridge.py ──▶ FastAPI ──▶ ROS2 Bridge       │
│                    │                │             │            │
│                 Capture           Encode       Publish         │
│                 Frames           to z[32]      Topics          │
│                                     │             │            │
│                                     ▼             ▼            │
│                               ┌───────────┐ ┌────────────┐     │
│                               │  z_next   │ │  /latent   │     │
│                               │ predicted │ │  /latent_  │     │
│                               └───────────┘ │    next    │     │
│                                             └─────┬──────┘     │
│                                                   │            │
│  /actions ◀───────────────────────────────────────┘            │
│      │                                                         │
│      ▼                                                         │
│   Motors                                                       │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
```bash
# Start all ROS2 services
docker compose --profile ros2 up -d

# Stream webcam with predictions to ROS2 (on Windows host)
python scripts/webcam_bridge.py --mode api --camera 0 --fps 5 --display --ros2-bridge http://localhost:8082
```

| Topic | Direction | Rate | Description |
|---|---|---|---|
| `/latent` | Publish | ~25Hz | Current 32-dim latent state |
| `/latent_next` | Publish | ~25Hz | Predicted next latent |
| `/actions` | Subscribe | 5Hz+ | Action commands [x, y] |
| `/camera/image_raw` | Subscribe | - | Camera images |
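To consume these topics from your own code instead of the CLI, a minimal `rclpy` node might look like the sketch below; the latent-to-action mapping here is a placeholder, not the project's policy:

```python
# Minimal rclpy node: subscribe to /latent, publish /actions.
# The fixed [1.0, 0.0] action is a placeholder, not a real policy.
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float32MultiArray

class LatentListener(Node):
    def __init__(self):
        super().__init__("latent_listener")
        # Topics and message type match the table above.
        self.sub = self.create_subscription(Float32MultiArray, "/latent", self.on_latent, 10)
        self.pub = self.create_publisher(Float32MultiArray, "/actions", 10)

    def on_latent(self, msg):
        z = list(msg.data)  # 32-dim latent state
        self.get_logger().info(f"latent mean={sum(z) / len(z):.4f}")
        self.pub.publish(Float32MultiArray(data=[1.0, 0.0]))  # placeholder action [x, y]

def main():
    rclpy.init()
    node = LatentListener()
    rclpy.spin(node)
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```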
```bash
# Publish actions (forward motion)
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && \
  ros2 topic pub /actions std_msgs/Float32MultiArray '{data: [1.0, 0.5]}' --rate 5"

# Echo latent vector
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic echo /latent --once"

# Echo prediction
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic echo /latent_next --once"

# Check rates (~25Hz)
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic hz /latent"
```

Stream your laptop webcam to the API for real-time encoding:
```bash
# Install on Windows (NOT Docker)
pip install opencv-python requests pillow

# List cameras
python scripts/webcam_bridge.py --mode list

# Stream with live preview
python scripts/webcam_bridge.py --mode api --camera 0 --display
```

Output:

```
Streaming to http://localhost:8080 at 5 FPS
Frame 100: latent mean=-0.0575, FPS=4.8
```

See the Webcam Streaming Guide for details.
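Under the hood, the bridge loop is roughly the sketch below. The real `scripts/webcam_bridge.py` adds FPS pacing, reconnects, and the `--display` preview, and the JSON field name (`"image"`) is an assumption here, so check the script for the actual protocol:

```python
# Simplified sketch of the webcam -> /webcam/encode loop.
# The JSON field name "image" is an assumption; see
# scripts/webcam_bridge.py for the real request format.
import base64
import time
import cv2
import requests

cap = cv2.VideoCapture(0)                  # camera index from --camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    _, jpg = cv2.imencode(".jpg", frame)   # compress frame to JPEG
    payload = {"image": base64.b64encode(jpg.tobytes()).decode()}
    resp = requests.post("http://localhost:8080/webcam/encode", json=payload)
    print("latent:", resp.json())
    time.sleep(1 / 5)                      # ~5 FPS, matching the --fps default
```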
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/encode` | POST | Image → latent |
| `/webcam/encode` | POST | Base64 image → latent (for webcam) |
| `/predict` | POST | One-step prediction |
| `/rollout` | POST | Multi-step rollout |
| `/stream-rollout` | GET | SSE streaming |
| `/retrieve` | POST | Memory search |
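The SSE stream from `/stream-rollout` (the same endpoint the dashboard uses) can also be consumed directly from Python. Any query parameters and the exact event payload schema are assumptions here, so check http://localhost:8080/docs for the real contract:

```python
# Consume the /stream-rollout SSE stream with plain requests.
# Event payload schema is an assumption -- see /docs for the
# actual contract.
import requests

with requests.get("http://localhost:8080/stream-rollout", stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip())  # one predicted step per event
```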
| Model | Latent MSE | Improvement |
|---|---|---|
| Baseline | 0.0234 | - |
| Spatial-RAG | 0.0198 | 15.4% |
| Feature | Status | Description |
|---|---|---|
| Latent Encoding | Done | Real-time camera → 32-dim latent @ 25Hz |
| Next-State Prediction | Done | /latent_next predictions |
| Memory Retrieval | Done | Qdrant-based spatial memory |
| ROS2 Integration | Done | /latent, /actions topics |
| Policy Network | Planned | Neural net: latent → motor commands (sketched below) |
| Robot Training Data | Planned | Collect from YOUR robot |
| Path Planning | Planned | A*/RRT goal navigation |
See the Robot Integration Guide for full autonomy details.
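For a sense of scale, the planned policy head only has to map the 32-dim latent to a 2-dim motor command. A minimal PyTorch sketch of such a mapping (the architecture is illustrative, not the planned design):

```python
# Illustrative policy head: 32-dim latent -> 2-dim motor command.
# Layer sizes and Tanh bounding are placeholder choices, not the
# project's planned design.
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Linear(32, 64),   # Z_DIM = 32
    nn.ReLU(),
    nn.Linear(64, 2),    # ACTION_DIM = 2, e.g. [x, y]
    nn.Tanh(),           # bound commands to [-1, 1]
)

z = torch.randn(1, 32)   # latent, e.g. from the /latent topic
action = policy(z)       # -> tensor of shape (1, 2)
print(action)
```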
| Option | Price (PKR) | Inference Latency | Best For |
|---|---|---|---|
| Pi 4 (4GB) | ~Rs 18,000 | ~50ms | Budget robots |
| Pi 5 (8GB) | ~Rs 28,000 | ~30ms | Faster autonomy |
| Jetson Orin | ~Rs 60,000+ | ~5ms | Production |
Pi Zero is not recommended (too slow for real-time inference).
| Document | Description |
|---|---|
| Practical Guide | Real-world applications and examples |
| Robot Integration | End-to-end robot setup guide |
| Build Guide | Shopping list + assembly instructions |
| Webcam Streaming | Stream laptop camera to API |
| Design | Architecture and system design |
| Deployment | Production deployment guide |
| Data Collection | Collecting robot data |
| Quick Reference | Command cheat sheet |
```
Spatial-RAG-Worldmodel/
├── api/                  # FastAPI server
├── ui/                   # Next.js dashboard
├── ros2_ws/              # ROS2 integration
├── src/                  # Core library
│   ├── models/           # Encoder, Transition, Decoder
│   ├── memory/           # Qdrant, Faiss stores
│   └── datasets/         # Data loading
├── scripts/              # Training, export, collection
├── docs/                 # Documentation
└── docker-compose.yml    # Service orchestration
```
| Variable | Default | Description |
|---|---|---|
| `Z_DIM` | 32 | Latent dimension |
| `ACTION_DIM` | 2 | Action dimension |
| `TOPK` | 8 | Retrieved memories |
| `QDRANT_HOST` | localhost | Qdrant host |
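These variables also parameterize retrieval. The sketch below shows how they could drive a hybrid spatial + latent query against Qdrant; the collection name (`memories`) and the spatial payload fields (`x`, `y`) are assumptions, so check `src/memory/` for the real schema:

```python
# Hybrid spatial + latent retrieval, driven by the env vars above.
# Collection name "memories" and payload fields "x"/"y" are
# assumptions -- see src/memory/ for the actual schema.
import os
from qdrant_client import QdrantClient, models

client = QdrantClient(host=os.getenv("QDRANT_HOST", "localhost"), port=6333)
z = [0.0] * int(os.getenv("Z_DIM", "32"))   # query latent (placeholder values)

hits = client.search(
    collection_name="memories",
    query_vector=z,
    query_filter=models.Filter(must=[       # restrict to a spatial neighborhood
        models.FieldCondition(key="x", range=models.Range(gte=-1.0, lte=1.0)),
    ]),
    limit=int(os.getenv("TOPK", "8")),
)
for hit in hits:
    print(hit.id, hit.score)
```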
MIT License
```bibtex
@software{spatial_rag_worldmodel,
  title  = {Spatial-RAG World Model for Embodied Spatial Intelligence},
  author = {Adnan Sattar},
  year   = {2025},
  url    = {https://github.com/adnansattar/Spatial-RAG-Worldmodel}
}
```

