# Physics-Aware Motion Prediction using Differentiable Kinematics

Predicting the future trajectory of vehicles on real-world roads by combining deep perception with a differentiable physics engine.

```mermaid
graph TD
    A[HD Map & History] -->|Rasterize| B(BEV Image 224x224)
    B -->|ResNet-18| C{Network Head}

    subgraph Phase 2
    C -->|Regression Head| D[Output: Unconstrained Trajectory]
    end

    subgraph Phase 3
    C -->|Control Head| F[a, steer]
    F -->|Bicycle Model| G[Output: Physics-Feasible Trajectory]
    end

    style F fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#4caf50,stroke:#333,stroke-width:2px,color:#fff
```


## Overview

Given 10 past observations (1 second at 10 Hz) and a High-Definition map from the Waymo Open Motion Dataset, this system predicts the next 30 timesteps (3 seconds) of a vehicle's trajectory.

The core contribution is a Differentiable Kinematic Bicycle Model - a physics simulation layer written entirely in PyTorch. Instead of asking the network to regress raw $(x, y)$ waypoints, we ask it to predict control inputs (acceleration $a$ and steering angle $\delta$). These controls are integrated through the kinematic equations to produce a trajectory that is guaranteed to be physically feasible: no side-slipping, no teleportation, no jitter.


## Motivation - Why Physics?

Standard deep-learning approaches treat motion prediction as unconstrained regression: the network outputs 30 $(x, y)$ pairs and a loss function penalises deviation from the ground truth. This works surprisingly well on average but produces physically impossible predictions in the long tail:

| Failure Mode | Cause |
| --- | --- |
| Lateral teleportation | The network snaps to the lane centre between timesteps |
| Jittery trajectories | Small per-step errors compound into a jagged path |
| Crab-walking | The predicted motion vector is perpendicular to the heading |

All of these artefacts violate basic vehicle kinematics. By routing the prediction through a physics engine, these failure modes become structurally impossible.


## Architecture

This project evolved through three distinct phases, each addressing a limitation of the previous one.

### Phase 1: LSTM Baseline ("The Blind Model")

```
Past Velocities (10×2) → LSTM(64) → Linear → Future Velocities (30×2)
```

A vanilla LSTM encoder–decoder that consumes past velocity vectors $(\Delta x, \Delta y)$ and regresses future velocity vectors. No map or context is provided.

Result: The model learns basic momentum - it extrapolates the current velocity forward. On straight roads this is reasonable, but at intersections or curves the car drives straight through walls because it has no concept of road geometry.
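The baseline described above can be sketched in a few lines of PyTorch. The class name and layer layout are illustrative (the repository's version lives in `src/models/lstm_baseline.py`); only the dimensions come from the diagram:

```python
import torch
import torch.nn as nn

class LSTMBaseline(nn.Module):
    """Illustrative Phase-1 model: past (dx, dy) velocities -> future ones."""

    def __init__(self, hidden=64, future=30):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, future * 2)
        self.future = future

    def forward(self, past_vel):                 # (B, 10, 2)
        _, (h, _) = self.lstm(past_vel)          # final hidden state: (1, B, 64)
        out = self.head(h.squeeze(0))            # (B, 60)
        return out.view(-1, self.future, 2)      # (B, 30, 2)
```

Note that nothing in this model sees the map, which is exactly the limitation Phase 2 addresses.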

### Phase 2: Visual CNN ("The Seeing Model")

```
Rasterised BEV Image (3×224×224) → ResNet-18 → Linear → Future Velocities (30×2)
```

We introduce a custom rasterizer that renders the HD map, ego history, and neighbouring agents into a 224×224 bird's-eye view image. This image is fed into a pretrained ResNet-18 backbone with a regression head.

Result: The model can now see the road and learns to follow lane geometry. However, it suffers from mode collapse (averaging over multiple plausible futures) and trajectory jitter because it still treats each $(x, y)$ prediction independently with no physics coupling between steps.

### Phase 3: Differentiable Physics ("The Driving Model")

```
Rasterised BEV Image (3×224×224) → ResNet-18 → Control Inputs (30×2) → Bicycle Model (Differentiable) → Trajectory (30×2)
```

The network architecture is identical to Phase 2, but the output semantics fundamentally change. Instead of predicting positions or velocities, the network predicts physical control signals: longitudinal acceleration $a_t$ and front-wheel steering angle $\delta_t$ for each of the 30 future steps.

These controls are clamped to feasible ranges:

$$ |a| \le 4\text{ m/s}^2, \quad |\delta| \le 0.5\text{ rad} $$

and integrated through a Kinematic Bicycle Model:

$$v_{t+1} = v_t + a_t \cdot \Delta t$$

$$\psi_{t+1} = \psi_t + \frac{v_t \cdot \tan(\delta_t)}{L} \cdot \Delta t$$

$$x_{t+1} = x_t + v_t \cdot \cos(\psi_t) \cdot \Delta t$$

$$y_{t+1} = y_t + v_t \cdot \sin(\psi_t) \cdot \Delta t$$

where $L = 2.5\,\text{m}$ is the wheelbase and $\Delta t = 0.1\,\text{s}$.
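With these parameters, the rollout is a short loop of native PyTorch ops. The sketch below is illustrative (the repository's implementation is `src/physics/bicycle_model.py`; the function signature here is ours), but it integrates exactly the four update equations above:

```python
import torch

def bicycle_rollout(x0, y0, psi0, v0, controls, L=2.5, dt=0.1):
    """Integrate clamped controls through the kinematic bicycle model.

    controls: (T, 2) tensor of (acceleration a_t, steering angle delta_t).
    Returns an (T, 2) tensor of (x, y) positions. Every operation is a
    native torch op, so gradients flow through the whole rollout.
    """
    x, y, psi, v = x0, y0, psi0, v0
    traj = []
    for a, delta in controls:
        # All updates use the *current* state, matching the equations above.
        x = x + v * torch.cos(psi) * dt
        y = y + v * torch.sin(psi) * dt
        psi = psi + v * torch.tan(delta) / L * dt
        v = v + a * dt
        traj.append(torch.stack([x, y]))
    return torch.stack(traj)
```

Because heading only changes through `v * tan(delta) / L`, a stationary vehicle cannot rotate in place and lateral position can only change along the current heading, which is what rules out crab-walking and teleportation by construction.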

The loss is computed on the final trajectory $(x, y)$, but gradients flow back through the physics equations into the network weights. This is the key insight: the neural network learns to produce controls that, when executed by a kinematic simulator, match the observed ground-truth motion.

Result: Smooth, physically feasible arcs. The vehicle cannot crab-walk, teleport, or produce discontinuous headings.


## How It Works - The Full Pipeline

### 1. HD-Map Rasterization

The raw Waymo scenario protobuf contains polyline geometry for lane centres, road edges, and stop signs, as well as tracked object states at 10 Hz. The Rasterizer converts this into a 224×224 RGB image centred on the target vehicle:

| Channel | Colour | Content |
| --- | --- | --- |
| R | Red | Lane centre-lines and road edges |
| G | Green | Ego past trajectory + current position |
| B | Blue | Neighbouring vehicles at the current timestep |

Each pixel represents 0.5 m, giving approximately 56 m of visibility in every direction.
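The ego-centred, 0.5 m-per-pixel mapping can be sketched as a small helper. The function name and frame conventions below are our assumptions for illustration (the repository's rasteriser is `src/data/rasterizer.py`); only the resolution and image size come from the text:

```python
import numpy as np

RES = 0.5    # metres per pixel (from above)
SIZE = 224   # raster side length in pixels

def world_to_pixel(pts, ego_xy, ego_yaw):
    """Map world-frame (N, 2) points into the ego-centred 224x224 raster.

    Rotates the scene into the ego frame (heading along +x), then scales by
    the resolution and shifts so the ego sits at the image centre.
    Illustrative helper only; the frame convention is an assumption.
    """
    c, s = np.cos(-ego_yaw), np.sin(-ego_yaw)
    R = np.array([[c, -s], [s, c]])
    local = (np.asarray(pts) - ego_xy) @ R.T   # rotate into the ego frame
    px = SIZE / 2 + local[:, 0] / RES          # metres -> pixel columns
    py = SIZE / 2 - local[:, 1] / RES          # image y axis points down
    return np.stack([px, py], axis=1).astype(np.int32)
```

Pixel coordinates like these can then be handed to OpenCV's polyline and circle drawing routines to fill the three channels.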

*(Figure: rasterised agent view)*

This rasterised representation allows us to leverage powerful pretrained image backbones (ResNet, EfficientNet, etc.) without designing a custom map encoder.

### 2. ResNet-18 Backbone

The rasterised image is passed through a standard ResNet-18 whose final fully-connected layer is replaced with a linear projection to $30 \times 2 = 60$ output dimensions.

In the physics-aware variant, these 60 values represent accelerations and steering angles rather than positions:

```
fc_out (60) → reshape (30, 2) → tanh clamping → [accel, steer]
```
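A minimal sketch of such a head, using `tanh` to clamp smoothly to the limits stated earlier (the class and constant names are ours, not the repository's):

```python
import torch
import torch.nn as nn

A_MAX, D_MAX = 4.0, 0.5  # |a| <= 4 m/s^2, |delta| <= 0.5 rad (from above)

class ControlHead(nn.Module):
    """Illustrative head: 512-d ResNet-18 feature -> 30 (accel, steer) pairs."""

    def __init__(self, in_features=512, horizon=30):
        super().__init__()
        self.fc = nn.Linear(in_features, horizon * 2)
        self.horizon = horizon

    def forward(self, feat):                          # (B, 512)
        out = self.fc(feat).view(-1, self.horizon, 2)  # (B, 30, 2)
        accel = A_MAX * torch.tanh(out[..., 0])        # smooth clamp to ±4
        steer = D_MAX * torch.tanh(out[..., 1])        # smooth clamp to ±0.5
        return torch.stack([accel, steer], dim=-1)
```

Using `tanh` rather than a hard `clamp` keeps the gradient non-zero even when the raw output is far outside the feasible range.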

### 3. Differentiable Bicycle Model

The clamped controls are fed step-by-step into the bicycle_model() function (see src/physics/bicycle_model.py). Because every operation - addition, multiplication, torch.tan, torch.cos, torch.sin - is a native PyTorch op, the entire rollout is auto-differentiable.

The training loop is therefore:

  1. Forward: Image → ResNet → controls → bicycle rollout → predicted trajectory
  2. Loss: Huber loss between predicted and ground-truth $(x, y)$ positions
  3. Backward: Gradients flow through the physics equations, teaching the network which controls produce trajectories that match reality
  4. Clipping: Gradient norms are clipped at 1.0 to prevent exploding gradients from the recurrent rollout
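The four steps above can be sketched as a single training iteration. This is a hedged outline, not the repository's code: `model` stands for the ResNet with its control head, and `rollout` for the differentiable bicycle model:

```python
import torch
import torch.nn as nn

def train_step(model, rollout, images, init_state, gt_xy, optimizer):
    """One illustrative Phase-3 training iteration.

    model:      maps BEV images -> clamped controls, shape (B, 30, 2)
    rollout:    integrates (init_state, controls) -> positions (B, 30, 2)
    """
    optimizer.zero_grad()
    controls = model(images)                   # 1. forward: image -> controls
    pred_xy = rollout(init_state, controls)    #    controls -> trajectory
    loss = nn.functional.huber_loss(pred_xy, gt_xy)   # 2. Huber loss on (x, y)
    loss.backward()                            # 3. grads flow through physics
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # 4. clip
    optimizer.step()
    return loss.item()
```

The gradient clipping matters because the rollout is effectively a 30-step recurrence, so small per-step sensitivities can multiply into large gradients.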

## Results

### Qualitative Comparison

| Standard CNN (Phase 2) | Physics-Aware (Phase 3) |
| --- | --- |
| *(CNN prediction image)* | *(Physics prediction image)* |
| Jagged trajectory with lateral jitter | Smooth, kinematically feasible arcs |
| May "teleport" between lane centres | Continuous heading evolution |

### Key Observations

  • Smoothness: The bicycle model enforces $C^1$ continuity - heading changes are always proportional to speed and steering angle.
  • Feasibility: Clamping acceleration to $\pm 4\,\text{m/s}^2$ and steering to $\pm 0.5\,\text{rad}$ eliminates physically impossible predictions.
  • Training stability: The Huber loss combined with gradient clipping prevents the recurrent rollout from producing NaN gradients.

## Project Structure

```
physics-aware-motion-prediction/
├── src/
│   ├── models/
│   │   ├── lstm_baseline.py       # Phase 1 — LSTM encoder–decoder
│   │   ├── visual_cnn.py          # Phase 2 — ResNet-18 regression head
│   │   ├── physics_aware.py       # Phase 3 — ResNet-18 + bicycle rollout
│   │   └── multimodal.py          # Multi-modal variant (K hypotheses + WTA loss)
│   ├── physics/
│   │   └── bicycle_model.py       # Differentiable kinematic bicycle model
│   └── data/
│       └── rasterizer.py          # HD-Map → 224×224 BEV rasteriser
│
├── scripts/
│   ├── train/                     # Training entry-points per phase
│   └── evaluate/                  # Evaluation, visualisation, failure analysis
│
├── train_model.py                 # Phase 1 training (LSTM)
├── train_visual.py                # Phase 2 training (ResNet-18 CNN)
├── train_physics.py               # Phase 3 training (Physics-Aware)
├── train_multimodal.py            # Multi-modal training (Winner-Takes-All)
├── test_physics.py                # Physics model evaluation + plotting
├── test_visual.py                 # Visual model evaluation + plotting
├── test_multimodal.py             # Multi-modal evaluation + plotting
├── evaluate_model.py              # ADE metric computation (LSTM)
├── find_failure.py                # Worst-case failure analysis
├── visualize.py                   # Raw Waymo scene visualisation
│
├── bicycle.py                     # Original bicycle model (kept for compatibility)
├── rasterizer.py                  # Original rasterizer (kept for compatibility)
├── assets/                        # Result images for documentation
├── pyproject.toml                 # Project metadata & dependencies
├── .gitignore
└── README.md
```

## Getting Started

### Prerequisites

### Installation

This project is managed with uv.

```bash
# Clone the repository
git clone https://github.com/Ismail-Dagli/physics-aware-motion-prediction.git
cd physics-aware-motion-prediction

# Sync dependencies (creates .venv automatically)
uv sync
```

### Data

This project streams data directly from Google Cloud Storage. Create a files.txt listing the TFRecord shards:

```bash
gsutil ls gs://waymo_open_dataset_motion_v_1_3_1/uncompressed/scenario/training_20s/ > files.txt
```

Note: You need an authenticated gcloud session with access to the Waymo Open Dataset bucket.

### Training

You can run scripts using uv run:

```bash
# Phase 1 — LSTM baseline
uv run scripts/train/train_model.py

# Phase 2 — Visual CNN
uv run scripts/train/train_visual.py

# Phase 3 — Physics-Aware (recommended)
uv run scripts/train/train_physics.py
```

Training resumes automatically from the last checkpoint if a .pth file is found in checkpoints/.

### Evaluation

```bash
# Quantitative ADE metric
uv run scripts/evaluate/evaluate_model.py

# Visual inspection of physics model
uv run scripts/evaluate/test_physics.py

# Find worst-case failure
uv run scripts/evaluate/find_failure.py
```

## Dependencies

| Package | Purpose |
| --- | --- |
| PyTorch | Neural network training, differentiable physics layer |
| torchvision | ResNet-18 pretrained backbone |
| TensorFlow | Reading Waymo TFRecord files via `tf.data` |
| OpenCV | Rasterisation: drawing lanes, edges, and agents |
| NumPy | Array operations and data preprocessing |
| Matplotlib | Result visualisation and plotting |
| Waymo Open Dataset | Protobuf definitions for scenario parsing |

## License

This project is released under the MIT License.

The Waymo Open Motion Dataset is subject to the Waymo Dataset License Agreement.


## Acknowledgements
