Predicting the future trajectory of vehicles on real-world roads by combining deep perception with a differentiable physics engine.
```mermaid
graph TD
    A[HD Map & History] -->|Rasterize| B(BEV Image 224x224)
    B -->|ResNet-18| C{Network Head}
    subgraph Phase 2
        C -->|Regression Head| D[Output: Unconstrained Trajectory]
    end
    subgraph Phase 3
        C -->|Control Head| F[a, steer]
        F -->|Bicycle Model| G[Output: Physics-Feasible Trajectory]
    end
    style F fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#4caf50,stroke:#333,stroke-width:2px,color:#fff
```
- Overview
- Motivation - Why Physics?
- Architecture
- How It Works - The Full Pipeline
- Results
- Project Structure
- Getting Started
- Dependencies
- License
- Acknowledgements
Given 10 past observations (1 second at 10 Hz) and a High-Definition map from the Waymo Open Motion Dataset, this system predicts the next 30 timesteps (3 seconds) of a vehicle's trajectory.
The core contribution is a Differentiable Kinematic Bicycle Model - a physics simulation layer written entirely in PyTorch. Instead of asking the network to regress raw positions, the network predicts physical control inputs (acceleration and steering) that are integrated through the bicycle model, so every output trajectory is kinematically feasible by construction.
Standard deep-learning approaches treat motion prediction as unconstrained regression: the network outputs 30 future $(x, y)$ positions with nothing tying one timestep to the next. This produces characteristic failure modes:
| Failure Mode | Cause |
|---|---|
| Lateral teleportation | The network snaps to the lane centre between timesteps |
| Jittery trajectories | Small per-step errors compound into a jagged path |
| Crab-walking | The predicted motion vector is perpendicular to the heading |
All of these artefacts violate basic vehicle kinematics. By routing the prediction through a physics engine, these failure modes become structurally impossible.
This project evolved through three distinct phases, each addressing a limitation of the previous one.
```
Past Velocities (10×2) → LSTM(64) → Linear → Future Velocities (30×2)
```
A vanilla LSTM encoder–decoder that consumes the past velocity vectors and projects its final hidden state to the full future velocity sequence.
Result: The model learns basic momentum - it extrapolates the current velocity forward. On straight roads this is reasonable, but at intersections or curves the car drives straight through walls because it has no concept of road geometry.
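The Phase 1 baseline can be sketched roughly as follows (a minimal reconstruction; the actual `src/models/lstm_baseline.py` may differ in layer sizes and decoding details):

```python
import torch
import torch.nn as nn

class LSTMBaseline(nn.Module):
    """Encode 10 past velocity vectors, decode 30 future ones."""
    def __init__(self, hidden: int = 64, future: int = 30):
        super().__init__()
        self.future = future
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, future * 2)

    def forward(self, past_vel: torch.Tensor) -> torch.Tensor:
        # past_vel: (B, 10, 2) -> last hidden state summarises the history
        _, (h_n, _) = self.encoder(past_vel)
        out = self.head(h_n[-1])              # (B, 60)
        return out.view(-1, self.future, 2)   # (B, 30, 2) future velocities

model = LSTMBaseline()
pred = model(torch.randn(4, 10, 2))
print(pred.shape)  # torch.Size([4, 30, 2])
```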
```
Rasterised BEV Image (3×224×224) → ResNet-18 → Linear → Future Velocities (30×2)
```
We introduce a custom rasterizer that renders the HD map, ego history, and neighbouring agents into a 224×224 bird's-eye view image. This image is fed into a pretrained ResNet-18 backbone with a regression head.
Result: The model can now see the road and learns to follow lane geometry. However, it suffers from mode collapse (averaging over multiple plausible futures) and trajectory jitter because it still treats each timestep as an independent regression target, with no kinematic constraint linking consecutive outputs.
```
Rasterised BEV Image (3×224×224) → ResNet-18 → Control Inputs (30×2) → Bicycle Model (Differentiable) → Trajectory (30×2)
```
The network architecture is identical to Phase 2, but the output semantics fundamentally change. Instead of predicting positions or velocities, the network predicts physical control signals: longitudinal acceleration $a_t$ and steering angle $\delta_t$ for each of the 30 future timesteps.
These controls are clamped to feasible ranges:

$$a_t \in [-4, 4]\ \text{m/s}^2, \qquad \delta_t \in [-0.5, 0.5]\ \text{rad}$$

and integrated through a Kinematic Bicycle Model:

$$
\begin{aligned}
x_{t+1} &= x_t + v_t \cos\theta_t \,\Delta t \\
y_{t+1} &= y_t + v_t \sin\theta_t \,\Delta t \\
\theta_{t+1} &= \theta_t + \frac{v_t}{L} \tan\delta_t \,\Delta t \\
v_{t+1} &= v_t + a_t \,\Delta t
\end{aligned}
$$

where $(x_t, y_t)$ is the position, $\theta_t$ the heading, $v_t$ the speed, $L$ the wheelbase, and $\Delta t = 0.1\ \text{s}$ the timestep.

The loss is computed on the final trajectory positions, not on the controls, so supervision comes entirely from observed ground-truth motion.
Result: Smooth, physically feasible arcs. The vehicle cannot crab-walk, teleport, or produce discontinuous headings.
The raw Waymo scenario protobuf contains polyline geometry for lane centres, road edges, and stop signs, as well as tracked object states at 10 Hz. The Rasterizer converts this into a 224×224 RGB image centred on the target vehicle:
| Channel | Colour | Content |
|---|---|---|
| R | Red | Lane centre-lines and road edges |
| G | Green | Ego past trajectory + current position |
| B | Blue | Neighbouring vehicles at the current timestep |
Each pixel represents 0.5 m, giving approximately 56 m of visibility in every direction.
This rasterised representation allows us to leverage powerful pretrained image backbones (ResNet, EfficientNet, etc.) without designing a custom map encoder.
The rasterised image is passed through a standard ResNet-18 whose final fully-connected layer is replaced with a linear projection to 60 outputs, reshaped to a (30 × 2) tensor.
In the physics-aware variant, these 60 values represent accelerations and steering angles rather than positions:
```
fc_out (60) → reshape (30, 2) → tanh clamping → [accel, steer]
```
The clamped controls are fed step-by-step into the bicycle_model() function (see src/physics/bicycle_model.py). Because every operation - addition, multiplication, torch.tan, torch.cos, torch.sin - is a native PyTorch op, the entire rollout is auto-differentiable.
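A minimal differentiable rollout in the spirit of `bicycle_model()` (the wheelbase value and state layout here are illustrative assumptions, not the exact contents of `src/physics/bicycle_model.py`):

```python
import torch

def bicycle_rollout(state, accel, steer, wheelbase=2.8, dt=0.1):
    """Integrate kinematic bicycle dynamics step by step.

    state: (B, 4) = [x, y, heading, speed]; accel, steer: (B, T) controls.
    Returns a (B, T, 2) trajectory of (x, y) positions. Every operation is
    a native PyTorch op, so the rollout is fully auto-differentiable.
    """
    x, y, theta, v = state.unbind(dim=1)
    traj = []
    for t in range(accel.shape[1]):
        x = x + v * torch.cos(theta) * dt
        y = y + v * torch.sin(theta) * dt
        theta = theta + (v / wheelbase) * torch.tan(steer[:, t]) * dt
        v = v + accel[:, t] * dt
        traj.append(torch.stack([x, y], dim=1))
    return torch.stack(traj, dim=1)

# Straight-line sanity check: 10 m/s, zero controls -> gradients still flow
state = torch.tensor([[0.0, 0.0, 0.0, 10.0]])
accel = torch.zeros(1, 30, requires_grad=True)
steer = torch.zeros(1, 30)
traj = bicycle_rollout(state, accel, steer)
traj.sum().backward()  # gradients flow back through the physics to the controls
```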
The training loop is therefore:
- Forward: Image → ResNet → controls → bicycle rollout → predicted trajectory
- Loss: Huber loss between predicted and ground-truth $(x, y)$ positions
- Backward: Gradients flow through the physics equations, teaching the network which controls produce trajectories that match reality
- Clipping: Gradient norms are clipped at 1.0 to prevent exploding gradients from the recurrent rollout
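Put together, one training iteration might look like the following sketch (`model`, `rollout`, and the batch layout are illustrative assumptions, not the project's exact interfaces):

```python
import torch
import torch.nn.functional as F

def train_step(model, rollout, batch, optimizer, a_max=4.0, steer_max=0.5):
    """One optimisation step: image -> controls -> physics rollout -> Huber loss."""
    img, init_state, gt_xy = batch             # (B,3,224,224), (B,4), (B,30,2)
    raw = model(img).view(-1, 30, 2)           # unconstrained network outputs
    accel = a_max * torch.tanh(raw[..., 0])    # clamp to feasible ranges
    steer = steer_max * torch.tanh(raw[..., 1])
    pred_xy = rollout(init_state, accel, steer)  # differentiable physics rollout
    loss = F.huber_loss(pred_xy, gt_xy)          # robust to outlier scenes
    optimizer.zero_grad()
    loss.backward()                              # grads flow through the rollout
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # tame the recurrence
    optimizer.step()
    return loss.item()
```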
| Standard CNN (Phase 2) | Physics-Aware (Phase 3) |
|---|---|
| Jagged trajectory with lateral jitter | Smooth, kinematically feasible arcs |
| May "teleport" between lane centres | Continuous heading evolution |
- Smoothness: The bicycle model enforces $C^1$ continuity - heading changes are always proportional to speed and steering angle.
- Feasibility: Clamping acceleration to $\pm 4\ \text{m/s}^2$ and steering to $\pm 0.5\ \text{rad}$ eliminates physically impossible predictions.
- Training stability: The Huber loss combined with gradient clipping prevents the recurrent rollout from producing NaN gradients.
```
physics-aware-motion-prediction/
├── src/
│   ├── models/
│   │   ├── lstm_baseline.py      # Phase 1 — LSTM encoder–decoder
│   │   ├── visual_cnn.py         # Phase 2 — ResNet-18 regression head
│   │   ├── physics_aware.py      # Phase 3 — ResNet-18 + bicycle rollout
│   │   └── multimodal.py         # Multi-modal variant (K hypotheses + WTA loss)
│   ├── physics/
│   │   └── bicycle_model.py      # Differentiable kinematic bicycle model
│   └── data/
│       └── rasterizer.py         # HD-Map → 224×224 BEV rasteriser
│
├── scripts/
│   ├── train/                    # Training entry-points per phase
│   └── evaluate/                 # Evaluation, visualisation, failure analysis
│
├── train_model.py                # Phase 1 training (LSTM)
├── train_visual.py               # Phase 2 training (ResNet-18 CNN)
├── train_physics.py              # Phase 3 training (Physics-Aware)
├── train_multimodal.py           # Multi-modal training (Winner-Takes-All)
├── test_physics.py               # Physics model evaluation + plotting
├── test_visual.py                # Visual model evaluation + plotting
├── test_multimodal.py            # Multi-modal evaluation + plotting
├── evaluate_model.py             # ADE metric computation (LSTM)
├── find_failure.py               # Worst-case failure analysis
├── visualize.py                  # Raw Waymo scene visualisation
│
├── bicycle.py                    # Original bicycle model (kept for compatibility)
├── rasterizer.py                 # Original rasterizer (kept for compatibility)
├── assets/                       # Result images for documentation
├── pyproject.toml                # Project metadata & dependencies
├── .gitignore
└── README.md
```
- Python ≥ 3.10
- CUDA-capable GPU (recommended; CPU training is possible but slow)
- Access to the Waymo Open Motion Dataset v1.3.1
This project is managed with `uv`.
```bash
# Clone the repository
git clone https://github.com/Ismail-Dagli/physics-aware-motion-prediction.git
cd physics-aware-motion-prediction

# Sync dependencies (creates .venv automatically)
uv sync
```

This project streams data directly from Google Cloud Storage. Create a `files.txt` listing the TFRecord shards:
```bash
gsutil ls gs://waymo_open_dataset_motion_v_1_3_1/uncompressed/scenario/training_20s/ > files.txt
```

Note: You need an authenticated `gcloud` session with access to the Waymo Open Dataset bucket.
You can run scripts using `uv run`:

```bash
# Phase 1 — LSTM baseline
uv run scripts/train/train_model.py

# Phase 2 — Visual CNN
uv run scripts/train/train_visual.py

# Phase 3 — Physics-Aware (recommended)
uv run scripts/train/train_physics.py
```

Training resumes automatically from the last checkpoint if a `.pth` file is found in `checkpoints/`.
```bash
# Quantitative ADE metric
uv run scripts/evaluate/evaluate_model.py

# Visual inspection of physics model
uv run scripts/evaluate/test_physics.py

# Find worst-case failure
uv run scripts/evaluate/find_failure.py
```

| Package | Purpose |
|---|---|
| PyTorch | Neural network training, differentiable physics layer |
| torchvision | ResNet-18 pretrained backbone |
| TensorFlow | Reading Waymo TFRecord files via tf.data |
| OpenCV | Rasterisation — drawing lanes, edges, and agents |
| NumPy | Array operations and data preprocessing |
| Matplotlib | Result visualisation and plotting |
| Waymo Open Dataset | Protobuf definitions for scenario parsing |
This project is released under the MIT License.
The Waymo Open Motion Dataset is subject to the Waymo Dataset License Agreement.
- Waymo Open Dataset for providing the motion prediction benchmark and HD-map data.
- The kinematic bicycle model formulation follows Rajamani, R. Vehicle Dynamics and Control (Springer, 2011).
- ResNet architecture from He et al., Deep Residual Learning for Image Recognition (CVPR 2016).


