Predicting the future trajectory of vehicles on real-world roads by combining deep perception with a differentiable physics engine.
```mermaid
graph TD
    A[HD Map & History] -->|Rasterize| B(BEV Image 224x224)
    B -->|ResNet-18| C{Network Head}
    subgraph Phase 2
        C -->|Regression Head| D[Output: Unconstrained Trajectory]
    end
    subgraph Phase 3
        C -->|Control Head| F[a, steer]
        F -->|Bicycle Model| G[Output: Physics-Feasible Trajectory]
    end
    style F fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#4caf50,stroke:#333,stroke-width:2px,color:#fff
```
- Overview
- Motivation - Why Physics?
- Architecture
- How It Works - The Full Pipeline
- Results
- Project Structure
- Getting Started
- Dependencies
- License
- Acknowledgements
Given 10 past observations (1 second at 10 Hz) and a High-Definition map from the Waymo Open Motion Dataset, this system predicts the next 30 timesteps (3 seconds) of a vehicle's trajectory.
The core contribution is a Differentiable Kinematic Bicycle Model - a physics simulation layer written entirely in PyTorch. Instead of asking the network to regress raw positions, the network predicts physical control inputs (acceleration and steering) that are integrated through the bicycle model, so every output trajectory is kinematically feasible by construction.
Standard deep-learning approaches treat motion prediction as unconstrained regression: the network outputs 30 future $(x, y)$ positions with nothing tying one timestep to the next. This produces characteristic failure modes:
| Failure Mode | Cause |
|---|---|
| Lateral teleportation | The network snaps to the lane centre between timesteps |
| Jittery trajectories | Small per-step errors compound into a jagged path |
| Crab-walking | The predicted motion vector is perpendicular to the heading |
All of these artefacts violate basic vehicle kinematics. By routing the prediction through a physics engine, these failure modes become structurally impossible.
This project evolved through three distinct phases, each addressing a limitation of the previous one.
```
Past Velocities (10×2) → LSTM(64) → Linear → Future Velocities (30×2)
```
A vanilla LSTM encoder–decoder that consumes the past velocity vectors and projects its final hidden state to the full future velocity sequence.
Result: The model learns basic momentum - it extrapolates the current velocity forward. On straight roads this is reasonable, but at intersections or curves the car drives straight through walls because it has no concept of road geometry.
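The Phase 1 baseline can be sketched roughly as follows (a minimal reconstruction; the actual `src/models/lstm_baseline.py` may differ in layer sizes and decoding details):

```python
import torch
import torch.nn as nn

class LSTMBaseline(nn.Module):
    """Encode 10 past velocity vectors, decode 30 future ones."""
    def __init__(self, hidden: int = 64, future: int = 30):
        super().__init__()
        self.future = future
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, future * 2)

    def forward(self, past_vel: torch.Tensor) -> torch.Tensor:
        # past_vel: (B, 10, 2) -> last hidden state summarises the history
        _, (h_n, _) = self.encoder(past_vel)
        out = self.head(h_n[-1])              # (B, 60)
        return out.view(-1, self.future, 2)   # (B, 30, 2) future velocities

model = LSTMBaseline()
pred = model(torch.randn(4, 10, 2))
print(pred.shape)  # torch.Size([4, 30, 2])
```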
```
Rasterised BEV Image (3×224×224) → ResNet-18 → Linear → Future Velocities (30×2)
```
We introduce a custom rasterizer that renders the HD map, ego history, and neighbouring agents into a 224×224 bird's-eye view image. This image is fed into a pretrained ResNet-18 backbone with a regression head.
Result: The model can now see the road and learns to follow lane geometry. However, it suffers from mode collapse (averaging over multiple plausible futures) and trajectory jitter because it still treats each timestep as an independent regression target, with no kinematic constraint linking consecutive outputs.
```
Rasterised BEV Image (3×224×224) → ResNet-18 → Control Inputs (30×2) → Bicycle Model (Differentiable) → Trajectory (30×2)
```
The network architecture is identical to Phase 2, but the output semantics fundamentally change. Instead of predicting positions or velocities, the network predicts physical control signals: longitudinal acceleration $a_t$ and steering angle $\delta_t$ for each of the 30 future timesteps.
These controls are clamped to feasible ranges:

$$a_t \in [-4, 4]\ \text{m/s}^2, \qquad \delta_t \in [-0.5, 0.5]\ \text{rad}$$

and integrated through a Kinematic Bicycle Model:

$$
\begin{aligned}
x_{t+1} &= x_t + v_t \cos\theta_t \,\Delta t \\
y_{t+1} &= y_t + v_t \sin\theta_t \,\Delta t \\
\theta_{t+1} &= \theta_t + \frac{v_t}{L} \tan\delta_t \,\Delta t \\
v_{t+1} &= v_t + a_t \,\Delta t
\end{aligned}
$$

where $(x_t, y_t)$ is the position, $\theta_t$ the heading, $v_t$ the speed, $L$ the wheelbase, and $\Delta t = 0.1\ \text{s}$ the timestep.

The loss is computed on the final trajectory positions, not on the controls, so supervision comes entirely from observed ground-truth motion.
Result: Smooth, physically feasible arcs. The vehicle cannot crab-walk, teleport, or produce discontinuous headings.
The raw Waymo scenario protobuf contains polyline geometry for lane centres, road edges, and stop signs, as well as tracked object states at 10 Hz. The Rasterizer converts this into a 224×224 RGB image centred on the target vehicle:
| Channel | Colour | Content |
|---|---|---|
| R | Red | Lane centre-lines and road edges |
| G | Green | Ego past trajectory + current position |
| B | Blue | Neighbouring vehicles at the current timestep |
Each pixel represents 0.5 m, giving approximately 56 m of visibility in every direction.
This rasterised representation allows us to leverage powerful pretrained image backbones (ResNet, EfficientNet, etc.) without designing a custom map encoder.
The rasterised image is passed through a standard ResNet-18 whose final fully-connected layer is replaced with a linear projection to 60 outputs, reshaped to a (30 × 2) tensor.
In the physics-aware variant, these 60 values represent accelerations and steering angles rather than positions:
```
fc_out (60) → reshape (30, 2) → tanh clamping → [accel, steer]
```
The clamped controls are fed step-by-step into the bicycle_model() function (see src/physics/bicycle_model.py). Because every operation - addition, multiplication, torch.tan, torch.cos, torch.sin - is a native PyTorch op, the entire rollout is auto-differentiable.
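A minimal differentiable rollout in the spirit of `bicycle_model()` (the wheelbase value and state layout here are illustrative assumptions, not the exact contents of `src/physics/bicycle_model.py`):

```python
import torch

def bicycle_rollout(state, accel, steer, wheelbase=2.8, dt=0.1):
    """Integrate kinematic bicycle dynamics step by step.

    state: (B, 4) = [x, y, heading, speed]; accel, steer: (B, T) controls.
    Returns a (B, T, 2) trajectory of (x, y) positions. Every operation is
    a native PyTorch op, so the rollout is fully auto-differentiable.
    """
    x, y, theta, v = state.unbind(dim=1)
    traj = []
    for t in range(accel.shape[1]):
        x = x + v * torch.cos(theta) * dt
        y = y + v * torch.sin(theta) * dt
        theta = theta + (v / wheelbase) * torch.tan(steer[:, t]) * dt
        v = v + accel[:, t] * dt
        traj.append(torch.stack([x, y], dim=1))
    return torch.stack(traj, dim=1)

# Straight-line sanity check: 10 m/s, zero controls -> gradients still flow
state = torch.tensor([[0.0, 0.0, 0.0, 10.0]])
accel = torch.zeros(1, 30, requires_grad=True)
steer = torch.zeros(1, 30)
traj = bicycle_rollout(state, accel, steer)
traj.sum().backward()  # gradients flow back through the physics to the controls
```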
The training loop is therefore:
- Forward: Image → ResNet → controls → bicycle rollout → predicted trajectory
- Loss: Huber loss between predicted and ground-truth $(x, y)$ positions
- Backward: Gradients flow through the physics equations, teaching the network which controls produce trajectories that match reality
- Clipping: Gradient norms are clipped at 1.0 to prevent exploding gradients from the recurrent rollout
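Put together, one training iteration might look like the following sketch (`model`, `rollout`, and the batch layout are illustrative assumptions, not the project's exact interfaces):

```python
import torch
import torch.nn.functional as F

def train_step(model, rollout, batch, optimizer, a_max=4.0, steer_max=0.5):
    """One optimisation step: image -> controls -> physics rollout -> Huber loss."""
    img, init_state, gt_xy = batch             # (B,3,224,224), (B,4), (B,30,2)
    raw = model(img).view(-1, 30, 2)           # unconstrained network outputs
    accel = a_max * torch.tanh(raw[..., 0])    # clamp to feasible ranges
    steer = steer_max * torch.tanh(raw[..., 1])
    pred_xy = rollout(init_state, accel, steer)  # differentiable physics rollout
    loss = F.huber_loss(pred_xy, gt_xy)          # robust to outlier scenes
    optimizer.zero_grad()
    loss.backward()                              # grads flow through the rollout
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # tame the recurrence
    optimizer.step()
    return loss.item()
```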
| Standard CNN (Phase 2) | Physics-Aware (Phase 3) |
|---|---|
| Jagged trajectory with lateral jitter | Smooth, kinematically feasible arcs |
| May "teleport" between lane centres | Continuous heading evolution |
- Smoothness: The bicycle model enforces $C^1$ continuity - heading changes are always proportional to speed and steering angle.
- Feasibility: Clamping acceleration to $\pm 4\ \text{m/s}^2$ and steering to $\pm 0.5\ \text{rad}$ eliminates physically impossible predictions.
- Training stability: The Huber loss combined with gradient clipping prevents the recurrent rollout from producing NaN gradients.
```
physics-aware-motion-prediction/
├── src/
│   ├── models/
│   │   ├── lstm_baseline.py      # Phase 1 — LSTM encoder–decoder
│   │   ├── visual_cnn.py         # Phase 2 — ResNet-18 regression head
│   │   ├── physics_aware.py      # Phase 3 — ResNet-18 + bicycle rollout
│   │   └── multimodal.py         # Multi-modal variant (K hypotheses + WTA loss)
│   ├── physics/
│   │   └── bicycle_model.py      # Differentiable kinematic bicycle model
│   └── data/
│       └── rasterizer.py         # HD-Map → 224×224 BEV rasteriser
│
├── scripts/
│   ├── train/                    # Training entry-points per phase
│   └── evaluate/                 # Evaluation, visualisation, failure analysis
│
├── train_model.py                # Phase 1 training (LSTM)
├── train_visual.py               # Phase 2 training (ResNet-18 CNN)
├── train_physics.py              # Phase 3 training (Physics-Aware)
├── train_multimodal.py           # Multi-modal training (Winner-Takes-All)
├── test_physics.py               # Physics model evaluation + plotting
├── test_visual.py                # Visual model evaluation + plotting
├── test_multimodal.py            # Multi-modal evaluation + plotting
├── evaluate_model.py             # ADE metric computation (LSTM)
├── find_failure.py               # Worst-case failure analysis
├── visualize.py                  # Raw Waymo scene visualisation
│
├── bicycle.py                    # Original bicycle model (kept for compatibility)
├── rasterizer.py                 # Original rasterizer (kept for compatibility)
├── assets/                       # Result images for documentation
├── pyproject.toml                # Project metadata & dependencies
├── .gitignore
└── README.md
```
- Python ≥ 3.10
- CUDA-capable GPU (recommended; CPU training is possible but slow)
- Access to the Waymo Open Motion Dataset v1.3.1
This project is managed with `uv`.
```bash
# Clone the repository
git clone https://github.com/Ismail-Dagli/physics-aware-motion-prediction.git
cd physics-aware-motion-prediction

# Sync dependencies (creates .venv automatically)
uv sync
```

This project streams data directly from Google Cloud Storage. Create a `files.txt` listing the TFRecord shards:
```bash
gsutil ls gs://waymo_open_dataset_motion_v_1_3_1/uncompressed/scenario/training_20s/ > files.txt
```

Note: You need an authenticated `gcloud` session with access to the Waymo Open Dataset bucket.
You can run scripts using `uv run`:

```bash
# Phase 1 — LSTM baseline
uv run scripts/train/train_model.py

# Phase 2 — Visual CNN
uv run scripts/train/train_visual.py

# Phase 3 — Physics-Aware (recommended)
uv run scripts/train/train_physics.py
```

Training resumes automatically from the last checkpoint if a `.pth` file is found in `checkpoints/`.
```bash
# Quantitative ADE metric
uv run scripts/evaluate/evaluate_model.py

# Visual inspection of physics model
uv run scripts/evaluate/test_physics.py

# Find worst-case failure
uv run scripts/evaluate/find_failure.py
```

| Package | Purpose |
|---|---|
| PyTorch | Neural network training, differentiable physics layer |
| torchvision | ResNet-18 pretrained backbone |
| TensorFlow | Reading Waymo TFRecord files via tf.data |
| OpenCV | Rasterisation — drawing lanes, edges, and agents |
| NumPy | Array operations and data preprocessing |
| Matplotlib | Result visualisation and plotting |
| Waymo Open Dataset | Protobuf definitions for scenario parsing |
This project is released under the MIT License.
The Waymo Open Motion Dataset is subject to the Waymo Dataset License Agreement.
- Waymo Open Dataset for providing the motion prediction benchmark and HD-map data.
- The kinematic bicycle model formulation follows Rajamani, R. Vehicle Dynamics and Control (Springer, 2011).
- ResNet architecture from He et al., Deep Residual Learning for Image Recognition (CVPR 2016).


