yarchickkk/apollo-dqn


Apollo DQN: Building an RL Agent for LunarLander-v3

This repository contains a reproducible implementation of DQN, Double DQN and Dueling Double DQN for LunarLander-v3 (Gymnasium), using a Residual MLP backbone and a PER-lite replay strategy.
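For orientation, the three variants differ mainly in how the bootstrap target and the Q-value head are formed. The following is a minimal pure-Python sketch of the textbook definitions, not the code in this repository: the function names, the list-based Q-values, and the default `gamma=0.99` are illustrative assumptions.

```python
from typing import List, Sequence


def dqn_target(reward: float, done: bool,
               next_q_target: Sequence[float], gamma: float = 0.99) -> float:
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    bootstrap = 0.0 if done else max(next_q_target)
    return reward + gamma * bootstrap


def double_dqn_target(reward: float, done: bool,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN: the online network selects the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


if __name__ == "__main__":
    # gamma=0.5 is chosen only to keep the arithmetic easy to follow.
    print(dqn_target(1.0, False, [1.0, 3.0], gamma=0.5))                     # 2.5
    print(double_dqn_target(1.0, False, [2.0, 0.0], [1.0, 3.0], gamma=0.5))  # 1.5
    print(dueling_q_values(1.0, [1.0, 3.0]))                                 # [0.0, 2.0]
```

In the example, the online network prefers action 0 while the target network rates action 1 highest, so the Double DQN target (1.5) is lower than the vanilla DQN target (2.5). This decoupling is what reduces overestimation.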

It supports the full workflow: you can train your own agents, re-evaluate my checkpoints or your own, generate all training diagnostics, and render episodes for any policy.

📰 Published in Towards AI

Read the full article here:

Apollo DQN: Building an RL Agent for LunarLander-v3

📜 Results

| Model | Avg | WR@200 | CI | %Base | %Human | Avg | WR@200 | CI | WR@100 | CI | %Base | %Human |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DQN (Base) | 277.049 | 98.800 | 0.954 | 100.000 | 108.609 | 190.667 | 86.800 | 2.967 | 91.000 | 2.508 | 100.000 | 98.486 |
| DDQN | 277.251 | 97.400 | 1.395 | 100.073 | 108.651 | 184.980 | 83.000 | 3.293 | 90.600 | 2.558 | 97.017 | 96.725 |
| DDQN Dueling | 278.765 | 98.800 | 0.954 | 100.619 | 108.966 | 196.535 | 88.800 | 2.764 | 92.000 | 2.378 | 103.078 | 100.304 |

  • To reproduce the full post-training evaluation described in the article, run:
python eval.py
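
The README does not define the table's columns, so the sketch below rests on assumptions: WR@200 is read as the percentage of episodes whose return reaches 200, CI as the half-width of a 95% normal-approximation interval on that percentage, and %Base as the average return relative to the DQN baseline. The reported CI values are numerically consistent with this formula over 500 episodes, but `eval.py` may compute them differently. All function names here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def win_rate_with_ci(returns: Sequence[float], threshold: float = 200.0,
                     z: float = 1.96) -> Tuple[float, float]:
    """Return (win rate in %, CI half-width in %) using the normal approximation."""
    n = len(returns)
    wins = sum(1 for r in returns if r >= threshold)  # assumed: ">=" counts as a win
    p = wins / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return 100.0 * p, 100.0 * half_width


def percent_of_baseline(avg_return: float, baseline_avg: float) -> float:
    """Average return expressed as a percentage of a baseline average."""
    return 100.0 * avg_return / baseline_avg


if __name__ == "__main__":
    # Synthetic returns: 494 of 500 episodes at or above the threshold.
    episode_returns = [250.0] * 494 + [50.0] * 6
    wr, ci = win_rate_with_ci(episode_returns)
    print(f"WR@200 = {wr:.3f} +/- {ci:.3f}")
    print(f"%Base  = {percent_of_baseline(277.251, 277.049):.3f}")
```

With these synthetic inputs the win rate is 98.8% with a half-width of about 0.954, matching the DQN (Base) row, and the DDQN average of 277.251 comes out at about 100.073% of the baseline.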

🕹 Demo

  • To render an episode with a trained policy, pass a checkpoint file (the best policy is used by default) and run:
python -m scripts.render_episode --weights weights/dqn.pt

🛠 Quickstart

  • To train a model, use your own config or start from one of the provided configurations and run:
python train.py --config configs/ddqn_dueling.yaml
  • To evaluate all checkpoints in a directory after training and print a summary table, run:
python eval.py --log_dir logs/ddqn/2026-02-06_20-10-16/weights/
  • To render diagnostic plots from saved training snapshots, use:
python -m scripts.plot_diagnostics --log_dir logs/dqn/2026-02-06_20-10-16/

This will generate:

  1. Loss, estimated Q-value, reward and episode length curves
  2. Action gap, absolute TD-error and TD-error P95 curves
  3. Update-to-data ratio curves
  4. Activation and pre-activation distributions with their gradients
  5. Weight gradient distributions
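
Several of these diagnostics have common definitions that are simple to state in code. The sketch below uses pure Python and hypothetical function names; the definitions the plotting script actually uses (for example, its percentile method or how it counts updates) are not given in this README and may differ.

```python
import math
from typing import Sequence


def action_gap(q_values: Sequence[float]) -> float:
    """Difference between the best and second-best Q-value in one state."""
    top, second = sorted(q_values, reverse=True)[:2]
    return top - second


def abs_td_error(q_sa: float, reward: float, done: bool,
                 next_q_max: float, gamma: float = 0.99) -> float:
    """|r + gamma * max_a' Q(s', a') - Q(s, a)|, with no bootstrap on terminal steps."""
    target = reward + (0.0 if done else gamma * next_q_max)
    return abs(target - q_sa)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (one of several conventions); P95 uses pct=95."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def update_to_data_ratio(gradient_updates: int, env_steps: int) -> float:
    """Gradient updates performed per environment step collected."""
    return gradient_updates / env_steps


if __name__ == "__main__":
    print(action_gap([1.0, 4.0, 2.5, 0.0]))               # 1.5
    print(abs_td_error(2.0, 1.0, False, 3.0, gamma=0.5))  # 0.5
    print(percentile(list(range(1, 101)), 95))            # 95
    print(update_to_data_ratio(250, 1000))                # 0.25
```

Tracking the 95th percentile alongside the mean absolute TD-error helps separate a few large outlier errors from a broad rise across the batch.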

💿 Installation

Python 3.10–3.11 is recommended for best compatibility with Gymnasium, PyTorch and Box2D dependencies.

  • For a CPU-only installation, run:
pip install torch
pip install -r requirements.txt
  • For a CUDA-enabled installation, run:
pip install torch --index-url https://download.pytorch.org/whl/cu124  # e.g., CUDA 12.4
pip install -r requirements.txt
  • On Windows, you may need the MSVC build tools. To set up the 64-bit build environment in the current shell, run:
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
  • If Triton is required, install the wheel manually from:

https://pypi.org/project/triton-windows/