Mortal-Policy

This repository is a branch of the original Mortal repository, transitioning from value-based methods to policy-based methods.

Overview

This branch was initially developed in 2022 on top of Mortal V2 and was migrated to Mortal V4 in 2024.
It features:

- A more stable optimization process
- Improved final performance

Note:
The performance results are based on a comparison with a baseline model. The baseline used for testing has been uploaded to RiichiLab (mjai.app) and has maintained a stable rank across multiple evaluation batches.

Installation

Installation is the same as for the original repository; read the Documentation.
Torch requirement: torch 2.5.1+cu124 (install via pip)
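
As a quick sanity check after installation, the installed build can be compared against the requirement above. The following is only a sketch: the helper names `parse_torch_version` and `matches_requirement` are hypothetical and not part of this repository, and it assumes the version string has the usual `release+build` form (for example `2.5.1+cu124`).

```python
def parse_torch_version(version):
    """Split a string such as '2.5.1+cu124' into (release, build_tag)."""
    release, _, build = version.partition("+")
    return release, build


def matches_requirement(version, release="2.5.1", build="cu124"):
    """Return True if the version string names the expected release and CUDA build."""
    return parse_torch_version(version) == (release, build)
```

With PyTorch installed, calling `matches_requirement(torch.__version__)` after `import torch` should return `True` for the build named above; a CPU-only wheel would report a different build tag.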

Run

Mortal-Policy adopts an offline-to-online training approach:

Data Preparation
Collect samples in mjai format.
Configuration
Rename config.example.toml to config.toml and set hyperparameters.
Training Stages
- Offline Phase1 (Advantage Weighted Regression):
  Run train_offline_phase1.py
- Offline Phase2 (Behavior Proximal Policy Optimization):
  This phase is optional; the code is coming soon.
- Online Phase (Policy Gradient with Importance Sampling and PPO-style Clipping):
  Run train_online.py
While online-only training is possible, it is not recommended.
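
For readers unfamiliar with the objectives named in the training stages, the sketch below shows the textbook form of an advantage-weighted regression loss and of a PPO-style clipped importance-sampling loss. It is a framework-free illustration in plain Python, not the code in `train_offline_phase1.py` or `train_online.py`; the function names and the parameters `beta`, `max_weight`, and `clip_eps` are assumptions chosen for the example.

```python
import math


def awr_loss(log_probs, advantages, beta=1.0, max_weight=20.0):
    """Advantage-weighted regression: a weighted negative log-likelihood.

    Each dataset action's log-probability is weighted by exp(A / beta),
    capped at max_weight to keep the weights bounded.
    """
    weights = [min(math.exp(a / beta), max_weight) for a in advantages]
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


def clipped_pg_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Policy-gradient loss with importance sampling and PPO-style clipping."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Importance ratio between the current policy and the behavior policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

In the first function, actions with higher advantage are imitated more strongly. In the second, the clipping removes the incentive to move the policy far from the one that generated the samples in a single update.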

Weights & Configuration

This branch follows the same approach as the original Mortal repository; for details, see this post.
The weights, hyperparameters, and some online training features were removed from this branch when it was open-sourced.

License

Code

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
