This coursework was completed as part of the Reinforcement Learning course with applications in maintenance optimization, lectured jointly by Professor Abdollah Safari and Professor Firoozeh Haghighi in Fall 2025 at the Department of Mathematics, Statistics, and Computer Science, University of Tehran.
- Value Iteration in GridWorld
  Implemented value iteration to find the optimal policy in a GridWorld environment.
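As a rough illustration of value iteration, here is a minimal sketch on a hypothetical 4x4 grid. The grid size, the goal in the bottom-right corner, the reward of -1 per step, and the names `step`, `q_value`, and `value_iteration` are all assumptions made for this example; the assignment's actual environment may differ.

```python
# Hypothetical 4x4 GridWorld, not the assignment's actual environment.
N = 4                      # grid side length (assumed)
GAMMA = 1.0                # undiscounted episodic task (assumed)
GOAL = (N - 1, N - 1)      # terminal state in the bottom-right corner (assumed)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < N and 0 <= new_col < N):
        new_row, new_col = row, col
    return (new_row, new_col), -1.0  # reward of -1 per step (assumed)


def q_value(V, state, action):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * V[next_state]


def value_iteration(theta=1e-8):
    """Sweep the Bellman optimality backup until the largest change is below theta."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s == GOAL:
                continue  # the terminal state keeps value 0
            best = max(q_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in V if s != GOAL}
    return V, policy


V, policy = value_iteration()
print(V[(0, 0)], policy[(3, 0)])
```

Under these assumptions, each state's value is the negative of its Manhattan distance to the goal, which makes the result easy to check by hand.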
- Monte Carlo Algorithms
  Introduced model-free methods by implementing First-Visit Monte Carlo and Every-Visit Monte Carlo.
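The difference between the two Monte Carlo variants can be sketched as follows. The episode format (a list of `(state, reward)` pairs) and the function name `mc_evaluate` are assumptions for illustration, not the assignment's interface.

```python
from collections import defaultdict


def mc_evaluate(episodes, gamma=1.0, first_visit=True):
    """Estimate V(s) as the average return observed after visits to s.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received on leaving that state (an assumed format).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Scan backwards to compute the return G_t for every time step.
        G = 0.0
        returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns.append((state, G))
        returns.reverse()  # back to chronological order

        seen = set()
        for state, G in returns:
            # First-visit: only the earliest occurrence in the episode counts.
            if first_visit and state in seen:
                continue
            seen.add(state)
            returns_sum[state] += G
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# State "A" is visited twice, so the two variants disagree on V("A").
episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
v_first = mc_evaluate([episode], first_visit=True)
v_every = mc_evaluate([episode], first_visit=False)
print(v_first, v_every)
```

In this toy episode the returns from the two visits to "A" are 3 and 2, so first-visit uses 3 while every-visit averages both.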
- Policy Evaluation for Blackjack
  Applied Monte Carlo policy evaluation to estimate state values in a Blackjack environment.
- SARSA & Q-Learning
  Implemented SARSA and SARSA(λ), tuned λ for optimal performance, and compared results with Q-learning.
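The three methods differ mainly in their update rules, which the sketch below tries to isolate. The nested-dict Q-table, the accumulating-trace variant of SARSA(λ), and all function names are assumptions chosen for this example.

```python
def make_table(fill=0.0):
    """Tiny two-state, two-action table (hypothetical)."""
    return {s: {"L": fill, "R": fill} for s in (0, 1)}


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy target: uses the action actually chosen in s_next.
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy target: uses the greedy action in s_next.
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam):
    # Same TD error as SARSA, but spread over all recently visited pairs.
    delta = r + gamma * Q[s_next][a_next] - Q[s][a]
    E[s][a] += 1.0  # accumulating eligibility trace
    for state in Q:
        for action in Q[state]:
            Q[state][action] += alpha * delta * E[state][action]
            E[state][action] *= gamma * lam  # decay every trace


def fresh_q():
    Q = make_table()
    Q[1]["L"], Q[1]["R"] = 1.0, 2.0
    return Q


Q_sarsa, Q_ql, Q_lam, E = fresh_q(), fresh_q(), fresh_q(), make_table()
sarsa_update(Q_sarsa, 0, "R", 0.0, 1, "L", alpha=0.5, gamma=1.0)
q_learning_update(Q_ql, 0, "R", 0.0, 1, alpha=0.5, gamma=1.0)
sarsa_lambda_update(Q_lam, E, 0, "R", 0.0, 1, "L", alpha=0.5, gamma=1.0, lam=0.5)
print(Q_sarsa[0]["R"], Q_ql[0]["R"], Q_lam[0]["R"], E[0]["R"])
```

With the next action fixed to "L", SARSA bootstraps from Q(1, "L") while Q-learning bootstraps from the larger Q(1, "R"), so the same transition produces different updates.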
- Custom GridWorld with Gymnasium
  Built a GridWorld environment from scratch using the Gymnasium library.
- Value Function Approximation
  Applied linear function approximation, radial basis functions (RBF), and neural networks for value function approximation in the MountainCar environment (Gym).
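To make the RBF idea concrete, here is a minimal sketch of Gaussian RBF features with a linear value function and a semi-gradient TD(0) update. The state bounds are those of the standard MountainCar task (position in [-1.2, 0.6], velocity in [-0.07, 0.07]); the 4x4 grid of centers, the width `sigma`, and all function names are assumptions, and the sketch uses only the standard library rather than the assignment's actual code.

```python
import math

POS_RANGE = (-1.2, 0.6)    # MountainCar position bounds
VEL_RANGE = (-0.07, 0.07)  # MountainCar velocity bounds


def make_centers(n_per_dim=4):
    """Evenly spaced RBF centers on the unit square (normalized state space)."""
    ticks = [i / (n_per_dim - 1) for i in range(n_per_dim)]
    return [(p, v) for p in ticks for v in ticks]


def normalize(state):
    """Rescale (position, velocity) to [0, 1] x [0, 1]."""
    pos, vel = state
    return (
        (pos - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0]),
        (vel - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0]),
    )


def rbf_features(state, centers, sigma=0.25):
    """One Gaussian bump per center, evaluated at the normalized state."""
    x = normalize(state)
    return [
        math.exp(-((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2) / (2 * sigma ** 2))
        for c in centers
    ]


def value(state, w, centers):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, rbf_features(state, centers)))


def td0_update(w, state, reward, next_state, done, centers, alpha=0.1, gamma=0.99):
    """Semi-gradient TD(0): move w along the features, scaled by the TD error."""
    phi = rbf_features(state, centers)
    v = sum(wi * fi for wi, fi in zip(w, phi))
    v_next = 0.0 if done else value(next_state, w, centers)
    delta = reward + gamma * v_next - v
    return [wi + alpha * delta * fi for wi, fi in zip(w, phi)]


centers = make_centers()
w = [0.0] * len(centers)
corner = (-1.2, -0.07)  # maps exactly onto the first center
w = td0_update(w, corner, reward=-1.0, next_state=corner, done=True, centers=centers)
print(len(centers), value(corner, w, centers))
```

Because the features are fixed and only the weights are learned, this remains a linear method; a neural network would replace `rbf_features` with learned features.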
We applied reinforcement learning to optimize maintenance policies for a two-component series system subject to gamma degradation and common shocks.
- Actions considered:
  - Preventive Maintenance (PM)
  - Opportunistic Maintenance (OM)
  - Corrective Maintenance (CM)
- Algorithm: Q-learning
- Contribution: Derived an optimal maintenance policy balancing reliability and cost.
- Results: Presented at the 11th Seminar on Reliability Theory and Its Applications.
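For readers unfamiliar with this kind of degradation model, the sketch below simulates a two-component series system with gamma-distributed degradation increments and common shocks until the first failure. It is not the model from the paper: every parameter value (shapes, scales, shock probability, shock damage, failure thresholds) and the function name `simulate_until_failure` are hypothetical, and maintenance actions are deliberately left out.

```python
import random


def simulate_until_failure(
    rng,
    shape=(1.0, 1.5),        # gamma shape per component (hypothetical)
    scale=(1.0, 0.8),        # gamma scale per component (hypothetical)
    shock_prob=0.1,          # chance of a common shock each period (hypothetical)
    shock_damage=3.0,        # damage a shock adds to BOTH components (hypothetical)
    threshold=(20.0, 25.0),  # failure thresholds (hypothetical)
    max_steps=10_000,
):
    """Run the unmaintained system until any component crosses its threshold."""
    levels = [0.0, 0.0]
    for t in range(1, max_steps + 1):
        # Gradual wear: independent gamma increments for each component.
        for i in range(2):
            levels[i] += rng.gammavariate(shape[i], scale[i])
        # Common shock: hits both components in the same period.
        if rng.random() < shock_prob:
            for i in range(2):
                levels[i] += shock_damage
        # Series system: it fails as soon as any single component fails.
        failed = [i for i in range(2) if levels[i] >= threshold[i]]
        if failed:
            return t, failed, levels
    return max_steps, [], levels


rng = random.Random(0)
t_fail, failed, levels = simulate_until_failure(rng)
print(t_fail, failed, [round(x, 2) for x in levels])
```

A Q-learning agent for this setting would observe the degradation levels each period and choose among doing nothing, PM, OM, and CM, with costs attached to each action and to system failure.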
See the Project/ folder for code and results.
The full paper is available in the Project/ folder:
- Title: A Q-learning Approach to Maintenance Policy Optimization for Series Systems with Degradation and Common Shocks
- Conference: 11th Seminar on Reliability Theory and Its Applications, Department of Mathematics and Statistics, University of Isfahan
- Python 3.10+
- Gymnasium
- NumPy, Matplotlib, PyTorch (for assignments and project)