Publication: Kim, Y.J., Chi, M. Time-aware deep reinforcement learning with multi-temporal abstraction. Applied Intelligence (2023). https://doi.org/10.1007/s10489-022-04392-5
Abstract: Deep reinforcement learning (DRL) has shown considerable promise, but it rarely performs well on real-world decision-making tasks, particularly those involving irregular time series with sparse actions. Although temporal abstractions allow the agent to handle irregular time series with sparse actions by capturing high-level states, they also aggravate the temporal irregularity by widening the range of time intervals needed to represent a state and estimate expected returns. In this work, we propose a general Time-aware DRL framework with Multi-Temporal Abstraction (T-MTA) that incorporates awareness of time intervals in two ways: temporal discounting and temporal abstraction. For the former, we propose a Time-aware DRL method, whereas for the latter we propose a Multi-Temporal Abstraction mechanism. T-MTA was tested in three standard RL testbeds and two real-life tasks (control of nuclear reactors and prevention of septic shock), which together cover four common learning contexts: online and offline, as well as fully and partially observable environments. As T-MTA is a general framework, it can be combined with any model-free DRL method. In this work, we examined two in particular: the Deep Q-Network approach and its variants, and Truly Proximal Policy Optimization. Our results show that T-MTA significantly outperforms competing baseline frameworks, including a standalone Time-aware DRL framework, MTAs alone, and the original DRL methods that account for neither temporal aspect, especially in partially observable environments with a wide range of time intervals.
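
The time-aware temporal discounting mentioned in the abstract can be illustrated with a minimal sketch: the idea of scaling the discount by the elapsed time between decision points, for example by raising gamma to the power of the interval delta_t in a TD target. This is an assumption-level illustration only; the paper's exact formulation may differ, and the function name time_aware_td_target is hypothetical.

```python
def time_aware_td_target(reward: float, next_q_value: float, delta_t: float,
                         gamma: float = 0.99, done: bool = False) -> float:
    """TD target with a time-aware discount (illustrative sketch).

    Instead of a fixed per-step discount, the discount depends on the
    elapsed time delta_t between consecutive decisions (gamma ** delta_t),
    so value observed after a longer gap is discounted more heavily.
    """
    discount = gamma ** delta_t
    return reward + (0.0 if done else discount * next_q_value)


# The same next-state value contributes less after a 6-hour gap
# than after a 1-hour gap between decision points.
print(time_aware_td_target(reward=1.0, next_q_value=10.0, delta_t=1.0))  # ~10.90
print(time_aware_td_target(reward=1.0, next_q_value=10.0, delta_t=6.0))  # ~10.41
```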