Degree: COMS
Description: Recent work in compositional reinforcement learning has demonstrated how to combine skills to solve tasks specified using Boolean algebra operators. However, the existing algorithm relies on standard Q-learning with epsilon-greedy exploration. One component of the algorithm is how the agent decides which goal to explore, which is currently done greedily. In this project, we propose extending the algorithm with alternative goal-selection strategies, such as uniform random or bandit-based selection. The project also involves creating a virtual environment in Unity or mujoco-worldgen.
Tags/topics: Reinforcement learning, deep reinforcement learning, game design
Algorithms:
- Explore only
- Exploit only
- ε-greedy (Epsilon greedy)
- UCB (Upper Confidence Bound)
- EXP4
- Softmax
- Optimistic initialization
- Intrinsic rewards
- Q-map
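To make the goal-selection idea concrete, here is a minimal sketch of how a few of the listed strategies could pick the next goal to explore. It is an illustration under stated assumptions, not the method from the referenced work: `values` (a hypothetical per-goal value estimate, for example a running average of the return obtained when pursuing that goal), `counts` (how often each goal has been selected so far), and all function names are invented for this example.

```python
import math
import random


def uniform_goal(values, counts, rng):
    """Explore only: every goal is equally likely."""
    return rng.randrange(len(values))


def greedy_goal(values, counts, rng):
    """Exploit only: the goal with the highest value estimate."""
    return max(range(len(values)), key=lambda g: values[g])


def epsilon_greedy_goal(values, counts, rng, epsilon=0.1):
    """With probability epsilon pick a random goal, otherwise the greedy one."""
    if rng.random() < epsilon:
        return uniform_goal(values, counts, rng)
    return greedy_goal(values, counts, rng)


def ucb1_goal(values, counts, rng, c=2.0):
    """UCB1: value estimate plus a bonus that shrinks as a goal is tried more."""
    # Try each goal once first so the bonus term is well defined.
    for g, n in enumerate(counts):
        if n == 0:
            return g
    total = sum(counts)
    return max(
        range(len(values)),
        key=lambda g: values[g] + math.sqrt(c * math.log(total) / counts[g]),
    )


def softmax_goal(values, counts, rng, temperature=1.0):
    """Softmax (Boltzmann): sample goals in proportion to exp(value / temperature)."""
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - peak) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    values = [0.2, 0.5, 0.1]  # assumed value estimates for three goals
    counts = [3, 5, 2]        # assumed selection counts for the same goals
    print("greedy:", greedy_goal(values, counts, rng))
    print("ucb1:", ucb1_goal(values, counts, rng))
    print("softmax:", softmax_goal(values, counts, rng))
```

All strategies share one signature, so an agent could swap them without changing the rest of its training loop. With the example numbers, the greedy rule picks goal 1, while UCB1 picks goal 2 because that goal has been tried least often and therefore carries the largest exploration bonus.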
References:
- Benureau, Fabien, and Pierre-Yves Oudeyer. "Diversity-driven selection of exploration strategies in multi-armed bandits." In 2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), pp. 135-142. IEEE, 2015.
- Tasse, Geraud Nangue, Steven James, and Benjamin Rosman. "A Boolean Task Algebra for Reinforcement Learning." NeurIPS 2020.
- Pardo, Fabio, Vitaly Levdik, and Petar Kormushev. "Q-map: a convolutional approach for goal-oriented reinforcement learning." (2018).
- Shi, H., Z. Lin, K. Hwang, S. Yang, and J. Chen. "An Adaptive Strategy Selection Method With Reinforcement Learning for Robotic Soccer Games." IEEE Access 6 (2018): 8376-8386.
- Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
- Zhang, Taidong, Xianze Li, Xudong Li, Guanghui Liu, and Miao Tian. "Reinforcement Learning based Strategy Selection in StarCraft: Brood War." In Proceedings of the 2020 Artificial Intelligence and Complex Systems Conference, pp. 121-128. 2020.