Explore the 10-Arm Testbed Simulation! 🎲 Utilize Python to test various Ξ΅-greedy strategies in a reinforcement learning environment. Visualize and compare agents' performance as they balance exploration and exploitation. Perfect for learners and enthusiasts! πŸš€πŸ“Š

KaranAnchan/10_Arm_Testbed

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

45 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation


10-Arm Testbed Simulation 🎰

Overview πŸ“–

This project simulates the 10-arm testbed problem, a classic reinforcement-learning benchmark, to demonstrate the Ξ΅-greedy algorithm. Agents with different Ξ΅-values are run side by side to observe how each balances exploration and exploitation, and their results are compared against upper-confidence-bound (UCB) and optimistic-initial-values agents.
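For readers unfamiliar with the setup, the 10-arm testbed can be sketched as follows. This is a minimal illustration of the standard formulation (true action values drawn from a standard normal, rewards with unit-variance Gaussian noise); the class name `Testbed` and its interface are assumptions, not code from this repository.

```python
import numpy as np

# Illustrative sketch of the 10-arm testbed environment.
# Names here are hypothetical; the repository's own code may differ.
class Testbed:
    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        # True action values q*(a), one per arm, drawn from N(0, 1).
        self.q_star = self.rng.normal(0.0, 1.0, size=k)

    def pull(self, arm):
        # Observed reward is q*(arm) plus unit-variance Gaussian noise.
        return self.rng.normal(self.q_star[arm], 1.0)
```

An agent never sees `q_star` directly; it must estimate the best arm from the noisy rewards alone, which is what makes the exploration/exploitation trade-off interesting.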

Files in the Repository πŸ—‚οΈ

  • main.py: The main script to run simulations. It sets up the environment, initializes agents with different Ξ΅-values, and runs the simulations.
  • agent.py: Defines the Agent class, which encapsulates the behavior of an Ξ΅-greedy agent.
  • visualization.py: Contains functions that visualize the simulation results using Seaborn and Matplotlib.
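As a rough sketch of what an Ξ΅-greedy agent like the one in agent.py typically looks like, here is a minimal, hypothetical implementation using incremental sample-average estimates. The class and method names are illustrative assumptions; the actual Agent class may be structured differently.

```python
import numpy as np

# Hypothetical Ξ΅-greedy agent sketch; not the repository's actual Agent class.
class EpsilonGreedyAgent:
    def __init__(self, k=10, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(k, dtype=int)   # times each arm was selected
        self.values = np.zeros(k)              # sample-average estimates Q(a)

    def select(self):
        # Explore with probability Ξ΅; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.values)))
        return int(np.argmax(self.values))

    def update(self, arm, reward):
        # Incremental sample-average update: Q <- Q + (R - Q) / N.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With Ξ΅ = 0 this agent is purely greedy; larger Ξ΅ spends more steps exploring, which is exactly the trade-off the simulations compare.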

Setup & Installation πŸ› οΈ

Before running the simulation, make sure you have Python installed on your system. You will also need the following Python packages:

  • NumPy
  • Matplotlib
  • Seaborn

You can install these packages using pip:

pip install numpy matplotlib seaborn

Running the Simulation πŸš€

To run the simulation, execute the main.py file. This can be done from the command line:

python main.py

Visualizations πŸ“Š

Average Reward vs. Episodes

This plot shows the average reward over episodes for different agents.


Selections of Each Arm

This grouped bar chart visualizes the number of times each arm was selected by different agents.


Comparison between Optimistic and UCB Agents

This plot compares the average reward over episodes for the optimistic initial values agent and the UCB agent.


Inferences from Visualizations πŸ“ˆ

  1. Average Reward vs. Episodes:

    • The UCB agent consistently achieves a higher average reward compared to Ξ΅-greedy agents.
    • The optimistic initial values agent starts strong but converges to performance similar to that of the Ξ΅ = 0.1 agent.
  2. Selections of Each Arm:

    • The UCB agent explores the arms more uniformly compared to other agents.
    • The Ξ΅ = 0.01 agent tends to exploit more, showing a preference for a particular arm.
  3. Comparison between Optimistic and UCB Agents:

    • The UCB agent outperforms the optimistic initial values agent in terms of average reward.
    • The optimistic agent starts with a higher initial reward but is eventually surpassed by the UCB agent.
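The UCB selection rule behind these comparisons can be sketched as follows. This is the standard formulation Q(a) + c·√(ln t / N(a)); the exploration constant c and the variable names are assumptions, not values taken from this repository.

```python
import numpy as np

# Illustrative UCB action selection; c = 2.0 is an assumed default.
def ucb_select(values, counts, t, c=2.0):
    # Any arm never tried yet has an infinite bonus, so try it first.
    untried = np.where(counts == 0)[0]
    if untried.size > 0:
        return int(untried[0])
    # Uncertainty bonus shrinks as an arm is sampled more often.
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(values + bonus))
```

Optimistic initialization, by contrast, needs no bonus term at all: a greedy agent whose estimates start at an implausibly high value (e.g. `np.full(k, 5.0)`) is driven to try every arm early, because each disappointing reward drags that arm's estimate down below the untried ones.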

Contributing 🀝

Contributions are welcome: feel free to fork this project and open a pull request. Enjoy exploring reinforcement learning with this 10-arm testbed simulation! 🌟

License πŸ“„

This project is open-source and available under the MIT License.
