Multi Armed Bandits (MAB).

A study consists of a collection of bandits, $B = \{b_1, ..., b_n\}$, where a bandit, $b_i$ has an associated parameter $\theta_i$ (or two parameters) that are within the exponential family of distributions. Thus, we have a collection $\theta = \{\theta_1, ..., \theta_n \}$ of parameters (for the sake of two parameter distribution (e.g Gaussian), we refer to the primary parameter ($\mu$) as this single parameter).

When 'pulling' a bandit, we are generating a random variable from this bandit's distribution. For example, if generating a random variable from bandit $i$ that follows a Poisson distribution, we denote this $X_i \thicksim Pois(\theta_i)$. 'Payout' or 'reward' is what we call this generated random variable, $x_i$, and is what we wish to maximize. In the case of gambling machines, it could be a 'win' (Bernoulli), or some form of 'monetary payout' (Gaussian).

We therefore wish to find the optimal bandit to exploit; or in other words the bandit with the highest parameter: $\tilde{\theta} = max(\{\theta_1, ..., \theta_n\})$.

There is a second constraint however, that is doing this using the fewest generations (bandit pulls) as possible. Reasons for this include being able to maximize cumulative reward, given a fixed number of generations (e.g if there is a time constraint), or generating may have another associated cost (e.g paying for each armed bandit pull).

The problem can therefore be encapsulated as: Given a fixed number of generations, $g \in \{1, ..., G\}$, maximizing cumulative reward: $\sum{x_g}$. This can be done by a number of different strategies - and all include some relation to balancing exploration (trying to accurately estimate each parameter $\{\hat{\theta}_1, ..., \hat{\theta}_n \}$), with exploration (the cumulative reward gained by just generating from the optimal bandit $\tilde{\theta}$).

Usage

Pre-requisites

To run the below commands, it's suggested that Just is installed.

Running Simulations

Options exist for simulating both directly using the command line, or via passing a configuration file in the form of JSON. The second is recommended for reproducable simulation studies.

To see all possible simulation commands run

just help

or help for a specific command:

just {{COMMAND}} help

e.g:

just list-distributions help

To view possible distributions, and information about their associated parameters, run:

just list-distributions

Simulating directly

To run a simulation by passing arguments directly, run:

just {{COMMAND}} {{*ARGS}}

Where COMMAND is a command returned from just help, and args are those required by a specific command upon running just {{COMMAND}} help.

Simulating from config (json) file

There is the option to perform a MAB simulation, reading arguments and configuration from a config file. To perform this simulation, run:

just simulate-from-json {{COMMAND}} {{CONFIG}}

Testing

To run all unit and integration tests in the repository, execute:

just local-test

Sample Outputs

Metrics from the simulations are output to stdout.

Plots are also optionally produced, detailing the the performance of the simulation.

Name		Name	Last commit message	Last commit date
Latest commit History 186 Commits
config		config
img		img
src		src
.gitignore		.gitignore
README.md		README.md
justfile		justfile
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi Armed Bandits (MAB).

Usage

Pre-requisites

Running Simulations

Simulating directly

Simulating from config (json) file

Testing

Sample Outputs

About

Releases

Packages

Languages

FlynnOwen/multi-armed-bandits

Folders and files

Latest commit

History

Repository files navigation

Multi Armed Bandits (MAB).

Usage

Pre-requisites

Running Simulations

Simulating directly

Simulating from config (json) file

Testing

Sample Outputs

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages