This repository includes implementations and performance reports of several bandit algorithms. The study covers:

1. Epsilon-Greedy (multi-armed and linear)
2. Upper Confidence Bound (UCB)
3. Explore-then-Commit (ETC)
4. Thompson Sampling (Gaussian MAB)
5. Linear UCB (LinUCB)
6. Linear Thompson Sampling (LinTS)
7. Generalized Linear Model Bandit (GLM)
To simulate a specific algorithm, edit the `Simulation.py` script, enabling the desired algorithm and commenting out the others. For example, to run the UCB algorithm, keep only the `UCBBandit` line uncommented:
```python
## Initiate Bandit Algorithms ##
algorithms = {}
#algorithms['EpsilonGreedyLinearBandit'] = EpsilonGreedyLinearBandit(dimension=context_dimension, lambda_=0.1, epsilon=None)
#algorithms['EpsilonGreedyMultiArmedBandit'] = EpsilonGreedyMultiArmedBandit(num_arm=n_articles, epsilon=0.1)
#algorithms['ExplorethenCommit'] = ExplorethenCommit(num_arm=n_articles, m=30)
algorithms['UCBBandit'] = UCBBandit(num_arm=n_articles, alpha=0.5)
#algorithms['ThompsonSamplingGaussianMAB'] = ThompsonSamplingGaussianMAB(num_arm=n_articles)
#algorithms['LinearUCBBandit'] = LinearUCBBandit(dimension=context_dimension, lambda_=0.1, alpha=0.5) #delta=0.05, alpha=2.358
#algorithms['LinearThompsonSamplingMAB'] = LinearThompsonSamplingMAB(dimension=context_dimension, lambda_=0.1)
```
After selecting your algorithm, run the `Simulation.py` script.
| Hyperparameter (m) | Cumulative Regret |
|---|---|
| 10 | 1001.40 |
| 20 | 214.90 |
| 30 | 334.02 |
Figure 1: Explore then Commit accumulated regret; panels: (a) m = 10, (b) m = 20, (c) m = 30
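The `m` hyperparameter is the number of exploration pulls per arm before the algorithm commits, which matches the regret pattern in the table: a small `m` risks committing to the wrong arm, while a large `m` wastes rounds over-exploring. A minimal standalone sketch of the two phases (the `pull` callback and function name are illustrative, not the repo's `ExplorethenCommit` API):

```python
def explore_then_commit(pull, num_arm, m, horizon):
    """Pull each arm m times round-robin, then commit to the arm
    with the best empirical mean for the remaining rounds."""
    totals = [0.0] * num_arm
    rewards = []
    # Exploration phase: m pulls per arm.
    for t in range(num_arm * m):
        arm = t % num_arm
        r = pull(arm)
        totals[arm] += r
        rewards.append(r)
    # Commit phase: exploit the empirically best arm.
    best = max(range(num_arm), key=lambda a: totals[a] / m)
    for _ in range(horizon - num_arm * m):
        rewards.append(pull(best))
    return best, rewards
```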
| Hyperparameter (α) | Cumulative Regret |
|---|---|
| 0.1 | 256.50 |
| 0.5 | 977.03 |
| 1.0 | 1906.65 |
Figure 2: UCB Bandit accumulated regret
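In UCB, α scales the exploration bonus added to each arm's empirical mean, which is why cumulative regret grows with α in the table above. A hedged sketch of the index (the exact bonus formula in the repo's `UCBBandit` may differ):

```python
import math

def ucb_index(mean, count, t, alpha):
    """UCB1-style index: empirical mean plus an alpha-scaled
    exploration bonus; larger alpha means more exploration."""
    return mean + alpha * math.sqrt(2.0 * math.log(t) / count)

def select_arm(means, counts, t, alpha=0.5):
    # Pull any arm that has not been tried yet.
    for a, c in enumerate(counts):
        if c == 0:
            return a
    # Otherwise pick the arm with the largest UCB index.
    return max(range(len(means)),
               key=lambda a: ucb_index(means[a], counts[a], t, alpha))
```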
| Hyperparameter (α) | Cumulative Regret |
|---|---|
| 0.5 | 24.43 |
| 1.5 | 177.89 |
| 2.5 | 487.73 |
Figure 4: Linear UCB accumulated regret
Figure 5: Linear UCB estimation error
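LinUCB scores each arm by a ridge-regression estimate of the reward plus an α-scaled confidence width, with `lambda_` acting as the ridge regularizer that initializes the design matrix. A standalone sketch under those assumptions (not the repo's `LinearUCBBandit` interface):

```python
import numpy as np

def linucb_scores(A, b, contexts, alpha):
    """LinUCB: theta = A^{-1} b (ridge estimate), score each context
    by predicted reward plus an alpha-scaled confidence width."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    return [float(x @ theta + alpha * np.sqrt(x @ A_inv @ x))
            for x in contexts]

def linucb_update(A, b, x, reward):
    # Rank-one update of the design matrix and response vector.
    A += np.outer(x, x)
    b += reward * x
```

Here `A` starts as `lambda_ * np.eye(dimension)` and `b` as zeros; a large width on a rarely seen context direction is what drives exploration.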
| Cumulative Regret |
|---|
| 1098.24 |
Figure 6: Linear Thompson Sampling accumulated regret and estimation error
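Linear Thompson Sampling replaces LinUCB's deterministic confidence width with posterior sampling: it draws a parameter vector from a Gaussian posterior around the ridge estimate and acts greedily on the sample. A minimal sketch, assuming a Gaussian posterior with scale `v` (the name `v` and the function are illustrative, not the repo's `LinearThompsonSamplingMAB` API):

```python
import numpy as np

def lints_select(A, b, contexts, v=1.0, rng=None):
    """Linear TS: sample theta_tilde ~ N(A^{-1} b, v^2 A^{-1}) and
    pick the arm whose context maximizes the sampled linear reward."""
    rng = rng or np.random.default_rng()
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * A_inv)
    return int(np.argmax([x @ theta_tilde for x in contexts]))
```

As with LinUCB, `A` starts as `lambda_ * np.eye(dimension)`; randomness in the sampled `theta_tilde` provides the exploration.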