Chapter-wise implementation and analysis of all the algorithms in Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
The notebook Greedy,e-Greedy,UCB,Gradient.ipynb demonstrates the following algorithms:
- Greedy Algorithm
- epsilon-Greedy Algorithm
- UCB
- Gradient Bandit
The notebook also analyzes these algorithms with optimistic initial values. The results show that UCB outperforms the other algorithms on the stationary k-armed bandit problem.
The notebook RL using Dynamic Programming.ipynb demonstrates how to solve finite MDPs. The following algorithms are implemented:
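As a rough illustration of how epsilon-greedy and UCB action selection differ, here is a minimal sketch on a Gaussian k-armed bandit. It is not taken from the notebook: the function names (`epsilon_greedy_action`, `ucb_action`, `run_bandit`), the unit-variance reward model, and all parameter values are assumptions made for this example.

```python
import math
import random


def epsilon_greedy_action(q, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def ucb_action(q, counts, t, c):
    """Pick the arm with the highest upper confidence bound at step t (t >= 1)."""
    # An arm that has never been pulled is treated as maximally uncertain.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(q)),
        key=lambda a: q[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def run_bandit(true_means, steps, select, initial_q=0.0, alpha=None, seed=0):
    """Run one bandit episode (hypothetical helper for this sketch).

    select(q, counts, t, rng) returns the arm to pull.
    alpha=None uses sample averages; a float uses a constant step size,
    which is the setting where optimistic initial values keep their effect.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [initial_q] * k
    counts = [0] * k
    total = 0.0
    for t in range(1, steps + 1):
        a = select(q, counts, t, rng)
        reward = rng.gauss(true_means[a], 1.0)  # assumed unit-variance rewards
        counts[a] += 1
        step = alpha if alpha is not None else 1.0 / counts[a]
        q[a] += step * (reward - q[a])  # incremental value estimate update
        total += reward
    return q, counts, total / steps


if __name__ == "__main__":
    means = [0.2, 0.8, 0.5]  # assumed true arm means
    strategies = {
        "greedy": lambda q, n, t, rng: epsilon_greedy_action(q, 0.0, rng),
        "epsilon-greedy": lambda q, n, t, rng: epsilon_greedy_action(q, 0.1, rng),
        "UCB": lambda q, n, t, rng: ucb_action(q, n, t, 2.0),
    }
    for name, select in strategies.items():
        _, counts, avg = run_bandit(means, 1000, select)
        print(name, counts, round(avg, 3))
    # Greedy selection with optimistic initial values and a constant step size.
    _, counts, avg = run_bandit(
        means, 1000, strategies["greedy"], initial_q=5.0, alpha=0.1
    )
    print("optimistic greedy", counts, round(avg, 3))
```

A single seeded run like this is only indicative. Comparing the methods fairly requires averaging over many independently sampled bandit problems.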
- Policy Iteration with two arrays
- Policy Iteration with in-place updates
- Value Iteration with two arrays
- Value Iteration with in-place updates
The results show that Value Iteration with in-place updates converges faster than the other three algorithms.
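To make the difference between the two-array and in-place variants concrete, here is a minimal sketch of value iteration on a toy chain MDP. It is not the notebook's code: the `value_iteration` function, the `transitions` layout (a list of `(probability, next_state, reward)` outcomes per action), and the chain environment are all assumptions made for this example.

```python
def value_iteration(transitions, gamma=0.9, theta=1e-9, in_place=True):
    """Return (state values, number of sweeps) for a finite MDP.

    transitions[s] is a list of actions; each action is a list of
    (probability, next_state, reward) outcomes. A state with no actions
    is treated as terminal and keeps value 0.
    """
    v = [0.0] * len(transitions)
    sweeps = 0
    while True:
        sweeps += 1
        # In-place: read from the array being written, so updates made
        # earlier in the sweep are used immediately. Two arrays: read
        # from a snapshot taken at the start of the sweep.
        source = v if in_place else list(v)
        delta = 0.0
        for s, actions in enumerate(transitions):
            if not actions:
                continue
            best = max(
                sum(p * (r + gamma * source[s2]) for p, s2, r in outcomes)
                for outcomes in actions
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:
            return v, sweeps


# Toy chain (assumed for illustration): state 0 is terminal; from state s
# the agent can step toward 0 (reward 1 on arrival at 0) or away from it.
LAST = 4
transitions = [[]] + [
    [
        [(1.0, s - 1, 1.0 if s == 1 else 0.0)],  # step toward the goal
        [(1.0, min(s + 1, LAST), 0.0)],          # step away from the goal
    ]
    for s in range(1, LAST + 1)
]

if __name__ == "__main__":
    for flag in (False, True):
        values, sweeps = value_iteration(transitions, in_place=flag)
        label = "in-place" if flag else "two arrays"
        print(label, sweeps, [round(x, 3) for x in values])
```

On this chain the in-place variant needs fewer sweeps because states are swept in an order that lets new values propagate within a single sweep. With a different sweep order the advantage can shrink, so the sketch illustrates the mechanism rather than a general guarantee.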