ReinforcementLearning

Chapter-wise implementation and analysis of the algorithms in Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Chapter 2

The notebook Greedy,e-Greedy,UCB,Gradient.ipynb demonstrates the following algorithms:

Greedy Algorithm
epsilon-Greedy Algorithm
Upper Confidence Bound (UCB)
Gradient Bandit
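
For orientation, the action-selection rules behind these four algorithms can be sketched in a few lines. This is a minimal illustration following the textbook definitions, not the notebook's code; the function names and the defaults (such as c=2.0) are assumptions made for this example.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random arm, otherwise the greedy arm.

    epsilon=0.0 reduces to the plain greedy algorithm.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ucb(q_values, counts, t, c=2.0):
    """Upper-confidence-bound selection: argmax of Q(a) + c * sqrt(ln t / N(a))."""
    # An arm that has never been pulled is treated as maximally uncertain.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(q_values)),
        key=lambda a: q_values[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def softmax(preferences):
    """Turn preferences H(a) into action probabilities pi(a)."""
    shift = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(h - shift) for h in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_bandit_update(preferences, action, reward, baseline, alpha):
    """One gradient-bandit step: H(a) += alpha * (R - baseline) * (1[a == A] - pi(a))."""
    probs = softmax(preferences)
    return [
        h + alpha * (reward - baseline) * ((1.0 if a == action else 0.0) - p)
        for a, (h, p) in enumerate(zip(preferences, probs))
    ]
```

Greedy and epsilon-greedy share one function here because they differ only in the exploration probability.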

The notebook also analyses the above algorithms with optimistic initial values. The results show that UCB outperforms the other algorithms on the stationary k-armed bandit problem.
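
To make the optimistic-initial-values idea concrete, here is a small sketch of a stationary k-armed Gaussian testbed. It is an assumption-laden illustration rather than the notebook's experiment: `run_bandit`, its parameters, and the settings (10 arms, 1000 steps, initial estimate 5.0 with step size 0.1) are chosen for this example only, and a single run like this says nothing about which algorithm is better on average.

```python
import random


def run_bandit(k=10, steps=1000, epsilon=0.1, initial_q=0.0, alpha=None, seed=0):
    """Run one epsilon-greedy agent on a stationary k-armed Gaussian testbed.

    alpha=None uses sample-average estimates; a float uses a constant step size.
    Returns (average reward, fraction of optimal-arm pulls, pull counts per arm).
    """
    rng = random.Random(seed)
    # True action values are drawn once and never change, so the problem is stationary.
    true_values = [rng.gauss(0.0, 1.0) for _ in range(k)]
    best_arm = max(range(k), key=lambda a: true_values[a])

    q = [initial_q] * k  # optimistic when initial_q is well above the true values
    counts = [0] * k
    total_reward = 0.0
    optimal_picks = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)
        else:
            action = max(range(k), key=lambda a: q[a])

        reward = rng.gauss(true_values[action], 1.0)
        counts[action] += 1
        step_size = alpha if alpha is not None else 1.0 / counts[action]
        q[action] += step_size * (reward - q[action])  # incremental estimate update

        total_reward += reward
        optimal_picks += int(action == best_arm)

    return total_reward / steps, optimal_picks / steps, counts


# Realistic start with exploration vs. optimistic start with pure greedy selection.
eps_greedy_result = run_bandit(epsilon=0.1, initial_q=0.0)
optimistic_result = run_bandit(epsilon=0.0, initial_q=5.0, alpha=0.1)
```

Averaging the returned values over many seeds would be needed before comparing the two settings.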

Chapter 4

The notebook RL using Dynamic Programming.ipynb demonstrates how to solve finite MDPs. The following algorithms are implemented:

Policy Iteration with two arrays
Policy Iteration with in-place updates
Value Iteration with two arrays
Value Iteration with in-place updates
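
As a rough sketch of the policy iteration loop (evaluate the current policy, then improve it greedily, and repeat until the policy stops changing), consider a toy corridor MDP. The environment is hypothetical and not taken from the notebook: state 0 is a terminal goal, actions move one step left or right, and entering the goal pays +1. Policy evaluation here uses in-place sweeps.

```python
def policy_iteration(n_states=6, gamma=0.9, theta=1e-10):
    """Policy iteration on a toy corridor where state 0 is a terminal goal."""
    LEFT, RIGHT = -1, +1

    def step(state, action):
        # Deterministic move; walls keep the agent inside the corridor.
        nxt = max(0, min(state + action, n_states - 1))
        reward = 1.0 if nxt == 0 else 0.0
        return nxt, reward

    policy = [RIGHT] * n_states  # deliberately poor starting policy
    values = [0.0] * n_states    # V(terminal) stays 0

    while True:
        # Policy evaluation: sweep in place until the values stop changing.
        while True:
            delta = 0.0
            for s in range(1, n_states):
                nxt, reward = step(s, policy[s])
                new_value = reward + gamma * values[nxt]
                delta = max(delta, abs(new_value - values[s]))
                values[s] = new_value
            if delta < theta:
                break

        # Policy improvement: act greedily with respect to the evaluated values.
        stable = True
        for s in range(1, n_states):
            def action_value(a, s=s):
                nxt, reward = step(s, a)
                return reward + gamma * values[nxt]

            best = max((LEFT, RIGHT), key=action_value)
            if best != policy[s]:
                policy[s] = best
                stable = False

        if stable:
            return policy, values


policy, values = policy_iteration()
```

The two-array variant would differ only in the evaluation sweep, reading from a copy of `values` made at the start of each sweep.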

The results clearly show that Value Iteration with in-place updates converges faster than the other three algorithms.
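
The difference between the two-array and in-place variants can be seen on a toy example. The sketch below assumes a hypothetical corridor MDP (state 0 is a terminal goal, entering it pays +1, actions move left or right); it is not the notebook's environment, and `value_iteration` and its parameters are names invented for this illustration. With in-place updates a sweep can reuse values computed earlier in the same sweep, so information from the goal can travel the whole corridor at once. How large the gain is depends on the order in which states are swept.

```python
def value_iteration(n_states=6, gamma=0.9, theta=1e-10, in_place=True):
    """Value iteration on a toy corridor where state 0 is a terminal goal.

    Returns (state values, number of sweeps until the largest change < theta).
    """
    values = [0.0] * n_states
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        # The two-array variant reads from a frozen copy of the previous sweep;
        # the in-place variant reads whatever has already been updated.
        source = values if in_place else list(values)
        for s in range(1, n_states):  # state 0 is terminal, V = 0
            candidates = []
            for nxt in (s - 1, min(s + 1, n_states - 1)):
                reward = 1.0 if nxt == 0 else 0.0
                candidates.append(reward + gamma * source[nxt])
            best = max(candidates)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < theta:
            return values, sweeps


in_place_values, in_place_sweeps = value_iteration(in_place=True)
two_array_values, two_array_sweeps = value_iteration(in_place=False)
print(in_place_sweeps, two_array_sweeps)
```

Both variants reach the same values; on this corridor, sweeping outward from the goal, the in-place version needs fewer sweeps.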


SanketAgrawal/ReinforcementLearning
