rl_lib

Motivation: I have always thought that the only way to truly test whether you understand a concept is to see if you can build it. As such, all of these algorithms were implemented by studying the relevant papers and coded to test my understanding.

"What I cannot create, I do not understand" - Richard Feynman

Algorithms

DQN

Vanilla DQN
- Paper: Human-level control through deep reinforcement learning
Noisy DQN
- Paper: Noisy Networks for Exploration
Dueling DQN
- Paper: Dueling Network Architectures for Deep Reinforcement Learning
Double DQN
- Paper: Deep Reinforcement Learning with Double Q-learning
Prioritised Experience Replay DQN
- Paper: Prioritized Experience Replay
Rainbow DQN
- Paper: Rainbow: Combining Improvements in Deep Reinforcement Learning
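
As a rough illustration of how two of the variants listed above differ, the sketch below contrasts the vanilla DQN target with the Double DQN target. It is plain Python rather than the notebook code; the function names, the `gamma` values, and the example Q-values are assumptions made for illustration only.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Vanilla DQN: bootstrap from the target network's largest Q-value."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the next state (three actions).
next_q_online = [1.0, 2.0, 0.5]
next_q_target = [3.0, 1.5, 0.2]

vanilla = dqn_target(1.0, next_q_target, gamma=0.9)
double = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.9)
```

With these made-up numbers the Double DQN target is lower than the vanilla one, because the action chosen by the online network is not the one the target network rates highest. Decoupling selection from evaluation in this way is what reduces overestimation.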

Policy Gradient

Advantage Actor Critic (A2C) - single environment
- Paper: Asynchronous Methods for Deep Reinforcement Learning
Advantage Actor Critic (A2C) - multi environment
- Paper: Asynchronous Methods for Deep Reinforcement Learning
Deep Deterministic Policy Gradients
- Paper: Continuous Control with Deep Reinforcement Learning
Proximal Policy Optimisation (discrete and continuous)
- Paper: Proximal Policy Optimization Algorithms
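
For orientation, the clipped surrogate objective from the PPO paper can be sketched in a few lines. This is a minimal plain-Python version, not the notebook implementation; the function name, the `clip_eps` default of 0.2, and the sample inputs are assumptions.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximise)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the new and old policy for the taken action.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Hypothetical batch of two transitions with probability ratios 1.5 and 0.5.
new_log_probs = [math.log(1.5), math.log(0.5)]
old_log_probs = [0.0, 0.0]
advantages = [1.0, -1.0]

objective = ppo_clipped_objective(new_log_probs, old_log_probs, advantages)
```

In a training loop the negative of this value would typically be used as the loss. Taking the minimum means a large policy change cannot raise the objective beyond the clipped value, while changes that make the objective worse are still penalised in full.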

Tabular Solutions

These are mainly based on a really good lecture series by Colin Skow on YouTube [link]. A large part also comes from the Udacity Deep Reinforcement Learning course.

Bellman Equation
Dynamic Programming
Q-learning
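
To make the last item concrete, here is a minimal sketch of the tabular Q-learning update, which is a sampled form of the Bellman optimality backup. The four-state chain environment, the function names, and the hyperparameters are all assumptions chosen for illustration and are not taken from the notebooks.

```python
import random


def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error."""
    bootstrap = 0.0 if done else max(q_table[next_state])
    td_error = reward + gamma * bootstrap - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return td_error


def train_chain(n_states=4, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q-values on a toy chain: action 0 moves left, action 1 moves right."""
    rng = random.Random(seed)
    q_table = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.randrange(2)  # uniform random behaviour policy
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0  # reward only on reaching the goal
            q_learning_update(q_table, state, action, reward, next_state,
                              done, alpha, gamma)
            state = next_state
    return q_table


q_table = train_chain()
# Greedy action for each non-terminal state.
greedy_policy = [max(range(2), key=lambda a: row[a]) for row in q_table[:-1]]
```

Because Q-learning is off-policy, the agent here can act uniformly at random and the greedy policy read from the table still ends up preferring to move right towards the goal.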

Results

DQN Pong

Converged to an average of 17.56 after 1300 episodes.
Code and results can be found under DQN/7. Vanilla DQN Atari.ipynb

PPO discrete

Solved in 409 episodes
Code and results can be found under Policy Gradient/5. PPO.ipynb

DDPG Continuous

Converged to ~ -270 after 100 episodes
Code and results can be found under Policy Gradient/4. DDPG.ipynb.ipynb

Todo

Generalised Advantage Estimation (GAE)
Pull Policy Gradient algorithms into separate files
Curiosity-Driven Exploration
HER (Hindsight Experience Replay)
Recurrent networks in PPO and DDPG

Credits

Whilst I tried to code everything directly from the papers, it wasn't always easy to understand what I was doing wrong when an algorithm just wouldn't train or I got runtime errors. As such, I used the following repositories as references.
