Project Navigation: In this project, we train an agent to navigate (and collect bananas!) in a large, square world. This environment is provided by [Unity Machine Learning Agents](https://github.com/Unity-Technologies/ml-agents).
NOTE:
- This project was completed in the Udacity Workspace, but it can also be completed on a local machine. Instructions on how to download and set up Unity ML environments can be found in the Unity ML-Agents GitHub repo.
- The environment provided by Udacity is similar to, but not identical to the Banana Collector environment on the Unity ML-Agents Github page.
The state space has 37 dimensions, each of which is a continuous variable. It includes the agent's velocity, along with ray-based perception of objects around the agent's forward direction.
The action space contains the following 4 legal actions:
- 0 - move forward
- 1 - move backward
- 2 - turn left
- 3 - turn right
A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas.
The task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.
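To make the state/action interface above concrete, here is a minimal epsilon-greedy action-selection sketch. The function and constants are illustrative only and are not part of the project code:

```python
import numpy as np

STATE_SIZE = 37   # dimensions of the continuous state vector
ACTION_SIZE = 4   # 0: forward, 1: backward, 2: turn left, 3: turn right

def epsilon_greedy(q_values, eps):
    """With probability eps explore randomly, otherwise act greedily."""
    if np.random.rand() < eps:
        return np.random.randint(ACTION_SIZE)
    return int(np.argmax(q_values))

# Greedy choice when eps = 0: the index of the largest Q-value
action = epsilon_greedy(np.array([0.1, 0.5, -0.2, 0.0]), eps=0.0)  # -> 1
```

During training, eps typically starts near 1.0 and is annealed toward a small floor so the agent explores early and exploits later.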
- Download the environment from one of the links below. You need to only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.
(For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), please use this link to obtain the environment.
- Python 3.6
- PyTorch
- Unity ML-Agents
- After installing all dependencies, clone this repository to your local system.
- Make sure you have Jupyter installed. To install Jupyter:
```shell
python3 -m pip install --upgrade pip
python3 -m pip install jupyter
```
- The main code is in `Navigation.ipynb`. This file contains two training methods: `dqn` for Vanilla DQN and `doubledqn` for Double DQN. Call these methods to train the model from scratch, or reload the pretrained models provided in the `./checkpoints` directory using the `load_state_dict` method.
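A minimal sketch of the save/load round trip with `load_state_dict`. The network architecture and checkpoint filename below are assumptions for illustration; the actual ones in the repository may differ:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Illustrative Q-network; the project's actual architecture may differ.
class QNetwork(nn.Module):
    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_size),
        )

    def forward(self, state):
        return self.net(state)

# Save a trained network's weights, then restore them into a fresh network,
# exactly as you would with a file from ./checkpoints.
trained = QNetwork()
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(trained.state_dict(), path)

restored = QNetwork()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # inference mode: no dropout/batch-norm updates
```

Note that `load_state_dict` requires the receiving network to have the same layer names and shapes as the saved one.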
- Two methods are implemented to solve this problem: Vanilla DQN and Double DQN. Both of them use experience replay and target networks to improve training.
- A soft target network update is used in Vanilla DQN, but it did not turn out to be useful in Double DQN.
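The soft update blends the local network's weights into the target network a small step at a time, theta_target <- tau * theta_local + (1 - tau) * theta_target. A minimal sketch, where the helper name and the tiny demo networks are illustrative:

```python
import copy

import torch
import torch.nn as nn

def soft_update(local_net, target_net, tau):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    for t_param, l_param in zip(target_net.parameters(), local_net.parameters()):
        t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)

# Tiny demo: perturb a copy of the network, then nudge it back.
local = nn.Linear(37, 4)
target = copy.deepcopy(local)
with torch.no_grad():
    target.weight.add_(1.0)

soft_update(local, target, tau=1e-3)  # target moves 0.1% toward local
```

With a small tau the target network changes slowly, which keeps the bootstrapped targets stable; tau = 1 would reduce to a hard copy of the local weights.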
Vanilla DQN was able to solve the environment in approximately 1600 episodes. After about 500 episodes, the score plateaued, oscillating between 8 and 10; only after 1200 episodes did it start improving again.
Double DQN was able to solve the environment in about 700 episodes, a great improvement over Vanilla DQN. In this case, the score also improves consistently throughout training.
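The difference between the two update targets can be sketched as follows: Vanilla DQN both selects and evaluates the best next action with the target network, while Double DQN selects it with the online (local) network and evaluates it with the target network, which reduces overestimation. Function names here are illustrative, not the project's actual code:

```python
import torch

def vanilla_dqn_target(rewards, next_q_target, gamma, dones):
    # Target network both selects and evaluates the best next action.
    return rewards + gamma * next_q_target.max(dim=1).values * (1 - dones)

def double_dqn_target(rewards, next_q_local, next_q_target, gamma, dones):
    # Online network selects the action; target network evaluates it.
    best = next_q_local.argmax(dim=1, keepdim=True)
    return rewards + gamma * next_q_target.gather(1, best).squeeze(1) * (1 - dones)

# Toy batch of 2 transitions with 4 actions each.
rewards = torch.tensor([1.0, -1.0])
dones = torch.tensor([0.0, 1.0])
q_local = torch.tensor([[0.2, 0.5, 0.0, 0.1], [0.3, 0.1, 0.0, 0.2]])
q_target = torch.tensor([[0.4, 0.1, 0.0, 0.2], [0.1, 0.6, 0.0, 0.3]])

y_vanilla = vanilla_dqn_target(rewards, q_target, 0.99, dones)
y_double = double_dqn_target(rewards, q_local, q_target, 0.99, dones)
```

In the first transition, the online network prefers action 1 while the target network assigns it a lower value, so the Double DQN target is smaller than the Vanilla DQN one; when the two networks agree, the targets coincide.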