Project Navigation: In this project, we train an agent to navigate (and collect bananas!) in a large, square world. This environment is provided by [Unity Machine Learning Agents](https://github.com/Unity-Technologies/ml-agents).
NOTE:
- This project was completed in the Udacity Workspace, but it can also be completed on a local machine. Instructions on how to download and set up Unity ML environments can be found in the Unity ML-Agents GitHub repo.
- The environment provided by Udacity is similar to, but not identical to the Banana Collector environment on the Unity ML-Agents Github page.
The state space has 37 dimensions, each of which is a continuous variable. It includes the agent's velocity, along with ray-based perception of objects around the agent's forward direction.
The action space contains the following 4 legal actions:
- 0 - move forward
- 1 - move backward
- 2 - turn left
- 3 - turn right
A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas.
The task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.
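To make the state/action interface above concrete, here is a minimal epsilon-greedy action-selection sketch. The function and constants are illustrative only and are not part of the project code:

```python
import numpy as np

STATE_SIZE = 37   # dimensions of the continuous state vector
ACTION_SIZE = 4   # 0: forward, 1: backward, 2: turn left, 3: turn right

def epsilon_greedy(q_values, eps):
    """With probability eps explore randomly, otherwise act greedily."""
    if np.random.rand() < eps:
        return np.random.randint(ACTION_SIZE)
    return int(np.argmax(q_values))

# Greedy choice when eps = 0: the index of the largest Q-value
action = epsilon_greedy(np.array([0.1, 0.5, -0.2, 0.0]), eps=0.0)  # -> 1
```

During training, eps typically starts near 1.0 and is annealed toward a small floor so the agent explores early and exploits later.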
- Download the environment from one of the links below. You need to only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.
(For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), please use this link to obtain the environment.
- Python 3.6
- PyTorch
- Unity ML-Agents
- After installing all dependencies, clone this repository to your local system.
- Make sure you have Jupyter installed. To install Jupyter:
```shell
python3 -m pip install --upgrade pip
python3 -m pip install jupyter
```
- The main code is in `Navigation.ipynb`. This file contains two training methods: `dqn` for Vanilla DQN and `doubledqn` for Double DQN. Call these methods to train the model from scratch, or reload the pretrained models provided in the `./checkpoints` directory using the `load_state_dict` method.
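A minimal sketch of the save/load round trip with `load_state_dict`. The network architecture and checkpoint filename below are assumptions for illustration; the actual ones in the repository may differ:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Illustrative Q-network; the project's actual architecture may differ.
class QNetwork(nn.Module):
    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_size),
        )

    def forward(self, state):
        return self.net(state)

# Save a trained network's weights, then restore them into a fresh network,
# exactly as you would with a file from ./checkpoints.
trained = QNetwork()
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(trained.state_dict(), path)

restored = QNetwork()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # inference mode: no dropout/batch-norm updates
```

Note that `load_state_dict` requires the receiving network to have the same layer names and shapes as the saved one.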
- Two methods are implemented to solve this problem: Vanilla DQN and Double DQN. Both of them use experience replay and target networks to improve training.
- A soft target network update is used in Vanilla DQN, but it did not turn out to be useful in Double DQN.
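The soft update blends the local network's weights into the target network a small step at a time, theta_target <- tau * theta_local + (1 - tau) * theta_target. A minimal sketch, where the helper name and the tiny demo networks are illustrative:

```python
import copy

import torch
import torch.nn as nn

def soft_update(local_net, target_net, tau):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    for t_param, l_param in zip(target_net.parameters(), local_net.parameters()):
        t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)

# Tiny demo: perturb a copy of the network, then nudge it back.
local = nn.Linear(37, 4)
target = copy.deepcopy(local)
with torch.no_grad():
    target.weight.add_(1.0)

soft_update(local, target, tau=1e-3)  # target moves 0.1% toward local
```

With a small tau the target network changes slowly, which keeps the bootstrapped targets stable; tau = 1 would reduce to a hard copy of the local weights.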
Vanilla DQN was able to solve the environment in approximately 1600 episodes. After about 500 episodes, the score plateaued, oscillating between 8 and 10; only after 1200 episodes did it start improving again.
Double DQN was able to solve the environment in about 700 episodes, a great improvement over Vanilla DQN. In this case, the score also improves consistently throughout training.
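The difference between the two update targets can be sketched as follows: Vanilla DQN both selects and evaluates the best next action with the target network, while Double DQN selects it with the online (local) network and evaluates it with the target network, which reduces overestimation. Function names here are illustrative, not the project's actual code:

```python
import torch

def vanilla_dqn_target(rewards, next_q_target, gamma, dones):
    # Target network both selects and evaluates the best next action.
    return rewards + gamma * next_q_target.max(dim=1).values * (1 - dones)

def double_dqn_target(rewards, next_q_local, next_q_target, gamma, dones):
    # Online network selects the action; target network evaluates it.
    best = next_q_local.argmax(dim=1, keepdim=True)
    return rewards + gamma * next_q_target.gather(1, best).squeeze(1) * (1 - dones)

# Toy batch of 2 transitions with 4 actions each.
rewards = torch.tensor([1.0, -1.0])
dones = torch.tensor([0.0, 1.0])
q_local = torch.tensor([[0.2, 0.5, 0.0, 0.1], [0.3, 0.1, 0.0, 0.2]])
q_target = torch.tensor([[0.4, 0.1, 0.0, 0.2], [0.1, 0.6, 0.0, 0.3]])

y_vanilla = vanilla_dqn_target(rewards, q_target, 0.99, dones)
y_double = double_dqn_target(rewards, q_local, q_target, 0.99, dones)
```

In the first transition, the online network prefers action 1 while the target network assigns it a lower value, so the Double DQN target is smaller than the Vanilla DQN one; when the two networks agree, the targets coincide.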