"""
Test DQN Model for Unity ML-Agents Environments using PyTorch
This example tests a trained DQN NN model on a modified version of the Unity ML-Agents Banana Collection Example Environment.
The environment includes a single agent, who can turn left or right and move forward or backward.
The agent's task is to collect yellow bananas (reward of +1) that are scattered around an square
game area, while avoiding purple bananas (reward of -1). For the version of Bananas employed here,
the environment is considered solved when the average score over the last 100 episodes > 13.
Example Developed By:
Michael Richardson, 2018
Project for Udacity Danaodgree in Deep Reinforcement Learning (DRL)
Code Expanded and Adapted from Code provided by Udacity DRL Team, 2018.
"""
###################################
# Import Required Packages
import torch
import time
import random
import numpy as np
from dqn_agent import Agent
from unityagents import UnityEnvironment
"""
###################################
STEP 1: Set the Test Parameters
======
num_episodes (int): number of test episodes
"""
num_episodes = 10
"""
###################################
STEP 2: Start the Unity Environment
Set file_name to the Banana build that matches your operating system:
- **Mac**: "Banana.app"
- **Windows** (x86): "Banana_Windows_x86/Banana.exe"
- **Windows** (x86_64): "Banana_Windows_x86_64/Banana.exe"
- **Linux** (x86): "Banana_Linux/Banana.x86"
- **Linux** (x86_64): "Banana_Linux/Banana.x86_64"
- **Linux** (x86, headless): "Banana_Linux_NoVis/Banana.x86"
- **Linux** (x86_64, headless): "Banana_Linux_NoVis/Banana.x86_64"
"""
env = UnityEnvironment(file_name="Banana.app")
"""
#######################################
STEP 3: Get the Unity Environment Brain
Unity ML-Agents applications or environments contain "Brains", which are responsible for deciding
the actions an agent or set of agents should take given the current set of environment (state)
observations. The Banana environment has a single Brain, so we just need to access the first brain
available (i.e., the default brain). We then set the default brain as the brain that will be controlled.
"""
# Get the default brain
brain_name = env.brain_names[0]
# Assign the default brain as the brain to be controlled
brain = env.brains[brain_name]
"""
#############################################
STEP 4: Determine the size of the Action and State Spaces
#
# The simulation contains a single agent that navigates a large environment.
# At each time step, it can perform four possible actions:
# - `0` - walk forward
# - `1` - walk backward
# - `2` - turn left
# - `3` - turn right
#
# The state space has `37` dimensions and contains the agent's velocity,
# along with ray-based perception of objects in the agent's forward direction.
# A reward of `+1` is provided for collecting a yellow banana, and a reward of
# `-1` is provided for collecting a purple banana.
"""
# Set the number of actions or action size
action_size = brain.vector_action_space_size
# Set the size of state observations or state size
state_size = brain.vector_observation_space_size
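
# Quick sanity check of the sizes read from the brain. For the standard Banana build
# described above we expect a 37-dimensional state and 4 discrete actions, but the
# exact values depend on the environment build in use.
print('State size: {}, Action size: {}'.format(state_size, action_size))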
"""
###################################
STEP 5: Initialize a DQN Agent from the Agent Class in dqn_agent.py
A DQN agent is initialized with the following state, action, and DQN hyperparameters.
DQN Agent Parameters
======
state_size (int): dimension of each state (required)
action_size (int): dimension of each action (required)
The DQN agent specifies a local and a target neural network for training.
The network is defined in model.py. Its input is a real-valued (float) vector of observations
(NOTE: not appropriate for pixel data). It is a dense, fully connected neural network
with two 128-node hidden layers. The network can be modified by changing model.py.
Here we initialize an agent using the Unity environment's state and action sizes determined above.
We also load the model parameters saved from training.
"""
# Initialize the Agent
agent = Agent(state_size=state_size, action_size=action_size, seed=0)
# Load trained model weights
agent.network.load_state_dict(torch.load('dqnAgent_Trained_Model.pth'))
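
# Note: if the weights were saved on a GPU and this script runs on a CPU-only machine,
# torch.load('dqnAgent_Trained_Model.pth', map_location='cpu') can be used instead.

# For reference, the sketch below approximates the architecture described in STEP 5
# (a dense network with two 128-node hidden layers). It is illustrative only and an
# assumption about model.py: the class and layer names here are hypothetical, and the
# authoritative definition lives in model.py. It is not used anywhere below.
import torch.nn as nn
import torch.nn.functional as F

class IllustrativeQNetwork(nn.Module):
    """Sketch only: maps a state vector to one Q-value per action."""
    def __init__(self, state_size, action_size):
        super(IllustrativeQNetwork, self).__init__()
        self.fc1 = nn.Linear(state_size, 128)   # state vector -> first hidden layer
        self.fc2 = nn.Linear(128, 128)          # second hidden layer
        self.fc3 = nn.Linear(128, action_size)  # hidden layer -> Q-values per action

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)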
"""
###################################
STEP 6: Play Banana for the Specified Number of Episodes
"""
# Loop over the number of episodes
for i_episode in range(1, num_episodes+1):

    # Reset the Unity environment at the beginning of each episode
    # (train mode is set to False because we are testing, not training)
    env_info = env.reset(train_mode=False)[brain_name]

    # Get the initial state of the Unity environment
    state = env_info.vector_observations[0]

    # Set the initial episode score to zero
    score = 0

    # Run the episode loop:
    # at each step, take an action as a function of the current state observations;
    # if the environment signals that the episode is done, exit the loop,
    # otherwise repeat until done == True.
    while True:
        # Determine the epsilon-greedy action from the current state
        action = agent.act(state)

        # Send the action to the environment and receive the resulting environment information
        env_info = env.step(action)[brain_name]
        next_state = env_info.vector_observations[0]  # get the next state
        reward = env_info.rewards[0]                  # get the reward
        done = env_info.local_done[0]                 # see if the episode has finished

        # Set the new state as the current state for determining the next action
        state = next_state

        # Update the episode score
        score += reward

        # If Unity indicates that the episode is done,
        # exit the episode loop to begin a new episode
        if done:
            break

    # (Over-)print the score for the current episode
    # (note: this is the single-episode score, not an average over episodes)
    print('\rEpisode {}\tScore: {:.2f}'.format(i_episode, score), end="")
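
# To report performance across all test episodes (not just the last one printed above),
# the per-episode scores could be accumulated and averaged. A hypothetical variant
# (the first line goes before the episode loop, the second at the end of each episode):
#     scores = []
#     scores.append(score)
#     print('\nMean score over {} episodes: {:.2f}'.format(num_episodes, np.mean(scores)))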
"""
###################################
STEP 7: Everything is Finished -> Close the Environment.
"""
env.close()
# END :) #############