This is my thesis project from the IT University of Copenhagen. The main GAIL (Generative Adversarial Imitation Learning) algorithm implementation is taken from Andrew Liao's gail-tf repository; a few changes were made to apply GAIL to the Atari domain.
An implementation of GAIL, with a few examples from the Atari games Boxing and MontezumaRevenge.
- Python 3.6
- TensorFlow 1.11.0
- Gym 0.10.9
- Atari-py 0.1.7
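Assuming a standard pip setup (the install command itself is not part of this repository, so treat it as a sketch), the pinned dependencies can be installed with:

```sh
pip install tensorflow==1.11.0 gym==0.10.9 atari-py==0.1.7
```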
The main client `train.py` contains all tasks and algorithms needed to run training:
- `train_RL_expert` - train a Reinforcement Learning agent using TRPO
- `RL_expert` - sample trajectories from the RL expert trained in the previous task
- `human_expert` - sample expert trajectories from human game-play
- `train_gail` - run GAIL training with one of the experts
- `play_agent` - replay the game using a trained model
The algorithm can be chosen between TRPO and BC. Behavior cloning (BC) is used mainly to pre-train the policy before GAIL training.
- Set the Atari environment, for example: `args.env_id = 'MontezumaRevenge-ram-v0'`.
- Choose `args.task = 'train_RL_expert'` and `args.alg = 'trpo'`.
- Make sure the model path is `None` (`args.load_model_path = None`) if you start a new training. If you wish to continue a previous training, specify the path to the model instead.
- Run `train.py`. The model will be saved in the `data/training` directory. A combined configuration sketch is shown below.
- Choose the task `args.task = 'RL_expert'` and a path to the model, without file extension, e.g. `args.load_model_path = 'data/training/trpo.Boxing-ram-v0.100.MD.1500/trpo.Boxing-ram-v0.100.MD.1500-0'`.
- On line 47 specify the number of trajectories you need.
- Run `train.py`. It will save the expert trajectories in a `.pkl` file in the `data/expert` directory (see the loading sketch below).
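As a quick sanity check, the saved trajectories can be inspected with the standard `pickle` module; the file name below is a hypothetical placeholder for whatever `train.py` writes to `data/expert`:

```python
import pickle

# Hypothetical file name; use the .pkl that train.py actually saves in data/expert.
with open('data/expert/expert_trajectories.pkl', 'rb') as f:
    trajectories = pickle.load(f)

print(type(trajectories))  # inspect the stored structure before training GAIL on it
```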
- Choose the task `args.task = 'human_expert'` (a minimal sketch follows this list).
- Run `train.py`.
- Play the game.
- If you want to add a trajectory while playing, press `Esc`, or wait until game over and the trajectory will be added automatically. To finish collecting trajectories, close the pygame window. To cancel all progress, stop the execution of the program.
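For completeness, the corresponding configuration is minimal (the environment id is just an example):

```python
# Configuration sketch for collecting human demonstrations.
args.env_id = 'MontezumaRevenge-ram-v0'  # example environment
args.task = 'human_expert'
```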
- Set the Atari environment, choose `args.task = 'train_gail'` and `args.alg = 'trpo'`.
- Choose the path to the expert, `args.expert_path`, and optionally `args.load_model_path` if you want to continue a previous training (a combined sketch follows this list).
- To run GAIL with pre-trained weights, set `args.pretrained = True`.
- To pre-train a model with behavior cloning, choose `args.alg = 'bc'` instead of `trpo`. It will run BC and save the BC model, without running GAIL.
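Putting the above together, a configuration sketch for a GAIL run (the expert path is a hypothetical placeholder; only attribute names appearing in this README are used):

```python
# Configuration sketch for a GAIL training run; values are examples.
args.env_id = 'Boxing-ram-v0'  # example environment
args.task = 'train_gail'
args.alg = 'trpo'  # use 'bc' instead to only pre-train with behavior cloning
args.expert_path = 'data/expert/expert_trajectories.pkl'  # hypothetical placeholder
args.pretrained = True  # start GAIL from BC-pretrained weights
args.load_model_path = None  # or a checkpoint path to continue a previous training
```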
Imitation of RL agents in Boxing-ram-v0, trained from 1 expert trajectory (left video) and from 1500 trajectories (right video):
Imitation of human player in MontezumaRevenge-ram-v0: