Merge pull request #465 from instadeepai/develop
Feature / Release 0.1.2
KaleabTessera authored Mar 25, 2022
2 parents 424025b + 6161758 commit 25d87f0
Showing 34 changed files with 1,360 additions and 171 deletions.
8 changes: 4 additions & 4 deletions .github/ISSUE_TEMPLATE/bugfix_internal.md
@@ -12,10 +12,10 @@ A clear and concise description of what the bug is.

### To Reproduce
Steps to reproduce the behavior:
1.
2.
3.
4.
1.
2.
3.
4.

### Expected behavior
A clear and concise description of what you expected to happen.
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/investigation_internal.md
@@ -0,0 +1,29 @@
---
name: Investigation
about: Outline the structure for an investigation. This would commonly be used to measure the impact of various design/implementation choices.
title: '[INVESTIGATION]'
labels: investigation
assignees: ''

---

### What do you want to investigate?
A brief description of what you would like to investigate. Do you have a hypothesis?

### Definition of done
A precise outline for the investigation to be considered complete.

### [***Optional***] Results
<!-- This is added after the investigation is complete. -->
Results from experiments/derivations. This could be linked to a benchmarking issue.

### What was the conclusion of your investigation?
<!-- This is added after the investigation is complete. -->
- What are the findings from the investigation?
- Was your hypothesis correct?

### [***Optional***] Discussion/Future Investigations
<!-- This is added after the investigation is complete. -->
This could be a link to a Github [discussions page](https://github.com/instadeepai/Mava/discussions).

<!-- Base checklist. Don’t hesitate to adapt it to your use-case. -->
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -48,7 +48,7 @@ repos:
additional_dependencies: [flake8-isort]

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.910
rev: v0.941
hooks:
- id: mypy
exclude: ^docs/
52 changes: 37 additions & 15 deletions examples/README.md
@@ -11,7 +11,7 @@ We include a number of systems running on continuous control tasks.
- **MADDPG**:
a MADDPG system running on the continuous action space simple_spread MPE environment.
- *Feedforward*:
- decentralised
- Decentralised
- [decentralised][debug_maddpg_ff_dec]
- [decentralised record agents][debug_maddpg_ff_dec_record] (***recording agents acting in the environment***)
- [decentralised executor scaling][debug_maddpg_ff_dec_scaling_executors] (***scaling to 4 executors***)
@@ -20,7 +20,7 @@ We include a number of systems running on continuous control tasks.
- [decentralised lr scheduling][debug_maddpg_ff_dec_lr_scheduling](***using lr schedule***)
- [decentralised evaluator intervals][debug_maddpg_ff_dec_eval_intervals](***running the evaluation loop at intervals***)

- [centralised][debug_maddpg_cen] , [networked][debug_maddpg_networked] (***using a fully-connected, networked architecture***), [networked with custom architecture][debug_maddpg_networked_custom] (***using a custom, sparse, networked architecture***) and [state_based][debug_maddpg_state_based].
- [centralised][debug_maddpg_cen] , [networked][debug_maddpg_networked] (***using a fully-connected, networked architecture***), [networked with custom architecture][debug_maddpg_networked_custom] (***using a custom, sparse, networked architecture***) and [state_based][debug_maddpg_state_based].

- *Recurrent*
- [decentralised][debug_maddpg_rec_dec] and [state_based][debug_maddpg_state_based].
@@ -45,17 +45,19 @@ We include a number of systems running on continuous control tasks.
- **MAD4PG**:
a MAD4PG system running on the Multiwalker environment.
- *Feedforward*
- [decentralised][pz_mad4pg_ff_dec] and [decentralised record agents][pz_mad4pg_ff_dec_record] (***recording agents acting in the environment***).
- [decentralised][pz_mad4pg_ff_dec]
- [decentralised record agents][pz_mad4pg_ff_dec_record] (***recording agents acting in the environment***).

- **MAPPO**
- *Feedforward*
- **MAPPO**
- *Feedforward*
- [decentralised][pz_mappo_ff_dec].

### 2D RoboCup

- **MAD4PG**:
a MAD4PG system running on the RoboCup environment.
- *Recurrent* [state_based][robocup_mad4pg_ff_state_based].
- *Recurrent*
- [state_based][robocup_mad4pg_ff_state_based].

## Discrete control

@@ -71,29 +73,42 @@ We also include a number of systems running on discrete action space environment
- **MADQN**:
a MADQN system running on the discrete action space simple_spread MPE environment.
- *Feedforward*
- [decentralised][debug_madqn_ff_dec], [decentralised lr scheduling][debug_madqn_ff_dec_lr_schedule] (***using lr schedule***), [decentralised custom lr scheduling][debug_madqn_ff_dec_custom_lr_schedule] (***using custom lr schedule***) and [decentralised custom epsilon decay scheduling][debug_madqn_ff_dec_custom_eps_schedule] (***using configurable epsilon scheduling***).
- Decentralised
- [decentralised][debug_madqn_ff_dec]
- [decentralised lr scheduling][debug_madqn_ff_dec_lr_schedule] (***using lr schedule***)
- [decentralised custom lr scheduling][debug_madqn_ff_dec_custom_lr_schedule] (***using custom lr schedule***)
- [decentralised custom epsilon decay scheduling][debug_madqn_ff_dec_custom_eps_schedule] (***using configurable epsilon scheduling***).
- *Recurrent*
- [decentralised][debug_madqn_rec_dec].

- **VDN**:
a VDN system running on the discrete action space simple_spread MPE environment.
- *Recurrent* [centralised][debug_vdn_rec_cen].
- *Recurrent*
- [centralised][debug_vdn_rec_cen].

### PettingZoo - Multi-Agent Atari

- **MADQN**:
a MADQN system running on the two-player competitive Atari Pong environment.
- *Recurrent* [decentralised][pz_madqn_pong_ff_dec].
- *Recurrent*
- [decentralised][pz_madqn_pong_rec_dec].

- **MAPPO**:
a MAPPO system running on two-player cooperative Atari Pong.
- *feedforward*
- [decentralised][pz_mappo_coop_pong_ff_dec].

### PettingZoo - Multi-Agent Particle Environment

- **MADDPG**:
a MADDPG system running on the Simple Speaker Listener environment.
- *Feedforward* [decentralised][pz_maddpg_mpe_ssl_ff_dec].
- *Feedforward*
- [decentralised][pz_maddpg_mpe_ssl_ff_dec].

- **MADDPG**:
a MADDPG system running on the Simple Spread environment.
- *Feedforward* [decentralised][pz_maddpg_mpe_ss_ff_dec].
- *Feedforward*
- [decentralised][pz_maddpg_mpe_ss_ff_dec].

### SMAC - StarCraft Multi-Agent Challenge

@@ -106,17 +121,20 @@ We also include a number of systems running on discrete action space environment

- **QMIX**:
a QMIX system running on the SMAC environment.
- *Recurrent* [centralised][smac_qmix_rec_cen].
- *Recurrent*
- [centralised][smac_qmix_rec_cen].

- **VDN**:
a VDN system running on the SMAC environment.
- *Recurrent* [centralised][smac_vdn_rec_cen].
- *Recurrent*
- [centralised][smac_vdn_rec_cen].

### OpenSpiel - Tic Tac Toe

- **MADQN**:
a MADQN system running on the OpenSpiel environment.
- *Feedforward* [decentralised][openspiel_madqn_ff_dec].
- *Feedforward*
- [decentralised][openspiel_madqn_ff_dec].

<!-- Examples -->
[quickstart]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/quickstart.ipynb
@@ -151,6 +169,7 @@ We also include a number of systems running on discrete action space environment
[pz_mad4pg_ff_dec_record]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/sisl/multiwalker/feedforward/decentralised/run_mad4pg_record.py
[pz_mappo_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/sisl/multiwalker/feedforward/decentralised/run_mappo.py

[robocup_mad4pg_ff_state_based]:https://github.com/instadeepai/Mava/blob/develop/examples/tf/robocup/recurrent/state_based/run_mad4pg.py
<!-- Discrete -->
[debug_mappo_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/debugging/simple_spread/feedforward/decentralised/run_mappo.py
[debug_mappo_ff_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/debugging/simple_spread/feedforward/centralised/run_mappo.py
@@ -163,12 +182,15 @@ We also include a number of systems running on discrete action space environment

[debug_vdn_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/debugging/simple_spread/recurrent/centralised/run_vdn.py

[pz_madqn_pong_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/atari/pong/recurrent/centralised/run_madqn.py
[pz_madqn_pong_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/atari/pong/recurrent/decentralised/run_madqn.py

[pz_mappo_coop_pong_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/butterfly/cooperative_pong/feedforward/decentralised/run_mappo.py

[pz_maddpg_mpe_ssl_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/mpe/simple_speaker_listener/feedforward/decentralised/run_maddpg.py

[pz_maddpg_mpe_ss_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/mpe/simple_spread/feedforward/decentralised/run_maddpg.py

[smac_madqn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/smac/feedforward/decentralised/run_madqn.py
[smac_madqn_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/smac/recurrent/decentralised/run_madqn.py

[smac_qmix_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/smac/recurrent/centralised/run_qmix.py
Original file line number Diff line number Diff line change
@@ -88,7 +88,8 @@ def main(_: Any) -> None:
network_factory=network_factory,
logger_factory=logger_factory,
num_executors=1,
optimizer=snt.optimizers.Adam(learning_rate=5e-4),
policy_optimizer=snt.optimizers.Adam(learning_rate=5e-4),
critic_optimizer=snt.optimizers.Adam(learning_rate=5e-4),
checkpoint_subpath=checkpoint_dir,
max_gradient_norm=40.0,
architecture=architectures.CentralisedValueCritic,
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
# python3
# Copyright 2021 InstaDeep Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Example running MAPPO on Cooperative Atari Pong."""

import functools
from datetime import datetime
from typing import Any

import launchpad as lp
import numpy as np
from absl import app, flags
from acme.tf.networks import AtariTorso
from supersuit import dtype_v0

from mava.systems.tf import mappo
from mava.utils import lp_utils
from mava.utils.environments import pettingzoo_utils
from mava.utils.loggers import logger_utils

FLAGS = flags.FLAGS
flags.DEFINE_string(
"env_class",
"butterfly",
"Pettingzoo environment class, e.g. atari (str).",
)
flags.DEFINE_string(
"env_name",
"cooperative_pong_v3",
"Pettingzoo environment name, e.g. pong (str).",
)

flags.DEFINE_string(
"mava_id",
str(datetime.now()),
"Experiment identifier that can be used to continue experiments.",
)
flags.DEFINE_string("base_dir", "~/mava", "Base dir to store experiments.")


def main(_: Any) -> None:
"""Run example."""

# Environment
environment_factory = functools.partial(
pettingzoo_utils.make_environment,
env_class=FLAGS.env_class,
env_name=FLAGS.env_name,
env_preprocess_wrappers=[(dtype_v0, {"dtype": np.float32})],
)

# Networks.
network_factory = lp_utils.partial_kwargs(
mappo.make_default_networks, observation_network=AtariTorso()
)

# Checkpointer appends "Checkpoints" to checkpoint_dir
checkpoint_dir = f"{FLAGS.base_dir}/{FLAGS.mava_id}"

# Log every [log_every] seconds.
log_every = 10
logger_factory = functools.partial(
logger_utils.make_logger,
directory=FLAGS.base_dir,
to_terminal=True,
to_tensorboard=True,
time_stamp=FLAGS.mava_id,
time_delta=log_every,
)

# Distributed program
program = mappo.MAPPO(
environment_factory=environment_factory,
network_factory=network_factory,
logger_factory=logger_factory,
num_executors=1,
checkpoint_subpath=checkpoint_dir,
num_epochs=5,
batch_size=32,
).build()

# Ensure only trainer runs on gpu, while other processes run on cpu.
local_resources = lp_utils.to_device(
program_nodes=program.groups.keys(), nodes_on_gpu=["trainer"]
)

# Launch.
lp.launch(
program,
lp.LaunchType.LOCAL_MULTI_PROCESSING,
terminal="current_terminal",
local_resources=local_resources,
)


if __name__ == "__main__":
app.run(main)
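
Note on the factory pattern in the new example: `environment_factory` is built with `functools.partial`, so the flag values are bound once and each process can later call the factory to create its own environment. The sketch below illustrates that idea using only the standard library. `make_environment` here is a hypothetical stand-in, not `pettingzoo_utils.make_environment`, and the `evaluation` keyword is an assumption added for illustration.

```python
import functools
from typing import Any, Dict, List, Tuple


def make_environment(
    env_class: str,
    env_name: str,
    env_preprocess_wrappers: List[Tuple[str, Dict[str, Any]]],
    evaluation: bool = False,
) -> Dict[str, Any]:
    """Hypothetical stand-in that describes an environment instead of building one."""
    return {
        "env": f"{env_class}/{env_name}",
        "wrappers": [name for name, _ in env_preprocess_wrappers],
        "evaluation": evaluation,
    }


# Bind the configuration once; the result is a callable that still accepts
# any remaining keyword arguments at call time.
environment_factory = functools.partial(
    make_environment,
    env_class="butterfly",
    env_name="cooperative_pong_v3",
    env_preprocess_wrappers=[("dtype_v0", {"dtype": "float32"})],
)

# Each caller creates its own instance from the shared configuration.
train_env = environment_factory()
eval_env = environment_factory(evaluation=True)

print(train_env)
print(eval_env)
```

Binding arguments this way keeps the configuration in one place while deferring construction until the factory is called, which is presumably why the example passes factories rather than ready-made objects to `mappo.MAPPO`.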