Merge pull request #465 from instadeepai/develop

Feature / Release 0.1.2
instadeepai · Mar 25, 2022 · 25d87f0
2 parents 424025b + 6161758
Showing 34 changed files with 1,360 additions and 171 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bugfix_internal.md b/.github/ISSUE_TEMPLATE/bugfix_internal.md
@@ -12,10 +12,10 @@ A clear and concise description of what the bug is.
 
 ### To Reproduce
 Steps to reproduce the behavior:
-1. 
-2. 
-3. 
-4. 
+1.
+2.
+3.
+4.
 
 ### Expected behavior
 A clear and concise description of what you expected to happen.

diff --git a/.github/ISSUE_TEMPLATE/investigation_internal.md b/.github/ISSUE_TEMPLATE/investigation_internal.md
@@ -0,0 +1,29 @@
+---
+name: Investigation
+about: Outline the structure for an investigation. This would commonly be used to measure the impact of various design/implementation choices.
+title: '[INVESTIGATION]'
+labels: investigation
+assignees: ''
+
+---
+
+### What do you want to investigate?
+A brief description of what you would like to investigate. Do you have a hypothesis?
+
+### Definition of done
+A precise outline for the investigation to be considered complete.
+
+### [***Optional***] Results
+<!-- This is added after the investigation is complete.  -->
+Results from experiments/derivations. This could be linked to a benchmarking issue.
+
+### What was the conclusion of your investigation?
+<!-- This is added after the investigation is complete.  -->
+- What are the findings from the investigation?
+- Was your hypothesis correct?
+
+### [***Optional***] Discussion/Future Investigations
+<!-- This is added after the investigation is complete.  -->
+This could be a link to a GitHub [discussions page](https://github.com/instadeepai/Mava/discussions).
+
+<!-- Base checklist. Don’t hesitate to adapt it to your use-case. -->
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -48,7 +48,7 @@ repos:
         additional_dependencies: [flake8-isort]
 
   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v0.910
+    rev: v0.941
     hooks:
       - id: mypy
         exclude: ^docs/

diff --git a/examples/README.md b/examples/README.md
@@ -11,7 +11,7 @@ We include a number of systems running on continuous control tasks.
 - **MADDPG**:
     a MADDPG system running on the continuous action space simple_spread MPE environment.
   - *Feedforward*:
-    - decentralised
+    - Decentralised
       - [decentralised][debug_maddpg_ff_dec]
       - [decentralised record agents][debug_maddpg_ff_dec_record]  (***recording agents acting in the environment***)
       - [decentralised executor scaling][debug_maddpg_ff_dec_scaling_executors] (***scaling to 4 executors***)
@@ -20,7 +20,7 @@ We include a number of systems running on continuous control tasks.
       - [decentralised lr scheduling][debug_maddpg_ff_dec_lr_scheduling](***using lr schedule***)
       - [decentralised evaluator intervals][debug_maddpg_ff_dec_eval_intervals](***running the evaluation loop at intervals***)
 
-  - [centralised][debug_maddpg_cen] , [networked][debug_maddpg_networked] (***using a fully-connected, networked architecture***), [networked with custom architecture][debug_maddpg_networked_custom] (***using a custom, sparse, networked architecture***) and [state_based][debug_maddpg_state_based].
+    - [centralised][debug_maddpg_cen] , [networked][debug_maddpg_networked] (***using a fully-connected, networked architecture***), [networked with custom architecture][debug_maddpg_networked_custom] (***using a custom, sparse, networked architecture***) and [state_based][debug_maddpg_state_based].
 
   - *Recurrent*
     - [decentralised][debug_maddpg_rec_dec] and [state_based][debug_maddpg_state_based].
@@ -45,17 +45,19 @@ We include a number of systems running on continuous control tasks.
 - **MAD4PG**:
       a MAD4PG system running on the Multiwalker environment.
   - *Feedforward*
-    - [decentralised][pz_mad4pg_ff_dec] and [decentralised record agents][pz_mad4pg_ff_dec_record] (***recording agents acting in the environment***).
+    - [decentralised][pz_mad4pg_ff_dec]
+    - [decentralised record agents][pz_mad4pg_ff_dec_record] (***recording agents acting in the environment***).
 
-  - **MAPPO**
-      - *Feedforward*
+- **MAPPO**
+    - *Feedforward*
         - [decentralised][pz_mappo_ff_dec].
 
 ### 2D RoboCup
 
 - **MAD4PG**:
     a MAD4PG system running on the RoboCup environment.
-  - *Recurrent* [state_based][robocup_mad4pg_ff_state_based].
+  - *Recurrent*
+    - [state_based][robocup_mad4pg_ff_state_based].
 
 ## Discrete control
 
@@ -71,29 +73,42 @@ We also include a number of systems running on discrete action space environment
 - **MADQN**:
       a MADQN system running on the discrete action space simple_spread MPE environment.
   - *Feedforward*
-    - [decentralised][debug_madqn_ff_dec], [decentralised lr scheduling][debug_madqn_ff_dec_lr_schedule] (***using lr schedule***), [decentralised custom lr scheduling][debug_madqn_ff_dec_custom_lr_schedule] (***using custom lr schedule***) and [decentralised custom epsilon decay scheduling][debug_madqn_ff_dec_custom_eps_schedule] (***using configurable epsilon scheduling***).
+    - Decentralised
+        - [decentralised][debug_madqn_ff_dec]
+        - [decentralised lr scheduling][debug_madqn_ff_dec_lr_schedule] (***using lr schedule***)
+        - [decentralised custom lr scheduling][debug_madqn_ff_dec_custom_lr_schedule] (***using custom lr schedule***)
+        - [decentralised custom epsilon decay scheduling][debug_madqn_ff_dec_custom_eps_schedule] (***using configurable epsilon scheduling***).
   - *Recurrent*
     - [decentralised][debug_madqn_rec_dec].
 
 - **VDN**:
       a VDN system running on the discrete action space simple_spread MPE environment.
-  - *Recurrent* [centralised][debug_vdn_rec_cen].
+  - *Recurrent*
+    - [centralised][debug_vdn_rec_cen].
 
 ### PettingZoo - Multi-Agent Atari
 
 - **MADQN**:
    a MADQN system running on the two-player competitive Atari Pong environment.
-  - *Recurrent* [decentralised][pz_madqn_pong_ff_dec].
+  - *Recurrent*
+    - [decentralised][pz_madqn_pong_rec_dec].
+
+- **MAPPO**:
+    a MAPPO system running on two-player cooperative Atari Pong.
+    - *feedforward*
+        - [decentralised][pz_mappo_coop_pong_ff_dec].
 
 ### PettingZoo - Multi-Agent Particle Environment
 
 - **MADDPG**:
       a MADDPG system running on the Simple Speaker Listener environment.
-  - *Feedforward* [decentralised][pz_maddpg_mpe_ssl_ff_dec].
+  - *Feedforward*
+    - [decentralised][pz_maddpg_mpe_ssl_ff_dec].
 
 - **MADDPG**:
       a MADDPG system running on the Simple Spread environment.
-  - *Feedforward* [decentralised][pz_maddpg_mpe_ss_ff_dec].
+  - *Feedforward*
+    - [decentralised][pz_maddpg_mpe_ss_ff_dec].
 
 ### SMAC - StarCraft Multi-Agent Challenge
 
@@ -106,17 +121,20 @@ We also include a number of systems running on discrete action space environment
 
 - **QMIX**:
     a QMIX system running on the SMAC environment.
-  - *Recurrent* [centralised][smac_qmix_rec_cen].
+  - *Recurrent*
+    - [centralised][smac_qmix_rec_cen].
 
 - **VDN**:
     a VDN system running on the SMAC environment.
-  - *Recurrent* [centralised][smac_vdn_rec_cen].
+  - *Recurrent*
+    - [centralised][smac_vdn_rec_cen].
 
 ### OpenSpiel - Tic Tac Toe
 
 - **MADQN**:
       a MADQN system running on the OpenSpiel environment.
-  - *Feedforward* [decentralised][openspiel_madqn_ff_dec].
+  - *Feedforward*
+    - [decentralised][openspiel_madqn_ff_dec].
 
 <!-- Examples -->
 [quickstart]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/quickstart.ipynb
@@ -151,6 +169,7 @@ We also include a number of systems running on discrete action space environment
 [pz_mad4pg_ff_dec_record]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/sisl/multiwalker/feedforward/decentralised/run_mad4pg_record.py
 [pz_mappo_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/sisl/multiwalker/feedforward/decentralised/run_mappo.py
 
+[robocup_mad4pg_ff_state_based]:https://github.com/instadeepai/Mava/blob/develop/examples/tf/robocup/recurrent/state_based/run_mad4pg.py
 <!-- Discrete -->
 [debug_mappo_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/debugging/simple_spread/feedforward/decentralised/run_mappo.py
 [debug_mappo_ff_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/debugging/simple_spread/feedforward/centralised/run_mappo.py
@@ -163,12 +182,15 @@ We also include a number of systems running on discrete action space environment
 
 [debug_vdn_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/debugging/simple_spread/recurrent/centralised/run_vdn.py
 
-[pz_madqn_pong_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/atari/pong/recurrent/centralised/run_madqn.py
+[pz_madqn_pong_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/atari/pong/recurrent/decentralised/run_madqn.py
+
+[pz_mappo_coop_pong_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/butterfly/cooperative_pong/feedforward/decentralised/run_mappo.py
 
 [pz_maddpg_mpe_ssl_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/mpe/simple_speaker_listener/feedforward/decentralised/run_maddpg.py
 
 [pz_maddpg_mpe_ss_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/petting_zoo/mpe/simple_spread/feedforward/decentralised/run_maddpg.py
 
+[smac_madqn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/smac/feedforward/decentralised/run_madqn.py
 [smac_madqn_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/smac/recurrent/decentralised/run_madqn.py
 
 [smac_qmix_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/tf/smac/recurrent/centralised/run_qmix.py

diff --git a/examples/tf/debugging/simple_spread/feedforward/centralised/run_mappo.py b/examples/tf/debugging/simple_spread/feedforward/centralised/run_mappo.py
@@ -88,7 +88,8 @@ def main(_: Any) -> None:
         network_factory=network_factory,
         logger_factory=logger_factory,
         num_executors=1,
-        optimizer=snt.optimizers.Adam(learning_rate=5e-4),
+        policy_optimizer=snt.optimizers.Adam(learning_rate=5e-4),
+        critic_optimizer=snt.optimizers.Adam(learning_rate=5e-4),
         checkpoint_subpath=checkpoint_dir,
         max_gradient_norm=40.0,
         architecture=architectures.CentralisedValueCritic,
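
Note on the hunk above: the single `optimizer` argument is replaced by separate `policy_optimizer` and `critic_optimizer` arguments, each given its own `Adam` instance. A minimal sketch of building both arguments from one factory follows. The helper `make_optimizer_kwargs` and the `DummyOptimizer` class are hypothetical and not part of Mava or Sonnet; the stand-in class is only there so the sketch runs without Sonnet installed.

```python
import functools
from typing import Any, Callable, Dict


def make_optimizer_kwargs(optimizer_factory: Callable[[], Any]) -> Dict[str, Any]:
    """Hypothetical helper (not part of Mava): build the two optimizer kwargs.

    The factory is called twice so the policy and the critic each receive
    their own optimizer instance, mirroring the two constructor calls in
    the hunk above.
    """
    return {
        "policy_optimizer": optimizer_factory(),
        "critic_optimizer": optimizer_factory(),
    }


class DummyOptimizer:
    """Stand-in for snt.optimizers.Adam so the sketch runs without Sonnet."""

    def __init__(self, learning_rate: float) -> None:
        self.learning_rate = learning_rate


# With Sonnet available, the factory could instead be (assumption):
#   functools.partial(snt.optimizers.Adam, learning_rate=5e-4)
optimizer_kwargs = make_optimizer_kwargs(
    functools.partial(DummyOptimizer, learning_rate=5e-4)
)
```

The resulting dict could then be unpacked into the system constructor with `**optimizer_kwargs`, assuming the constructor accepts exactly these two keyword names as the diff shows.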

diff --git a/examples/tf/petting_zoo/butterfly/cooperative_pong/feedforward/decentralised/run_mappo.py b/examples/tf/petting_zoo/butterfly/cooperative_pong/feedforward/decentralised/run_mappo.py
@@ -0,0 +1,108 @@
+# python3
+# Copyright 2021 InstaDeep Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Example running MAPPO on Cooperative Atari Pong."""
+
+import functools
+from datetime import datetime
+from typing import Any
+
+import launchpad as lp
+import numpy as np
+from absl import app, flags
+from acme.tf.networks import AtariTorso
+from supersuit import dtype_v0
+
+from mava.systems.tf import mappo
+from mava.utils import lp_utils
+from mava.utils.environments import pettingzoo_utils
+from mava.utils.loggers import logger_utils
+
+FLAGS = flags.FLAGS
+flags.DEFINE_string(
+    "env_class",
+    "butterfly",
+    "Pettingzoo environment class, e.g. atari (str).",
+)
+flags.DEFINE_string(
+    "env_name",
+    "cooperative_pong_v3",
+    "Pettingzoo environment name, e.g. pong (str).",
+)
+
+flags.DEFINE_string(
+    "mava_id",
+    str(datetime.now()),
+    "Experiment identifier that can be used to continue experiments.",
+)
+flags.DEFINE_string("base_dir", "~/mava", "Base dir to store experiments.")
+
+
+def main(_: Any) -> None:
+    """Run example."""
+
+    # Environment
+    environment_factory = functools.partial(
+        pettingzoo_utils.make_environment,
+        env_class=FLAGS.env_class,
+        env_name=FLAGS.env_name,
+        env_preprocess_wrappers=[(dtype_v0, {"dtype": np.float32})],
+    )
+
+    # Networks.
+    network_factory = lp_utils.partial_kwargs(
+        mappo.make_default_networks, observation_network=AtariTorso()
+    )
+
+    # Checkpointer appends "Checkpoints" to checkpoint_dir
+    checkpoint_dir = f"{FLAGS.base_dir}/{FLAGS.mava_id}"
+
+    # Log every [log_every] seconds.
+    log_every = 10
+    logger_factory = functools.partial(
+        logger_utils.make_logger,
+        directory=FLAGS.base_dir,
+        to_terminal=True,
+        to_tensorboard=True,
+        time_stamp=FLAGS.mava_id,
+        time_delta=log_every,
+    )
+
+    # Distributed program
+    program = mappo.MAPPO(
+        environment_factory=environment_factory,
+        network_factory=network_factory,
+        logger_factory=logger_factory,
+        num_executors=1,
+        checkpoint_subpath=checkpoint_dir,
+        num_epochs=5,
+        batch_size=32,
+    ).build()
+
+    # Ensure only trainer runs on gpu, while other processes run on cpu.
+    local_resources = lp_utils.to_device(
+        program_nodes=program.groups.keys(), nodes_on_gpu=["trainer"]
+    )
+
+    # Launch.
+    lp.launch(
+        program,
+        lp.LaunchType.LOCAL_MULTI_PROCESSING,
+        terminal="current_terminal",
+        local_resources=local_resources,
+    )
+
+
+if __name__ == "__main__":
+    app.run(main)
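
Note on the new file above: `checkpoint_dir` is an f-string built from the `base_dir` flag (default `"~/mava"`) and the `mava_id` flag (default `str(datetime.now())`). The sketch below shows what that string looks like, using a fixed timestamp in place of `datetime.now()` so the result is reproducible. Whether Mava expands `~` internally is not shown in this diff; the `os.path.expanduser` call is only an illustration of how the path could be resolved.

```python
import os
from datetime import datetime

# Defaults taken from the flags in the example script.
base_dir = "~/mava"

# Fixed timestamp for illustration; the script uses str(datetime.now()),
# which yields "YYYY-MM-DD HH:MM:SS" plus microseconds when they are nonzero.
mava_id = str(datetime(2022, 3, 25, 12, 0, 0))

# Same composition as in the script: the ID contains a space and colons.
checkpoint_dir = f"{base_dir}/{mava_id}"

# Assumption: resolve "~" explicitly before using the path on disk.
expanded_checkpoint_dir = os.path.expanduser(checkpoint_dir)

print(checkpoint_dir)
print(expanded_checkpoint_dir)
```

Because the default `mava_id` is computed once when the flags are defined, passing the same `--mava_id` value on a later run is what allows an experiment to be continued, as the flag's help text describes.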
diff --git a/...er/feedforward/decentralised/run_mappo.py → ...er/feedforward/decentralised/run_mappo.py b/...er/feedforward/decentralised/run_mappo.py → ...er/feedforward/decentralised/run_mappo.py