SB3-Contrib v2.3.0: New default hyperparameters for QR-DQN

araffin released this 31 Mar 18:41


Breaking Changes:

Upgraded to Stable-Baselines3 >= 2.3.0
The default learning_starts parameter of QRDQN has been changed to be consistent with the other off-policy algorithms

from sb3_contrib import QRDQN

# SB3 < 2.3.0 default hyperparameters; 50_000 corresponded to the Atari default hyperparameters
# model = QRDQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
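
Code that relied on the previous default can pass learning_starts explicitly so that it behaves the same before and after the upgrade. The following is a minimal sketch, not part of the release itself: the constant names, the make_qrdqn helper and the CartPole-v1 environment id are illustrative assumptions, and only the two numeric values come from the note above.

```python
# Values quoted from the release note above.
OLD_LEARNING_STARTS = 50_000  # default before v2.3.0 (Atari-oriented)
NEW_LEARNING_STARTS = 100     # default from v2.3.0 onwards


def make_qrdqn(env_id: str = "CartPole-v1", learning_starts: int = OLD_LEARNING_STARTS):
    """Hypothetical helper: build a QRDQN model with learning_starts pinned.

    Passing the value explicitly makes the result independent of the
    library default, so it is the same on v2.2.x and v2.3.0.
    """
    # Imported lazily so this module can be loaded without sb3-contrib installed.
    from sb3_contrib import QRDQN

    # The environment is given as a Gymnasium id string (an assumption made
    # for brevity); an already constructed env object works as well.
    return QRDQN("MlpPolicy", env_id, learning_starts=learning_starts)
```

Calling make_qrdqn() keeps the pre-2.3.0 warm-up of 50_000 steps, while make_qrdqn(learning_starts=NEW_LEARNING_STARTS) opts in to the new default explicitly.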

New Features:

Added rollout_buffer_class and rollout_buffer_kwargs arguments to MaskablePPO
Log success rate rollout/success_rate when available for on-policy algorithms
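
As a rough illustration of the new MaskablePPO arguments, the sketch below passes a custom rollout buffer class. The CountingRolloutBuffer subclass and the make_maskable_ppo helper are hypothetical names introduced for this example, and the toy environment and n_steps value are arbitrary choices rather than recommendations from the release.

```python
def make_maskable_ppo(n_steps: int = 64):
    """Hypothetical helper: build a MaskablePPO model with a custom rollout buffer."""
    # Imported lazily so this module can be loaded without sb3-contrib installed.
    from sb3_contrib import MaskablePPO
    from sb3_contrib.common.envs import InvalidActionEnvDiscrete
    from sb3_contrib.common.maskable.buffers import MaskableRolloutBuffer

    class CountingRolloutBuffer(MaskableRolloutBuffer):
        """Illustrative subclass that counts how often the buffer is reset."""

        def reset(self) -> None:
            super().reset()
            # getattr is used because reset() may run before any attribute
            # of this subclass has been created.
            self.n_resets = getattr(self, "n_resets", 0) + 1

    # Small test environment shipped with SB3-Contrib that exposes action masks.
    env = InvalidActionEnvDiscrete(dim=8, n_invalid_actions=3)

    return MaskablePPO(
        "MlpPolicy",
        env,
        n_steps=n_steps,
        rollout_buffer_class=CountingRolloutBuffer,
        # Extra keyword arguments for the buffer constructor; none are needed here.
        rollout_buffer_kwargs={},
    )
```

A subclass of MaskableRolloutBuffer is used here on the assumption that the custom buffer still needs to store action masks during rollouts.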

Others:

Fixed train_freq type annotation for tqc and qrdqn (@Armandpl)
Fixed sb3_contrib/common/maskable/*.py type annotations
Fixed sb3_contrib/ppo_mask/ppo_mask.py type annotations
Fixed sb3_contrib/common/vec_env/async_eval.py type annotations

Documentation:

Add some additional notes about MaskablePPO (evaluation and multi-process) (@icheered)

Full Changelog: v2.2.1...v2.3.0

Contributors

Armandpl and icheered
