SB3-Contrib v2.3.0: New default hyperparameters for QR-DQN
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.3.0
- The default `learning_starts` parameter of `QRDQN` has been changed to be consistent with the other off-policy algorithms
# SB3 < 2.3.0 default hyperparameters; 50_000 corresponded to the Atari default hyperparameters
# model = QRDQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
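
Scripts that relied on the implicit default will behave differently after upgrading, so it can help to make the value explicit. The following is a minimal sketch, not part of SB3-Contrib: `release_tuple` and `qrdqn_default_learning_starts` are hypothetical helper names, and the only facts taken from this release note are the two values (50_000 before 2.3.0, 100 from 2.3.0 onward).

```python
import re

# Values taken from the release note above.
ATARI_LEARNING_STARTS = 50_000  # default before 2.3.0 (Atari setting)
NEW_LEARNING_STARTS = 100       # default from 2.3.0 onward


def release_tuple(version: str) -> tuple:
    """Return the leading (major, minor, patch) numbers of a version string.

    Pre-release suffixes are ignored, so "2.3.0a1" is treated as (2, 3, 0).
    """
    numbers = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad short versions such as "2.3" so they compare as (2, 3, 0).
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def qrdqn_default_learning_starts(version: str) -> int:
    """Hypothetical helper: the QRDQN default that applies to `version`."""
    if release_tuple(version) >= (2, 3, 0):
        return NEW_LEARNING_STARTS
    return ATARI_LEARNING_STARTS


print(qrdqn_default_learning_starts("2.2.1"))
print(qrdqn_default_learning_starts("2.3.0"))
```

To keep the earlier behaviour after upgrading, for example when reproducing Atari runs, pass the old value explicitly as in the commented-out line above (`learning_starts=50_000`) rather than relying on the default.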
New Features:
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to MaskablePPO
- Log success rate `rollout/success_rate` when available for on-policy algorithms
Others:
- Fixed `train_freq` type annotation for tqc and qrdqn (@Armandpl)
- Fixed `sb3_contrib/common/maskable/*.py` type annotations
- Fixed `sb3_contrib/ppo_mask/ppo_mask.py` type annotations
- Fixed `sb3_contrib/common/vec_env/async_eval.py` type annotations
Documentation:
- Added notes about `MaskablePPO` (evaluation and multi-process) (@icheered)
Full Changelog: v2.2.1...v2.3.0