Soft Actor Critic (SAC)
Seungjae Ryan Lee edited this page Mar 30, 2019
SAC (Haarnoja et al., 2018a) incorporates maximum entropy reinforcement learning, where the agent's goal is to maximize expected reward and entropy concurrently. Combined with techniques from TD3, SAC achieves state-of-the-art performance in various continuous control tasks. SAC has been extended to allow automatic tuning of the temperature parameter (Haarnoja et al., 2018b), which determines the importance of entropy relative to the expected reward.
- Example Script on LunarLander
- ArXiv Preprint (Original SAC)
- ArXiv Preprint (SAC with autotuned temperature)