Soft Actor Critic (SAC)
Seungjae Ryan Lee edited this page Mar 30, 2019
SAC (Haarnoja et al., 2018a) incorporates maximum entropy reinforcement learning, where the agent's goal is to maximize expected reward and entropy concurrently. Combined with techniques from TD3, SAC achieves state-of-the-art performance in various continuous control tasks. SAC has been extended to allow automatic tuning of the temperature parameter (Haarnoja et al., 2018b), which determines the importance of entropy relative to the expected reward.
- Example Script on LunarLander
- ArXiv Preprint (Original SAC)
- ArXiv Preprint (SAC with autotuned temperature)