Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline Reinforcement Learning (NeurIPS 2023)
TSRL (https://arxiv.org/abs/2306.04220) introduces a new offline reinforcement learning (RL) algorithm that leverages the fundamental time-reversal symmetry of system dynamics to improve performance on small datasets. The proposed Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM) establishes consistency between forward and reverse latent dynamics, providing well-behaved representations even when data are scarce. TSRL achieves strong performance on small benchmark datasets containing as little as 1% of the original samples, outperforming recent offline RL algorithms in data efficiency and generalizability.
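For intuition, below is a minimal PyTorch sketch of a T-symmetry consistency loss in the spirit of TDM. This is an illustrative sketch, not the repo's implementation: the network layout, latent dimension, unit-time Euler step, and equal loss weighting are all simplifying assumptions. A shared encoder maps states into a latent space, two networks model the forward and reversed-time latent dynamics, and a consistency term requires the reverse dynamics to agree with the negated forward dynamics.

```python
# Illustrative sketch of a T-symmetry consistency loss (hypothetical names;
# see the TDM/ scripts in this repo for the actual implementation).
import torch
import torch.nn as nn

class TSymmetryDynamics(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=64, hidden=128):
        super().__init__()
        # Shared state encoder into the latent space.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim))
        # Forward-time latent dynamics: z_dot = f(z, a).
        self.fwd = nn.Sequential(
            nn.Linear(latent_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim))
        # Reversed-time latent dynamics: z_dot = g(z', a).
        self.rev = nn.Sequential(
            nn.Linear(latent_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim))

    def loss(self, s, a, s_next):
        z, z_next = self.encoder(s), self.encoder(s_next)
        f = self.fwd(torch.cat([z, a], dim=-1))
        g = self.rev(torch.cat([z_next, a], dim=-1))
        fwd_loss = (z + f - z_next).pow(2).mean()   # unit-time Euler step to next latent
        rev_loss = (z_next + g - z).pow(2).mean()   # step back to the current latent
        # T-symmetry consistency: reversing time should negate the latent
        # velocity, so g should agree with -f along real transitions.
        tsym_loss = (f + g).pow(2).mean()
        return fwd_loss + rev_loss + tsym_loss
```

The actual TDM trained by the scripts below is more elaborate, but it optimizes the same kind of forward/reverse consistency; TSRL then builds its offline policy learning on top of the resulting representations.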
To install the dependencies, use
pip install -r requirements.txt
Before starting training, you need to generate the small-sample datasets yourself:
bash utils/generate_loco.sh # For the locomotion tasks
and
bash utils/generate_adroit.sh # For the adroit tasks
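The generation scripts handle the subsampling for you. Conceptually, producing a small-sample dataset amounts to something like the sketch below, where the output file name and the per-transition random sampling are hypothetical; the actual scripts define the real sampling strategy (e.g., they may sample whole trajectories) and output format.

```python
# Conceptual sketch of small-sample generation from a D4RL dataset.
import gym
import d4rl  # importing d4rl registers the offline benchmark environments
import numpy as np

env = gym.make("halfcheetah-medium-v2")
data = d4rl.qlearning_dataset(env)  # observations, actions, next_observations, ...

fraction = 0.01  # keep roughly 1% of the transitions, as in the paper
n = data["observations"].shape[0]
idx = np.random.choice(n, size=max(1, int(n * fraction)), replace=False)
small = {k: v[idx] for k, v in data.items()}
np.savez("halfcheetah_medium_1pct.npz", **small)  # hypothetical output file
```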
You can then train the TDM with:
bash TDM/train_loco.sh # For the locomotion tasks
and
bash TDM/train_adroit.sh # For the adroit tasks
Once you have the small-sample datasets and a trained TDM model, you can run TSRL on the D4RL tasks with:
bash tsrl_loco.sh # For the locomotion tasks
and
bash tsrl_adroit.sh # For the adroit tasks
To log training runs with Weights & Biases (wandb), log in to your personal account by exporting your wandb API key:
export WANDB_API_KEY=YOUR_WANDB_API_KEY
and run
wandb online
to turn on online synchronization.
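If you would rather keep logs local (for example, on a machine without internet access), run wandb offline instead; runs are then stored locally and can be uploaded later with wandb sync.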