This is a collection of research and review papers on offline reinforcement learning (offline RL). Feel free to star and fork.
Maintainers:
- Haruka Kiyohara (Tokyo Institute of Technology / Hanjuku-kaso Co., Ltd.)
- Yuta Saito (Hanjuku-kaso Co., Ltd. / Cornell University)
We are looking for more contributors and maintainers! Please feel free to open a pull request.
Format:
- [title](paper link) [links]
- author1, author2, and author3. arXiv/conference/journal, year.
For any questions, feel free to contact: saito@hanjuku-kaso.com
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. arXiv, 2020.
- Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
- Haruka Kiyohara, Kosuke Kawakami, and Yuta Saito. arXiv, 2021.
- Lifelong Robotic Reinforcement Learning by Retaining Experiences [website]
- Annie Xie and Chelsea Finn. arXiv, 2021.
- Dual Behavior Regularized Reinforcement Learning
- Chapman Siu, Jason Traish, and Richard Yi Da Xu. arXiv, 2021.
- Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
- Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, and Chelsea Finn. arXiv, 2021.
- DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning [website] [code]
- Daniel Seita, Abhinav Gopal, Zhao Mandi, and John Canny. arXiv, 2021.
- DROMO: Distributionally Robust Offline Model-based Policy Optimization
- Ruizhen Liu, Dazhi Zhong, and Zhicong Chen. arXiv, 2021.
- Implicit Behavioral Cloning
- Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, and Jonathan Tompson. arXiv, 2021.
- Nearly Horizon-Free Offline Reinforcement Learning
- Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, and Sujay Sanghavi. arXiv, 2021.
- Reducing Conservativeness Oriented Offline Reinforcement Learning
- Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, and Xiangyang Ji. arXiv, 2021.
- Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
- Andrea Zanette, Martin J. Wainwright, and Emma Brunskill. arXiv, 2021.
- Policy Gradients Incorporating the Future
- David Venuto, Elaine Lau, Doina Precup, and Ofir Nachum. arXiv, 2021.
- Offline Decentralized Multi-Agent Reinforcement Learning
- Jiechuan Jiang and Zongqing Lu. arXiv, 2021.
- OPAL: Offline Preference-Based Apprenticeship Learning [website]
- Daniel Shin and Daniel S. Brown. arXiv, 2021.
- Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning
- Haoran Xu, Xianyuan Zhan, and Xiangyu Zhu. arXiv, 2021.
- Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling under Partial Coverage
- Masatoshi Uehara and Wen Sun. arXiv, 2021.
- Conservative Offline Distributional Reinforcement Learning
- Yecheng Jason Ma, Dinesh Jayaraman, and Osbert Bastani. arXiv, 2021.
- Offline Meta-Reinforcement Learning with Online Self-Supervision
- Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, and Sergey Levine. arXiv, 2021.
- Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning
- Lionel Blondé and Alexandros Kalousis. arXiv, 2021.
- The Least Restriction for Offline Reinforcement Learning
- Zizhou Su. arXiv, 2021.
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
- Seunghyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, and Jinwoo Shin. arXiv, 2021.
- Causal Reinforcement Learning using Observational and Interventional Data
- Maxime Gasse, Damien Grasset, Guillaume Gaudron, and Pierre-Yves Oudeyer. arXiv, 2021.
- On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
- Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, and Csaba Szepesvari. arXiv, 2021.
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [website]
- Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, and Michael Laskin. arXiv, 2021.
- Offline RL Without Off-Policy Evaluation
- David Brandfonbrener, William F. Whitney, Rajesh Ranganath, and Joan Bruna. arXiv, 2021.
- On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
- Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, and Martin Riedmiller. arXiv, 2021.
- A Minimalist Approach to Offline Reinforcement Learning
- Scott Fujimoto and Shixiang Shane Gu. arXiv, 2021.
- Offline Reinforcement Learning as Anti-Exploration
- Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, and Matthieu Geist. arXiv, 2021.
- Corruption-Robust Offline Reinforcement Learning
- Xuezhou Zhang, Yiding Chen, Jerry Zhu, and Wen Sun. arXiv, 2021.
- Bellman-consistent Pessimism for Offline Reinforcement Learning
- Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, and Alekh Agarwal. arXiv, 2021.
- Offline Inverse Reinforcement Learning
- Firas Jarboui and Vianney Perchet. arXiv, 2021.
- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
- Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, and Yu Bai. arXiv, 2021.
- Heuristic-Guided Reinforcement Learning
- Ching-An Cheng, Andrey Kolobov, and Adith Swaminathan. arXiv, 2021.
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
- Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, and Qianchuan Zhao. arXiv, 2021.
- Reinforcement Learning as One Big Sequence Modeling Problem
- Michael Janner, Qiyang Li, and Sergey Levine. arXiv, 2021.
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. arXiv, 2021.
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
- Harsh Satija, Philip S. Thomas, Joelle Pineau, and Romain Laroche. arXiv, 2021.
- Model-Based Offline Planning with Trajectory Pruning
- Xianyuan Zhan, Xiangyu Zhu, and Haoran Xu. arXiv, 2021.
- InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem
- Markel Sanz Ausin, Hamoon Azizsoltani, Song Ju, Yeo Jin Kim, and Min Chi. arXiv, 2021.
- Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm [video]
- Lin Chen, Bruno Scherrer, and Peter L. Bartlett. arXiv, 2021.
- MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [website]
- Dmitry Kalashnikov, Jacob Varley, Yevgen Chebotar, Benjamin Swanson, Rico Jonschkowski, Chelsea Finn, Sergey Levine, and Karol Hausman. arXiv, 2021.
- Online and Offline Reinforcement Learning by Planning with a Learned Model
- Julian Schrittwieser, Thomas Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, and David Silver. arXiv, 2021.
- Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)
- Igor Halperin. arXiv, 2021.
- Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism [video]
- Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, and Stuart Russell. arXiv, 2021.
- Regularized Behavior Value Estimation
- Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, and Nando de Freitas. arXiv, 2021.
- Causal-aware Safe Policy Improvement for Task-oriented dialogue
- Govardana Sachithanandam Ramachandran, Kazuma Hashimoto, and Caiming Xiong. arXiv, 2021.
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
- Lanqing Li, Yuanhao Huang, and Dijun Luo. arXiv, 2021.
- Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
- Luofeng Liao, Zuyue Fu, Zhuoran Yang, Mladen Kolar, and Zhaoran Wang. arXiv, 2021.
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
- Guy Tennenholtz, Nir Baram, and Shie Mannor. arXiv, 2021.
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning
- DiJia Su, Jason D. Lee, John M. Mulvey, and H. Vincent Poor. arXiv, 2021.
- Continuous Doubly Constrained Batch Reinforcement Learning
- Rasool Fakoor, Jonas Mueller, Pratik Chaudhari, and Alexander J. Smola. arXiv, 2021.
- COMBO: Conservative Offline Model-Based Policy Optimization
- Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, and Chelsea Finn. arXiv, 2021.
- Q-Value Weighted Regression: Reinforcement Learning with Limited Data
- Piotr Kozakowski, Łukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, and Katarzyna Kańska. arXiv, 2021.
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
- Anish Agarwal, Abdullah Alomar, Varkey Alumootil, Devavrat Shah, Dennis Shen, Zhi Xu, and Cindy Yang. arXiv, 2021.
- Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
- Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, and Tengyang Xie. arXiv, 2021.
- Fast Rates for the Regret of Offline Reinforcement Learning
- Yichun Hu, Nathan Kallus, and Masatoshi Uehara. arXiv, 2021.
- Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
- Ming Yin, Yu Bai, and Yu-Xiang Wang. arXiv, 2021.
- Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment
- Kristine Zhang, Yuanheng Wang, Jianzhun Du, Brian Chu, Leo Anthony Celi, Ryan Kindle, and Finale Doshi-Velez. arXiv, 2021.
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, and Shixiang Shane Gu. ICML, 2021.
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills [website]
- Yevgen Chebotar, Karol Hausman, Yao Lu, Ted Xiao, Dmitry Kalashnikov, Jake Varley, Alex Irpan, Benjamin Eysenbach, Ryan Julian, Chelsea Finn, and Sergey Levine. ICML, 2021.
- Is Pessimism Provably Efficient for Offline RL? [video]
- Ying Jin, Zhuoran Yang, and Zhaoran Wang. ICML, 2021.
- Representation Matters: Offline Pretraining for Sequential Decision Making
- Mengjiao Yang and Ofir Nachum. ICML, 2021.
- Offline Reinforcement Learning with Pseudometric Learning
- Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, and Matthieu Geist. ICML, 2021.
- Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
- Philip J. Ball, Cong Lu, Jack Parker-Holder, and Stephen Roberts. ICML, 2021.
- Offline Contextual Bandits with Overparameterized Models
- David Brandfonbrener, William F. Whitney, Rajesh Ranganath, and Joan Bruna. ICML, 2021.
- Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
- Yaqi Duan, Chi Jin, and Zhiyuan Li. ICML, 2021.
- Offline Reinforcement Learning with Fisher Divergence Critic Regularization
- Ilya Kostrikov, Jonathan Tompson, Rob Fergus, and Ofir Nachum. ICML, 2021.
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
- Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, and Kee-Eung Kim. ICML, 2021.
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
- Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua Susskind, Jian Zhang, Ruslan Salakhutdinov, and Hanlin Goh. ICML, 2021.
- Vector Quantized Models for Planning
- Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, and Oriol Vinyals. ICML, 2021.
- Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL [video]
- Andrea Zanette. ICML, 2021.
- Instabilities of Offline RL with Pre-Trained Neural Representation
- Ruosong Wang, Yifan Wu, Ruslan Salakhutdinov, and Sham M. Kakade. ICML, 2021.
- Offline Meta-Reinforcement Learning with Advantage Weighting
- Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, and Chelsea Finn. ICML, 2021.
- Model-Based Offline Planning [video]
- Arthur Argenson and Gabriel Dulac-Arnold. ICLR, 2021.
- Batch Reinforcement Learning Through Continuation Method
- Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, and Minmin Chen. ICLR, 2021.
- Model-Based Visual Planning with Self-Supervised Functional Distances
- Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, and Sergey Levine. ICLR, 2021.
- Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, and Shixiang Gu. ICLR, 2021.
- Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
- Lanqing Li, Rui Yang, and Dijun Luo. ICLR, 2021.
- DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
- Aayam Kumar Shrestha, Stefan Lee, Prasad Tadepalli, and Alan Fern. ICLR, 2021.
- What are the Statistical Limits of Offline RL with Linear Function Approximation? [video]
- Ruosong Wang, Dean Foster, and Sham M. Kakade. ICLR, 2021.
- Reset-Free Lifelong Learning with Skill-Space Planning [website]
- Kevin Lu, Aditya Grover, Pieter Abbeel, and Igor Mordatch. ICLR, 2021.
- Risk-Averse Offline Reinforcement Learning
- Núria Armengol Urpí, Sebastian Curi, and Andreas Krause. ICLR, 2021.
- Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning
- Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai, Linhai Qiu, Jose Blanchet, and Peter Glynn. AISTATS, 2021.
- Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
- Chuheng Zhang, Yuanying Cai, Longbo Huang, and Jian Li. AAAI, 2021.
- Efficient Self-Supervised Data Collection for Offline Robot Learning
- Shadi Endrawis, Gal Leibovich, Guy Jacob, Gal Novik, and Aviv Tamar. ICRA, 2021.
- S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning
- Samarth Sinha and Animesh Garg. CoRL, 2021.
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [website]
- Aviral Kumar, Anikait Singh, Stephen Tian, Chelsea Finn, and Sergey Levine. CoRL, 2021.
- Boosting Offline Reinforcement Learning with Residual Generative Modeling
- Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, and Zhenhui (Jessie) Li. IJCAI, 2021.
- Behavior Constraining in Weight Space for Offline Reinforcement Learning
- Phillip Swazinna, Steffen Udluft, Daniel Hein, and Thomas Runkler. ESANN, 2021.
- Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents
- Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, and Tamer Başar. IEEE T AUTOMATIC CONTROL, 2021.
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [website] [code] [blog]
- Ashvin Nair, Abhishek Gupta, Murtaza Dalal, and Sergey Levine. arXiv, 2020.
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
- Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, and Mengdi Wang. arXiv, 2020.
- A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
- Philip Amortila, Nan Jiang, and Tengyang Xie. arXiv, 2020.
- Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
- Samuele Tosatto, João Carvalho, and Jan Peters. arXiv, 2020.
- Batch Value-function Approximation with Only Realizability
- Tengyang Xie and Nan Jiang. arXiv, 2020.
- DRIFT: Deep Reinforcement Learning for Functional Software Testing
- Luke Harries, Rebekah Storan Clarke, Timothy Chapman, Swamy V. P. L. N. Nallamalli, Levent Ozgur, Shuktika Jain, Alex Leung, Steve Lim, Aaron Dietrich, José Miguel Hernández-Lobato, Tom Ellis, Cheng Zhang, and Kamil Ciosek. arXiv, 2020.
- Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains
- James Bannon, Brad Windsor, Wenbo Song, and Tao Li. arXiv, 2020.
- Goal-conditioned Batch Reinforcement Learning for Rotation Invariant Locomotion [code]
- Aditi Mavalankar. arXiv, 2020.
- Semi-Supervised Reward Learning for Offline Reinforcement Learning
- Ksenia Konyushkova, Konrad Zolna, Yusuf Aytar, Alexander Novikov, Scott Reed, Serkan Cabi, and Nando de Freitas. arXiv, 2020.
- Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation
- Chaochao Lu, Biwei Huang, Ke Wang, José Miguel Hernández-Lobato, Kun Zhang, and Bernhard Schölkopf. arXiv, 2020.
- Offline Reinforcement Learning from Images with Latent Space Models [website]
- Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, and Chelsea Finn. arXiv, 2020.
- POPO: Pessimistic Offline Policy Optimization
- Qiang He and Xinwen Hou. arXiv, 2020.
- Reinforcement Learning with Videos: Combining Offline Observations with Interaction
- Karl Schmeckpeper, Oleh Rybkin, Kostas Daniilidis, Sergey Levine, and Chelsea Finn. arXiv, 2020.
- Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones [website]
- Brijen Thananjeyan, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg. arXiv, 2020.
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
- Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, and Sergey Levine. arXiv, 2020.
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [website]
- Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, and Ofir Nachum. arXiv, 2020.
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
- Annie S. Chen, HyunJi Nam, Suraj Nair, and Chelsea Finn. arXiv, 2020.
- Learning Dexterous Manipulation from Suboptimal Experts [website]
- Rae Jeong, Jost Tobias Springenberg, Jackie Kay, Daniel Zheng, Yuxiang Zhou, Alexandre Galashov, Nicolas Heess, and Francesco Nori. arXiv, 2020.
- The Reinforcement Learning-Based Multi-Agent Cooperative Approach for the Adaptive Speed Regulation on a Metallurgical Pickling Line
- Anna Bogomolova, Kseniia Kingsep, and Boris Voskresenskii. arXiv, 2020.
- Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
- Phillip Swazinna, Steffen Udluft, and Thomas Runkler. arXiv, 2020.
- Offline Meta Learning of Exploration
- Ron Dorfman, Idan Shenfeld, and Aviv Tamar. arXiv, 2020.
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, and Shixiang Shane Gu. arXiv, 2020.
- Hyperparameter Selection for Offline Reinforcement Learning
- Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, and Nando de Freitas. arXiv, 2020.
- Interpretable Control by Reinforcement Learning
- Daniel Hein, Steffen Limmer, and Thomas A. Runkler. arXiv, 2020.
- Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning [code]
- Nathan Kallus and Masatoshi Uehara. arXiv, 2020.
- Accelerating Online Reinforcement Learning with Offline Datasets [website] [blog]
- Ashvin Nair, Murtaza Dalal, Abhishek Gupta, and Sergey Levine. arXiv, 2020.
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [blog]
- Aviral Kumar, Abhishek Gupta, and Sergey Levine. arXiv, 2020.
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
- Nathan Kallus and Masatoshi Uehara. NeurIPS, 2020.
- Critic Regularized Regression
- Ziyu Wang, Alexander Novikov, Konrad Zolna, Josh S. Merel, Jost Tobias Springenberg, Scott E. Reed, Bobak Shahriari, Noah Siegel, Caglar Gulcehre, Nicolas Heess, and Nando de Freitas. NeurIPS, 2020.
- Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration
- Yao Liu, Adith Swaminathan, Alekh Agarwal, and Emma Brunskill. NeurIPS, 2020.
- Conservative Q-Learning for Offline Reinforcement Learning [website] [code] [blog]
- Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. NeurIPS, 2020.
- BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
- Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, and Keith Ross. NeurIPS, 2020.
- MOPO: Model-based Offline Policy Optimization [code]
- Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y. Zou, Sergey Levine, Chelsea Finn, and Tengyu Ma. NeurIPS, 2020.
- MOReL: Model-Based Offline Reinforcement Learning [podcast]
- Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims. NeurIPS, 2020.
- Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
- Aaron Sonabend, Junwei Lu, Leo Anthony Celi, Tianxi Cai, and Peter Szolovits. NeurIPS, 2020.
- Multi-task Batch Reinforcement Learning with Metric Learning
- Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Henrik Christensen, and Hao Su. NeurIPS, 2020.
- Counterfactual Data Augmentation using Locally Factored Dynamics [code]
- Silviu Pitis, Elliot Creager, and Animesh Garg. NeurIPS, 2020.
- On Reward-Free Reinforcement Learning with Linear Function Approximation
- Ruosong Wang, Simon S. Du, Lin Yang, and Russ R. Salakhutdinov. NeurIPS, 2020.
- Constrained Policy Improvement for Safe and Efficient Reinforcement Learning
- Elad Sarafian, Aviv Tamar, and Sarit Kraus. IJCAI, 2020.
- BRPO: Batch Residual Policy Optimization [code]
- Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, and Craig Boutilier. IJCAI, 2020.
- From Importance Sampling to Doubly Robust Policy Gradient
- Jiawei Huang and Nan Jiang. ICML, 2020.
- Batch Stationary Distribution Estimation
- Junfeng Wen, Bo Dai, Lihong Li, and Dale Schuurmans. ICML, 2020.
- GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
- Shangtong Zhang, Bo Liu, and Shimon Whiteson. ICML, 2020.
- GenDICE: Generalized Offline Estimation of Stationary Values
- Ruiyi Zhang, Bo Dai, Lihong Li, and Dale Schuurmans. ICLR, 2020.
- Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
- Noah Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, and Martin Riedmiller. ICLR, 2020.
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [website] [blog] [code]
- Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, and Sergey Levine. CoRL, 2020.
- Accelerating Reinforcement Learning with Learned Skill Priors
- Karl Pertsch, Youngwoon Lee, and Joseph J. Lim. CoRL, 2020.
- PLAS: Latent Action Space for Offline Reinforcement Learning
- Wenxuan Zhou, Sujay Bajracharya, and David Held. CoRL, 2020.
- Scaling data-driven robotics with reward sketching and batch reinforcement learning [website]
- Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott Reed, Rae Jeong, Konrad Zolna, Yusuf Aytar, David Budden, Mel Vecerik, Oleg Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, and Ziyu Wang. RSS, 2020.
- Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
- Cristian Bodnar, Adrian Li, Karol Hausman, Peter Pastor, and Mrinal Kalakrishnan. RSS, 2020.
- Defining Admissible Rewards for High Confidence Policy Evaluation in Batch Reinforcement Learning
- Niranjani Prasad, Barbara E. Engelhardt, and Finale Doshi-Velez. CHIL, 2020.
- Learning When-to-Treat Policies
- Xinkun Nie, Emma Brunskill, and Stefan Wager. JASA, 2020.
- Batch-Constrained Reinforcement Learning for Dynamic Distribution Network Reconfiguration
- Yuanqi Gao, Wei Wang, Jie Shi, and Nanpeng Yu. IEEE T SMART GRID, 2020.
- Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
- Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, and Rosalind Picard. arXiv, 2019.
- Behavior Regularized Offline Reinforcement Learning
- Yifan Wu, George Tucker, and Ofir Nachum. arXiv, 2019.
- Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift
- Riashat Islam, Komal K. Teru, Deepak Sharma, and Joelle Pineau. arXiv, 2019.
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- Xue Bin Peng, Aviral Kumar, Grace Zhang, and Sergey Levine. arXiv, 2019.
- AlgaeDICE: Policy Gradient from Arbitrary Experience
- Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, and Dale Schuurmans. arXiv, 2019.
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction [website] [blog] [code]
- Aviral Kumar, Justin Fu, George Tucker, and Sergey Levine. NeurIPS, 2019.
- Off-Policy Deep Reinforcement Learning without Exploration
- Scott Fujimoto, David Meger, and Doina Precup. ICML, 2019.
- Safe Policy Improvement with Baseline Bootstrapping
- Romain Laroche, Paul Trichelair, and Remi Tachet Des Combes. ICML, 2019.
- Information-Theoretic Considerations in Batch Reinforcement Learning
- Jinglin Chen and Nan Jiang. ICML, 2019.
- Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents
- Nusrah Hussain, Engin Erzin, T. Metin Sezgin, and Yucel Yemez. ACII, 2019.
- Safe Policy Improvement with Soft Baseline Bootstrapping
- Kimia Nadjahi, Romain Laroche, and Rémi Tachet des Combes. ECML, 2019.
- Importance Weighted Transfer of Samples in Reinforcement Learning
- Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, and Marcello Restelli. ICML, 2018.
- Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation [website]
- Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, and Sergey Levine. CoRL, 2018.
- Off-Policy Policy Gradient with State Distribution Correction
- Yao Liu, Adith Swaminathan, Alekh Agarwal, and Emma Brunskill. UAI, 2019.
- Behavioral Cloning from Observation
- Faraz Torabi, Garrett Warnell, and Peter Stone. IJCAI, 2018.
- Deep Exploration via Bootstrapped DQN
- Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. NeurIPS, 2016.
- Safe Policy Improvement by Minimizing Robust Baseline Regret
- Mohammad Ghavamzadeh, Marek Petrik, and Yinlam Chow. NeurIPS, 2016.
- Residential Demand Response Applications Using Batch Reinforcement Learning
- Frederik Ruelens, Bert Claessens, Stijn Vandael, Bart De Schutter, Robert Babuska, and Ronnie Belmans. arXiv, 2015.
- Structural Return Maximization for Reinforcement Learning
- Joshua Joseph, Javier Velez, and Nicholas Roy. arXiv, 2014.
- Simultaneous Perturbation Algorithms for Batch Off-Policy Search
- Raphael Fonteneau and L.A. Prashanth. CDC, 2014.
- Guided Policy Search
- Sergey Levine and Vladlen Koltun. ICML, 2013.
- Off-Policy Actor-Critic
- Thomas Degris, Martha White, and Richard S. Sutton. ICML, 2012.
- PAC-Bayesian Policy Evaluation for Reinforcement Learning
- Mahdi Milani Fard, Joelle Pineau, and Csaba Szepesvari. UAI, 2011.
- Tree-Based Batch Mode Reinforcement Learning
- Damien Ernst, Pierre Geurts, and Louis Wehenkel. JMLR, 2005.
- Neural Fitted Q Iteration–First Experiences with a Data Efficient Neural Reinforcement Learning Method
- Martin Riedmiller. ECML, 2005.
- Off-Policy Temporal-Difference Learning with Function Approximation
- Doina Precup, Richard S. Sutton, and Sanjoy Dasgupta. ICML, 2001.
- An Offline Deep Reinforcement Learning for Maintenance Decision-Making
- Hamed Khorasgani, Haiyan Wang, Chetan Gupta, and Ahmed Farahat. arXiv, 2021.
- Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
- Sarah Rathnam, Susan A. Murphy, and Finale Doshi-Velez. arXiv, 2021.
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
- Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, and Chelsea Finn. arXiv, 2021.
- Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs
- Doseok Jang, Lucas Spangher, Manan Khattar, Utkarsha Agwan, Selvaprabu Nadarajah, and Costas Spanos. arXiv, 2021.
- Offline reinforcement learning with uncertainty for treatment strategies in sepsis
- Ran Liu, Joseph L. Greenstein, James C. Fackler, Jules Bergmann, Melania M. Bembea, and Raimond L. Winslow. arXiv, 2021.
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL
- Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, and Adith Swaminathan. arXiv, 2021.
- Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles
- Zhaoxuan Zhu, Nicola Pivaro, Shobhit Gupta, Abhishek Gupta, and Marcello Canova. arXiv, 2021.
- Interpretable performance analysis towards offline reinforcement learning: A dataset perspective
- Chenyang Xi, Bo Tang, Jiajun Shen, Xinfu Liu, Feiyu Xiong, and Xueying Li. arXiv, 2021.
- pH-RL: A personalization architecture to bring reinforcement learning to health practice
- Ali el Hassouni, Mark Hoogendoorn, Marketa Ciharova, Annet Kleiboer, Khadicha Amarti, Vesa Muhonen, Heleen Riper, and A. E. Eiben. arXiv, 2021.
- DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning [podcast]
- Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, and Yu Zheng. arXiv, 2021.
- Personalization for Web-based Services using Offline Reinforcement Learning
- Pavlos Athanasios Apostolopoulos, Zehui Wang, Hanson Wang, Chad Zhou, Kittipat Virochsiri, Norm Zhou, and Igor L. Markov. arXiv, 2021.
- NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning [website] [code]
- Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, and Yang Yu. arXiv, 2021.
- A General Offline Reinforcement Learning Framework for Interactive Recommendation
- Teng Xiao and Donglin Wang. AAAI, 2021.
- Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms
- Xiaocheng Tang, Fan Zhang, Zhiwei (Tony) Qin, Yansheng Wang, Dingyuan Shi, Bingchen Song, Yongxin Tong, Hongtu Zhu, and Jieping Ye. KDD, 2021.
- Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning
- Leandro M. de Lima and Renato A. Krohling. IJCNN, 2021.
- Learning robust driving policies without online exploration
- Daniel Graves, Nhat M. Nguyen, Kimia Hassanzadeh, Jun Jin, and Jun Luo. ICRA, 2021.
- Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation
- Yanan Wang, Yong Ge, Li Li, Rui Chen, and Tong Xu. arXiv, 2020.
- Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation
- Diksha Garg, Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, and Gautam Shroff. arXiv, 2020.
- An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare
- Taylor W. Killian, Haoran Zhang, Jayakumar Subramanian, Mehdi Fatemi, and Marzyeh Ghassemi. arXiv, 2020.
- Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP
- Julia Kreutzer, Stefan Riezler, and Carolin Lawrence. arXiv, 2020.
- Remote Electrical Tilt Optimization via Safe Reinforcement Learning
- Filippo Vannella, Grigorios Iakovidis, Ezeddin Al Hakim, Erik Aumayr, and Saman Feghhi. arXiv, 2020.
- Offline Reinforcement Learning Hands-On
- Louis Monier, Jakub Kmec, Alexandre Laterre, Thomas Pierrot, Valentin Courgeau, Olivier Sigaud, and Karim Beguir. arXiv, 2020.
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [website] [blog] [code]
- Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. arXiv, 2020.
- RL Unplugged: Benchmarks for Offline Reinforcement Learning [code] [dataset]
- Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, and Nando de Freitas. NeurIPS, 2020.
- An Optimistic Perspective on Offline Reinforcement Learning [website] [blog]
- Rishabh Agarwal, Dale Schuurmans, and Mohammad Norouzi. ICML, 2020.
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
- Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, and Adish Singla. ICML, 2020.
- Off-policy Learning in Two-stage Recommender Systems
- Jiaqi Ma, Zhe Zhao, Xinyang Yi, Ji Yang, Minmin Chen, Jiaxi Tang, Lichan Hong, and Ed H Chi. WWW, 2020.
- Human-centric Dialog Training via Offline Reinforcement Learning
- Natasha Jaques, Judy Hanwen Shen, Asma Ghandeharioun, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, and Rosalind Picard. EMNLP, 2020.
- Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning
- Nasrin Sadeghianpourhamami, Johannes Deleu, and Chris Develder. IEEE T SMART GRID, 2020.
- Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning
- Hanchen Xu, Alejandro D. Domínguez-García, and Peter W. Sauer. IEEE T POWER SYSTEMS, 2020.
- Benchmarking Batch Deep Reinforcement Learning Algorithms
- Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, and Joelle Pineau. arXiv, 2019.
- Top-K Off-Policy Correction for a REINFORCE Recommender System
- Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed Chi. WSDM, 2019.
- A Clustering-Based Reinforcement Learning Approach for Tailored Personalization of E-Health Interventions
- Ali el Hassouni, Mark Hoogendoorn, Martijn van Otterlo, A. E. Eiben, Vesa Muhonen, and Eduardo Barbaro. arXiv, 2018.
- Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming
- Daniel Hein, Steffen Udluft, and Thomas A. Runkler. GECCO, 2018.
- End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient
- Li Zhou, Kevin Small, Oleg Rokhlenko, and Charles Elkan. arXiv, 2017.
- Batch Reinforcement Learning on the Industrial Benchmark: First Experiences
- Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, and Volkmar Sterzing. IJCNN, 2017.
- Policy Networks with Two-Stage Training for Dialogue Systems
- Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, and Kaheer Suleman. SIGDial, 2016.
- Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
- Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau. IAAI, 2008.
- Optimal Off-Policy Evaluation from Multiple Logging Policies [code]
- Nathan Kallus, Yuta Saito, and Masatoshi Uehara. ICML, 2021.
- Non-Stationary Off-Policy Optimization
- Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, and Amr Ahmed. AISTATS, 2021.
- Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting [video]
- Ilja Kuzborskij, Claire Vernade, András György, and Csaba Szepesvári. AISTATS, 2021.
- High-Confidence Off-Policy (or Counterfactual) Variance Estimation
- Yash Chandak, Shiv Shankar, and Philip S. Thomas. AAAI, 2021.
- Learning from eXtreme Bandit Feedback
- Romain Lopez, Inderjit Dhillon, and Michael I. Jordan. AAAI, 2021.
- Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values
- Mahed Abroshan, Kai Hou Yip, Cem Tekin, and Mihaela van der Schaar. arXiv, 2021.
- Control Variates for Slate Off-Policy Evaluation
- Nikos Vlassis, Ashok Chandrashekar, Fernando Amat Gil, and Nathan Kallus. arXiv, 2021.
- Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits
- Ruohan Zhan, Vitor Hadad, David A. Hirshberg, and Susan Athey. arXiv, 2021.
- Off-Policy Risk Assessment in Contextual Bandits
- Audrey Huang, Liu Leqi, Zachary C. Lipton, and Kamyar Azizzadenesheli. arXiv, 2021.
- Off-Policy Evaluation of Slate Policies under Bayes Risk
- Nikos Vlassis, Fernando Amat Gil, and Ashok Chandrashekar. arXiv, 2021.
- Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks
- Minshuo Chen, Hao Liu, Wenjing Liao, and Tuo Zhao. arXiv, 2020.
- Bandit Overfitting in Offline Policy Learning
- David Brandfonbrener, William F. Whitney, Rajesh Ranganath, and Joan Bruna. arXiv, 2020.
- Counterfactual Learning of Continuous Stochastic Policies
- Houssam Zenati, Alberto Bietti, Matthieu Martin, Eustache Diemert, and Julien Mairal. arXiv, 2020.
- A Practical Guide of Off-Policy Evaluation for Bandit Problems
- Masahiro Kato, Kenshi Abe, Kaito Ariu, and Shota Yasui. arXiv, 2020.
- Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
- Masatoshi Uehara, Masahiro Kato, and Shota Yasui. NeurIPS, 2020.
- Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions
- James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, and Ben Carterette. KDD, 2020.
- Off-policy Bandits with Deficient Support
- Noveen Sachdeva, Yi Su, and Thorsten Joachims. KDD, 2020.
- Doubly robust off-policy evaluation with shrinkage
- Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. ICML, 2020.
- Adaptive Estimator Selection for Off-Policy Evaluation [video]
- Yi Su, Pavithra Srinath, and Akshay Krishnamurthy. ICML, 2020.
- Off-policy Bandit and Reinforcement Learning
- Yusuke Narita, Shota Yasui, and Kohei Yata. arXiv, 2020.
- Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits
- Nian Si, Fan Zhang, Zhengyuan Zhou, and Jose Blanchet. ICML, 2020.
- Efficient Policy Learning from Surrogate-Loss Classification Reductions [code]
- Andrew Bennett and Nathan Kallus. ICML, 2020.
- More Efficient Policy Learning via Optimal Retargeting
- Nathan Kallus. JASA, 2020.
- Semi-Parametric Efficient Policy Learning with Continuous Actions
- Victor Chernozhukov, Mert Demirer, Greg Lewis, and Vasilis Syrgkanis. NeurIPS, 2019.
- Balanced Off-Policy Evaluation in General Action Spaces
- Arjun Sondhi, David Arbour, and Drew Dimmery. AISTATS, 2019.
- Policy Evaluation with Latent Confounders via Optimal Balance
- Andrew Bennett and Nathan Kallus. NeurIPS, 2019.
- Focused Context Balancing for Robust Offline Policy Evaluation
- Hao Zou, Kun Kuang, Boqi Chen, Peixuan Chen, and Peng Cui. KDD, 2019.
- On the Design of Estimators for Bandit Off-Policy Evaluation
- Nikos Vlassis, Aurelien Bibaut, Maria Dimakopoulou, and Tony Jebara. ICML, 2019.
- CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
- Yi Su, Lequn Wang, Michele Santacatterina, and Thorsten Joachims. ICML, 2019.
- Efficient Counterfactual Learning from Bandit Feedback
- Yusuke Narita, Shota Yasui, and Kohei Yata. AAAI, 2019.
- Policy Evaluation and Optimization with Continuous Treatments
- Nathan Kallus and Angela Zhou. AISTATS, 2019.
- Offline Evaluation of Ranking Policies with Click Models
- Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, and Zheng Wen. KDD, 2018.
- Effective Evaluation using Logged Bandit Feedback from Multiple Loggers
- Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. KDD, 2018.
- Deep Learning with Logged Bandit Feedback
- Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. ICLR, 2018.
- Off-policy Evaluation for Slate Recommendation
- Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, and Imed Zitouni. NeurIPS, 2017.
- Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
- Yu-Xiang Wang, Alekh Agarwal, and Miroslav Dudik. ICML, 2017.
- Data-Efficient Policy Evaluation Through Behavior Policy Search
- Josiah P. Hanna, Philip S. Thomas, Peter Stone, and Scott Niekum. ICML, 2017.
- The Self-Normalized Estimator for Counterfactual Learning
- Adith Swaminathan and Thorsten Joachims. NeurIPS, 2015.
- Doubly Robust Policy Evaluation and Optimization
- Miroslav Dudík, Dumitru Erhan, John Langford, and Lihong Li. ICML, 2011.
- Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
- Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. WSDM, 2011.
- State Relevance for Off-Policy Evaluation
- Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, and Finale Doshi-Velez. ICML, 2021.
- Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
- Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvari, and Mengdi Wang. ICML, 2021.
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
- Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, and Mohammad Norouzi. ICLR, 2021.
- Minimax Model Learning
- Cameron Voloshin, Nan Jiang, and Yisong Yue. AISTATS, 2021.
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
- Andrew Bennett, Nathan Kallus, Lihong Li, and Ali Mousavi. AISTATS, 2021.
- A Spectral Approach to Off-Policy Evaluation for POMDPs
- Yash Nair and Nan Jiang. arXiv, 2021.
- Projected State-action Balancing Weights for Offline Reinforcement Learning
- Jiayi Wang, Zhengling Qi, and Raymond K.W. Wong. arXiv, 2021.
- Supervised Off-Policy Ranking
- Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, and Tie-Yan Liu. arXiv, 2021.
- Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
- Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, and Michal Valko. arXiv, 2021.
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation
- Yifei Min, Tianhao Wang, Dongruo Zhou, and Quanquan Gu. arXiv, 2021.
- Active Offline Policy Selection
- Ksenia Konyushkova, Yutian Chen, Thomas Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, and Nando de Freitas. arXiv, 2021.
- On Instrumental Variable Regression for Deep Offline Policy Evaluation
- Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, and Arnaud Doucet. arXiv, 2021.
- Characterizing Uniform Convergence in Offline Policy Evaluation via model-based approach: Offline Learning, Task-Agnostic and Reward-Free
- Ming Yin and Yu-Xiang Wang. arXiv, 2021.
- Average-Reward Off-Policy Policy Evaluation with Function Approximation
- Shangtong Zhang, Yi Wan, Richard S. Sutton, and Shimon Whiteson. arXiv, 2021.
- Universal Off-Policy Evaluation
- Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, and Philip S. Thomas. arXiv, 2021.
- Sequential causal inference in a single world of connected units
- Aurelien Bibaut, Maya Petersen, Nikos Vlassis, Maria Dimakopoulou, and Mark van der Laan. arXiv, 2021.
- Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
- Ming Yin, Yu Bai, and Yu-Xiang Wang. arXiv, 2020.
- Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
- Jinlin Lai, Lixin Zou, and Jiaxing Song. arXiv, 2020.
- Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning
- Rahul Singh, Liyuan Xu, and Arthur Gretton. arXiv, 2020.
- Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
- Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, and Emma Brunskill. NeurIPS, 2020.
- CoinDICE: Off-Policy Confidence Interval Estimation
- Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvari, and Dale Schuurmans. NeurIPS, 2020.
- Off-Policy Interval Estimation with Lipschitz Value Iteration
- Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, and Qiang Liu. NeurIPS, 2020.
- Off-Policy Evaluation via the Regularized Lagrangian
- Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, and Dale Schuurmans. NeurIPS, 2020.
- Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
- Nan Jiang and Jiawei Huang. NeurIPS, 2020.
- Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation
- Ilya Kostrikov and Ofir Nachum. arXiv, 2020.
- Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control [video]
- Bingqing Chen, Ming Jin, Zhe Wang, Tianzhen Hong, and Mario Bergés. RLEM, 2020.
- Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies
- Xinyun Chen, Lu Wang, Yizhe Hang, Heng Ge, and Hongyuan Zha. ICLR, 2020.
- Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
- Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, and Qiang Liu. ICLR, 2020.
- Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
- Ali Mousavi, Lihong Li, Qiang Liu, and Denny Zhou. ICLR, 2020.
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
- Yaqi Duan, Zeyu Jia, and Mengdi Wang. ICML, 2020.
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
- Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Celi, Emma Brunskill, and Finale Doshi-Velez. ICML, 2020.
- Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
- Nathan Kallus and Masatoshi Uehara. ICML, 2020.
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
- Yao Liu, Pierre-Luc Bacon, and Emma Brunskill. ICML, 2020.
- Minimax Weight and Q-Function Learning for Off-Policy Evaluation
- Masatoshi Uehara, Jiawei Huang, and Nan Jiang. ICML, 2020.
- Accountable Off-Policy Evaluation With Kernel Bellman Statistics
- Yihao Feng, Tongzheng Ren, Ziyang Tang, and Qiang Liu. ICML, 2020.
- Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
- Ming Yin and Yu-Xiang Wang. ICML, 2020.
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
- Nathan Kallus and Masatoshi Uehara. arXiv, 2019.
- Off-Policy Evaluation in Partially Observable Environments
- Guy Tennenholtz, Uri Shalit, and Shie Mannor. AAAI, 2019.
- Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
- Nathan Kallus and Masatoshi Uehara. NeurIPS, 2019.
- Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
- Tengyang Xie, Yifei Ma, and Yu-Xiang Wang. NeurIPS, 2019.
- Off-Policy Evaluation via Off-Policy Classification
- Alexander Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, and Sergey Levine. NeurIPS, 2019.
- Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
- Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuan Zhou, and Jian Peng. ICLR, 2019.
- Batch Policy Learning under Constraints [code] [website]
- Hoang M. Le, Cameron Voloshin, and Yisong Yue. ICML, 2019.
- More Efficient Off-Policy Evaluation through Regularized Targeted Learning
- Aurelien Bibaut, Ivana Malenica, Nikos Vlassis, and Mark van der Laan. ICML, 2019.
- Combining parametric and nonparametric models for off-policy evaluation
- Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, and Finale Doshi-Velez. ICML, 2019.
- Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
- Michael Oberst and David Sontag. ICML, 2019.
- Importance Sampling Policy Evaluation with an Estimated Behavior Policy
- Josiah Hanna, Scott Niekum, and Peter Stone. ICML, 2019.
- When People Change their Mind: Off-Policy Evaluation in Non-Stationary Recommendation Environments
- Rolf Jagerman, Ilya Markov, and Maarten de Rijke. WSDM, 2019.
- Representation Balancing MDPs for Off-policy Policy Evaluation
- Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A. Faisal, Finale Doshi-Velez, and Emma Brunskill. NeurIPS, 2018.
- Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
- Qiang Liu, Lihong Li, Ziyang Tang, and Dengyong Zhou. NeurIPS, 2018.
- Confounding-Robust Policy Improvement
- Nathan Kallus and Angela Zhou. NeurIPS, 2018.
- Balanced Policy Evaluation and Learning
- Nathan Kallus. NeurIPS, 2018.
- More Robust Doubly Robust Off-policy Evaluation
- Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh. ICML, 2018.
- Importance Sampling for Fair Policy Selection
- Shayan Doroudi, Philip Thomas, and Emma Brunskill. UAI, 2017.
- Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing
- Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, Ishan Durugkar, and Emma Brunskill. AAAI, 2017.
- Consistent On-Line Off-Policy Evaluation
- Assaf Hallak and Shie Mannor. ICML, 2017.
- Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
- Josiah P. Hanna, Peter Stone, and Scott Niekum. AAMAS, 2016.
- Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
- Nan Jiang and Lihong Li. ICML, 2016.
- Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
- Philip Thomas and Emma Brunskill. ICML, 2016.
- High Confidence Off-Policy Evaluation
- Philip S. Thomas, Georgios Theocharous, and Mohammad Ghavamzadeh. AAAI, 2015.
- Eligibility Traces for Off-Policy Policy Evaluation
- Doina Precup, Richard S. Sutton, and Satinder P. Singh. ICML, 2000.
- Evaluating the Robustness of Off-Policy Evaluation [package]
- Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, and Kei Tateno. RecSys, 2021.
- Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service
- Yuta Saito, Takuma Udagawa, and Kei Tateno. arXiv, 2021.
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
- Haoming Jiang, Bo Dai, Mengjiao Yang, Wei Wei, and Tuo Zhao. arXiv, 2021.
- Benchmarks for Deep Off-Policy Evaluation [code]
- Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, and Thomas Paine. ICLR, 2021.
- Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings
- Shengpu Tang and Jenna Wiens. MLHC, 2021.
- Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation [software] [public dataset]
- Yuta Saito, Shunsuke Aihara, Megumi Matsutani, and Yusuke Narita. arXiv, 2020.
- Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning [code]
- Cameron Voloshin, Hoang M. Le, Nan Jiang, and Yisong Yue. arXiv, 2019.
- Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling
- Randell Cotta, Dan Jiang, Mingyang Hu, and Peizhou Liao. WSDM, 2019.
- Offline Evaluation to Make Decisions About Playlist Recommendation
- Alois Gruson, Praveen Chandar, Christophe Charbuillet, James McInerney, Samantha Hansen, Damien Tardieu, and Ben Carterette. WSDM, 2019.
- Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters
- Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, and Emma Brunskill. arXiv, 2018.
- Evaluating Reinforcement Learning Algorithms in Observational Health Settings
- Omer Gottesman, Fredrik Johansson, Joshua Meier, Jack Dent, Donghun Lee, Srivatsan Srinivasan, Linying Zhang, Yi Ding, David Wihl, Xuefeng Peng, Jiayu Yao, Isaac Lage, Christopher Mosch, Li-wei H. Lehman, Matthieu Komorowski, Aldo Faisal, Leo Anthony Celi, David Sontag, and Finale Doshi-Velez. arXiv, 2018.
- Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems
- Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, and Fernando Diaz. CIKM, 2018.
- Offline A/B testing for Recommender Systems
- Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, and Simon Dollé. WSDM, 2018.
- Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback
- Ben Carterette and Praveen Chandar. SIGIR, 2018.
- Open Bandit Pipeline: a research framework for bandit algorithms and off-policy evaluation [paper] [documentation] [public dataset] (see the OPE quick-start sketch after this list)
- Yuta Saito, Shunsuke Aihara, Megumi Matsutani, and Yusuke Narita.
- pyIEOE: Towards An Interpretable Evaluation for Offline Evaluation [paper]
- Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, and Kei Tateno.
- d3rlpy: A data-driven deep reinforcement learning library as an out-of-the-box tool [website] [documentation] (see the training sketch after this list)
- Takuma Seno.
- MINERVA: An out-of-the-box GUI tool for data-driven deep reinforcement learning [website] [documentation]
- Takuma Seno.
- Benchmarks for Deep Off-Policy Evaluation [paper]
- Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, and Thomas Paine.
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [paper] [website] (see the data-loading sketch after this list)
- Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine.
- RL Unplugged: Benchmarks for Offline Reinforcement Learning [paper] [dataset]
- Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, and Nando de Freitas.
- NeoRL: Near Real-World Benchmarks for Offline Reinforcement Learning [paper] [website]
- Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, and Yang Yu.
- RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising [paper]
- David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, and Alexandros Karatzoglou.
- MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces [paper] [documentation]
- Marlesson R. O. Santana, Luckeciano C. Melo, Fernando H. F. Camargo, Bruno Brandão, Anderson Soares, Renan M. Oliveira, and Sandor Caetano.
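Minimal quick-start sketches for three of the tools above follow. They assume the APIs shown in each project's documentation at the time of writing, so treat them as illustrations rather than definitive usage and check the linked docs for the current interface. First, off-policy evaluation with Open Bandit Pipeline; when no data path is given, the small sample of the Open Bandit Dataset bundled with the package is used:

```python
# Sketch: estimate a counterfactual policy's value from logged bandit data
# with Open Bandit Pipeline (follows the package's documented quickstart).
from obp.dataset import OpenBanditDataset
from obp.policy import BernoulliTS
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting as IPW

# Logged feedback collected by the uniform random behavior policy.
dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# Evaluation policy: Bernoulli Thompson sampling, simulated offline.
evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    is_zozotown_prior=True,
    campaign="all",
    random_state=12345,
)
action_dist = evaluation_policy.compute_batch_action_dist(
    n_sim=100000,
    n_rounds=bandit_feedback["n_rounds"],
)

# OPE with inverse probability weighting.
ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[IPW()])
print(ope.estimate_policy_values(action_dist=action_dist))  # e.g., {"ipw": ...}
```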
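Next, offline training with d3rlpy. This is a sketch assuming the v1-era API from its documentation (`get_cartpole`, `DiscreteCQL`, and `evaluate_on_environment`); later versions reorganize the interface:

```python
# Sketch: train discrete-action Conservative Q-Learning (CQL) purely from a
# logged cart-pole dataset, then evaluate online (possible only because a
# simulator exists for this toy task).
from d3rlpy.datasets import get_cartpole
from d3rlpy.algos import DiscreteCQL
from d3rlpy.metrics.scorer import evaluate_on_environment

dataset, env = get_cartpole()  # logged dataset + matching environment

cql = DiscreteCQL()
cql.fit(dataset, n_epochs=1)  # offline training, no environment interaction

print(evaluate_on_environment(env)(cql))  # average online return
```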
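Finally, loading a D4RL dataset, assuming the `gym` + `d4rl` interface from the D4RL README. Task names such as `halfcheetah-medium-v0` encode both the environment and the quality of the logged behavior data:

```python
# Sketch: importing d4rl registers the offline-RL tasks with gym.
import gym
import d4rl  # noqa: F401

env = gym.make("halfcheetah-medium-v0")

# Raw logged dataset: observations, actions, rewards, terminals, ...
dataset = env.get_dataset()
print(dataset["observations"].shape)

# Convenience view as (s, a, r, s') arrays for Q-learning-style algorithms.
qlearning_dataset = d4rl.qlearning_dataset(env)
```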
- Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications
- Aviral Kumar and Avi Singh. BAIR Blog, 2020.
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
- Ashvin Nair and Abhishek Gupta. BAIR Blog, 2020.
- D4RL: Building Better Benchmarks for Offline Reinforcement Learning
- Justin Fu. BAIR Blog, 2020.
- Does On-Policy Data Collection Fix Errors in Off-Policy Reinforcement Learning?
- Aviral Kumar and Abhishek Gupta. BAIR Blog, 2020.
- Tackling Open Challenges in Offline Reinforcement Learning
- George Tucker and Sergey Levine. Google AI Blog, 2020.
- An Optimistic Perspective on Offline Reinforcement Learning
- Rishabh Agarwal and Mohammad Norouzi. Google AI Blog, 2020.
- Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning
- Sergey Levine. Medium, 2020.
- Introducing completely free datasets for data-driven deep reinforcement learning
- Takuma Seno. towards data science, 2020.
- Offline (Batch) Reinforcement Learning: A Review of Literature and Applications
- Daniel Seita. danieltakeshi.github.io, 2020.
- Data-Driven Deep Reinforcement Learning
- Aviral Kumar. BAIR Blog, 2019.
- Sergey Levine on Robot Learning & Offline RL
- Sergey Levine. The Gradient, 2021.
- Off-Line, Off-Policy RL for Real-World Decision Making at Facebook
- Jason Gauci. TWIML, 2021.
- Xianyuan Zhan | TalkRL: The Reinforcement Learning Podcast
- Xianyuan Zhan. TalkRL, 2021.
- MOReL: Model-Based Offline Reinforcement Learning with Aravind Rajeswaran
- Aravind Rajeswaran. TWIML, 2020.
- Trends in Reinforcement Learning with Chelsea Finn
- Chelsea Finn. TWIML, 2020.
- Nan Jiang | TalkRL: The Reinforcement Learning Podcast
- Nan Jiang. TalkRL, 2020.
- Scott Fujimoto | TalkRL: The Reinforcement Learning Podcast
- Scott Fujimoto. TalkRL, 2019.
- Offline Reinforcement Learning (NeurIPS 2021)
- Reinforcement Learning for Real Life (ICML 2021)
- Reinforcement Learning Day 2021
- Offline Reinforcement Learning (NeurIPS 2020)
- Reinforcement Learning from Batch Data and Simulation
- Reinforcement Learning for Real Life (RL4RealLife 2020)
- Safety and Robustness in Decision Making (NeurIPS 2019)
- Reinforcement Learning for Real Life (ICML 2019)
- Real-world Sequential Decision Making (ICML 2019)
- Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances
- Yuta Saito and Thorsten Joachims. RecSys, 2021.
- Offline Reinforcement Learning
- Guy Tennenholtz. CHIL, 2021.
- Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
- Paria Rashidinejad. RL Theory Seminar, 2021.
- Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
- Lin Chen. RL Theory Seminar, 2021.
- Is Pessimism Provably Efficient for Offline RL?
- Ying Jin. RL Theory Seminar, 2021.
- Adaptive Estimator Selection for Off-Policy Evaluation
- Yi Su. RL Theory Seminar, 2021.
- What are the Statistical Limits of Offline RL with Linear Function Approximation?
- Ruosong Wang. RL Theory Seminar, 2021.
- Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
- Andrea Zanette. RL Theory Seminar, 2021.
- A Gentle Introduction to Offline Reinforcement Learning
- Sergey Levine. 2021.
- Principles for Tackling Distribution Shift: Pessimism, Adaptation, and Anticipation
- Chelsea Finn. 2020-2021 Machine Learning Advances and Applications Seminar.
- Offline Reinforcement Learning: Incorporating Knowledge from Data into RL
- Sergey Levine. IJCAI-PRICAI 2020 Knowledge Based Reinforcement Learning Workshop.
- Offline RL
- Nando de Freitas. NeurIPS 2020 Offline RL Workshop.
- Learning a Multi-Agent Simulator from Offline Demonstrations
- Brandyn White. NeurIPS 2020 Offline RL Workshop.
- Towards Reliable Validation and Evaluation for Offline RL
- Nan Jiang. NeurIPS 2020 Offline RL Workshop.
- Batch RL Models Built for Validation
- Finale Doshi-Velez. NeurIPS 2020 Offline RL Workshop.
- Offline Reinforcement Learning: From Algorithms to Practical Challenges
- Aviral Kumar and Sergey Levine. NeurIPS, 2020.
- Data Scalability for Robot Learning
- Chelsea Finn. RI Seminar, 2020.
- Statistically Efficient Offline Reinforcement Learning
- Nathan Kallus. ARL Seminar, 2020.
- Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning
- Yu-Xiang Wang. RL Theory Seminar, 2020.
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
- Mengdi Wang. RL Theory Seminar, 2020.
- Beyond the Training Distribution: Embodiment, Adaptation, and Symmetry
- Chelsea Finn. EI Seminar, 2020.
- Combining Statistical methods with Human Input for Evaluation and Optimization in Batch Settings
- Finale Doshi-Velez. NeurIPS 2019 Workshop on Safety and Robustness in Decision Making.
- Efficiently Breaking the Curse of Horizon with Double Reinforcement Learning
- Nathan Kallus. NeurIPS 2019 Workshop on Safety and Robustness in Decision Making.
- Scaling Probabilistically Safe Learning to Robotics
- Scott Niekum. NeurIPS 2019 Workshop on Safety and Robustness in Decision Making.
- Deep Reinforcement Learning in the Real World
- Sergey Levine. Workshop on New Directions in Reinforcement Learning and Control, 2019.