Human-level control through deep reinforcement learning |
Nature 2015 |
1. RL challenges: deriving efficient representations of the environment from high-dimensional sensory inputs, and using these to generalize past experience to new situations. 2. This paper: use deep neural networks to develop a novel artificial agent, termed a deep Q-network (DQN), that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. 3. RL instability causes: correlations present in the sequence of observations; the fact that small updates to Q may significantly change the policy and therefore the data distribution; correlations between Q and the target values. 4. RL instability solutions: experience replay, which randomizes over the data, thereby removing correlations in the observation sequence and smoothing over changes in the data distribution; only periodically adjusting Q towards the target values to reduce correlations with the target |
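A minimal numpy sketch of the two stabilizers above, experience replay plus a periodically synchronized target network, using a toy linear Q-function and fabricated transitions; the environment, feature sizes, and all names are illustrative, not the paper's Atari setup.

```python
import random
from collections import deque

import numpy as np

# Toy illustration: a replay buffer whose uniform sampling breaks correlations in
# the observation sequence, and a target network whose weights are copied from the
# online network only every `sync_every` steps. Linear Q(s, a) = W[a] . s.
rng = np.random.default_rng(0)
n_actions, state_dim = 2, 4
W_online = 0.01 * rng.normal(size=(n_actions, state_dim))   # online Q-network
W_target = W_online.copy()                                   # frozen target network
buffer = deque(maxlen=10_000)
gamma, lr, sync_every = 0.99, 0.01, 500

for step in range(5_000):
    # dummy interaction, only to populate the buffer with transitions
    s = rng.normal(size=state_dim)
    a = int(rng.integers(n_actions))
    r = float(s.sum() > 0)
    s_next, done = rng.normal(size=state_dim), False
    buffer.append((s, a, r, s_next, done))

    if len(buffer) >= 32:
        for s_b, a_b, r_b, sn_b, d_b in random.sample(list(buffer), 32):  # experience replay
            target = r_b + gamma * (0.0 if d_b else np.max(W_target @ sn_b))
            td_error = target - W_online[a_b] @ s_b
            W_online[a_b] += lr * td_error * s_b                          # semi-gradient step

    if step % sync_every == 0:
        W_target = W_online.copy()          # periodically adjust Q toward the target values
```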
Mastering the game of Go with deep neural networks and tree search |
Nature 2016 |
1. Challenges of Go: enormous search space, difficulty of evaluating board positions and moves. 2. Train an SL policy network directly from human expert moves. 3. Train a fast rollout policy that can rapidly sample actions during rollouts. 4. Train an RL policy network that improves the SL policy network by optimizing the final game outcome. 5. Train a value network that predicts the winner of games played by the RL policy network against itself |
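As a concrete anchor for how the value network and the fast rollout policy combine at search time, here is a tiny sketch of the leaf evaluation used during tree search, which mixes the value-network estimate with a playout outcome via a mixing parameter; the function name and the default value of lam are illustrative.

```python
def leaf_evaluation(value_net_estimate, rollout_outcome, lam=0.5):
    """Sketch of leaf evaluation during tree search:
        V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L,
    where v_theta(s_L) is the value network's prediction for the leaf position and
    z_L is the outcome of a playout by the fast rollout policy."""
    return (1.0 - lam) * value_net_estimate + lam * rollout_outcome

# e.g. a leaf the value network likes (+0.6) but whose rollout is lost (-1.0):
print(leaf_evaluation(0.6, -1.0))   # -> -0.2
```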
Asynchronous Methods for Deep Reinforcement Learning |
ICML 2016 |
1. Asynchronous gradient descent for optimization of deep neural network controllers. 2. The asynchronous framework applies to both value-based and policy-based methods, off-policy as well as on-policy, and discrete as well as continuous domains. 3. Overall, the Asynchronous Advantage Actor-Critic (A3C) variant achieves the best performance |
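A small sketch of what each A3C worker computes at the end of a rollout segment: n-step returns bootstrapped from the critic, and the advantages that feed the actor's policy-gradient term. The numbers and variable names below are made up for illustration.

```python
import numpy as np

def nstep_returns(rewards, bootstrap_value, gamma=0.99):
    """Backward accumulation of n-step returns from a rollout segment, with
    bootstrap_value = V(s_T) supplied by the critic at the segment's end."""
    R = bootstrap_value
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return returns[::-1]

# Toy segment: each worker turns these advantages into local gradients of
# -log pi(a_t|s_t) * A_t (plus an entropy bonus) and applies them asynchronously
# to the shared parameters, with no experience replay buffer.
rewards = [0.0, 0.0, 1.0]
values = [0.2, 0.4, 0.7]           # critic estimates V(s_t)
returns = nstep_returns(rewards, bootstrap_value=0.0)
advantages = np.array(returns) - np.array(values)
print(advantages)
```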
Mastering the game of Go without human knowledge |
Nature 2017 |
1. Introduce an algorithm based solely on reinforcement learning, without human data, guidance, or domain knowledge beyond the game rules. 2. Its neural network is trained to predict AlphaGo’s own move selections and the winner of AlphaGo’s self-play games. |
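The self-play training signal can be summarized by the loss the paper optimizes: a value term toward the game winner z, a policy term toward the MCTS visit-count distribution pi, and L2 regularization. The code below is a plain numpy sketch of that expression, with illustrative argument names and constants.

```python
import numpy as np

def self_play_loss(policy_logits, value, mcts_pi, winner_z, weights=(), c=1e-4):
    """Sketch of the combined loss  (z - v)^2 - pi^T log p + c * ||theta||^2,
    where p = softmax(policy_logits) is the network's move distribution, pi are
    the MCTS visit-count probabilities, and z is the self-play game winner."""
    m = float(policy_logits.max())
    log_p = policy_logits - (m + np.log(np.sum(np.exp(policy_logits - m))))  # log softmax
    value_loss = (winner_z - value) ** 2
    policy_loss = -float(mcts_pi @ log_p)
    l2 = c * sum(float(np.sum(w ** 2)) for w in weights)
    return value_loss + policy_loss + l2

print(self_play_loss(np.array([1.0, 0.5, -2.0]), value=0.3,
                     mcts_pi=np.array([0.6, 0.3, 0.1]), winner_z=1.0))
```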
Accelerating Multiagent Reinforcement Learning through Transfer Learning |
AAAI 2017 |
1. RL algorithms suffer from scalability issues, especially in a multiagent system (MAS). 2. Objective: accelerate learning in multiagent sequential decision-making tasks by reusing previous knowledge, both from past solutions and from advising between agents |
An Advising Framework for Multiagent Reinforcement Learning Systems |
AAAI 2017 |
1. Classical RL approaches for sequential decision making in a MAS require a long time to learn. 2. Objective: propose an advising framework in which multiple agents advise each other while learning in a shared environment, starting with no prior knowledge, and where the advisor is not expected to act optimally. |
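One plausible shape of the advisee/advisor interaction, not necessarily the paper's exact criteria: an agent asks for advice when it is unconfident in a state and still has asking budget, and a peer answers only when it is confident there, even though that peer may itself still be learning. The thresholds, function names, and the visit-count confidence proxy below are assumptions of this sketch.

```python
def ask_for_advice(my_visits, ask_budget, threshold=10):
    """Advisee side: ask when this state has been seen too few times (a crude
    confidence proxy) and some asking budget remains. Illustrative rule only."""
    return ask_budget > 0 and my_visits < threshold

def give_advice(advisor_visits, advisor_policy, state, give_budget, threshold=50):
    """Advisor side: answer with the advisor's current greedy action only when it is
    confident enough in this state; it is not required to be optimal."""
    if give_budget > 0 and advisor_visits >= threshold:
        return advisor_policy(state)
    return None
```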
An Efficient Approach to Model-Based Hierarchical Reinforcement Learning |
AAAI 2017 |
1. Propose a model-based approach to hierarchical reinforcement learning that exploits shared knowledge and selective execution at different levels of abstraction. 2. The proposed framework adopts a new transition dynamics learning algorithm that identifies the common action-feature combinations of the subtasks and evaluates subtask execution choices through simulation. |
Improving Deep Reinforcement Learning with Knowledge Transfer |
AAAI 2017 |
1. While DRL has achieved good results in single task learning, the multi-task case is still underrepresented in the available literature. 2. This research proposal aims at extending DRL to the multi-task case by leveraging the power of Transfer Learning (TL) to improve the training time and results. 3. The focus is to define a novel framework for scalable DRL agents that detects similarities between tasks and balances various TL techniques, such as parameter sharing, policy transfer and skill transfer |
Learning Options in Multi-objective Reinforcement Learning |
AAAI 2017 |
1. Propose a method to learn options in multi-objective RL domains in order to accelerate learning and reuse knowledge across tasks. 2. The main idea is to learn options for each objective separately (e.g., using PolicyBlocks) and apply the learned options to the multi-objective problem |
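To make the idea concrete, the sketch below represents options in the standard (initiation set, intra-option policy, termination) form, one per objective, which the multi-objective learner can then add to its action set; how PolicyBlocks actually extracts them from per-objective policies is not shown, and the objective names and states are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """Standard options-framework triple, learned separately for one objective."""
    objective: str
    can_start: Callable    # initiation set  I(s) -> bool
    policy: Callable       # intra-option policy  pi_o(s) -> primitive action
    terminate: Callable    # termination condition  beta(s) -> bool

# One option per objective; the multi-objective agent treats each of these as an
# extra (temporally extended) action alongside the primitive ones.
options = [
    Option("treasure", lambda s: True, lambda s: 0, lambda s: s.get("at_treasure", False)),
    Option("fuel",     lambda s: True, lambda s: 1, lambda s: s.get("fuel_full", False)),
]
```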
Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay |
AAAI 2017 |
1. A new policy distillation architecture for knowledge sharing in deep reinforcement learning across multi-task domains. 2. A hierarchical prioritized experience replay scheme for the replay memory of DQN |
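For the distillation part, a common formulation is a temperature-softened KL divergence between the teacher's Q-values and the student's output; the sketch below implements that generic objective, and whether the paper's multi-task variant uses exactly this temperature scheme is an assumption.

```python
import numpy as np

def softmax(x, temperature=1.0):
    z = (x - x.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_q, student_q, tau=0.01, eps=1e-12):
    """KL( softmax(teacher_q / tau) || softmax(student_q) ): the student network is
    pushed toward the teacher's sharpened action distribution for this state."""
    p_t = softmax(teacher_q, temperature=tau)
    p_s = softmax(student_q)
    return float(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps))))

print(distillation_loss(np.array([1.0, 2.0, 0.5]), np.array([0.8, 1.9, 0.4])))
```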
Transfer Reinforcement Learning with Shared Dynamics |
AAAI 2017 |
1. Focus on a particular transfer RL setting: the dynamics do not change from one task to another, only the reward function does. 2. First idea: transition samples obtained from one task can be reused to learn any other task; an immediate reward estimator is learnt in a supervised fashion. 3. Second idea: adopt optimism in the face of uncertainty to encourage exploration |
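A rough sketch of both ideas under a linear reward model: transitions stored while solving an earlier task (same dynamics) are relabelled with the new task's rewards to fit an immediate reward estimator by least squares, and an optimism bonus inflates the estimate for rarely visited state-action features. The feature map, bonus form, and names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_reward_model(phi, rewards):
    """Supervised fit of r_hat(s, a) = w . phi(s, a) from labelled samples."""
    w, *_ = np.linalg.lstsq(phi, rewards, rcond=None)
    return w

def optimistic_reward(w, phi_sa, visit_count, bonus=1.0):
    """Estimated reward plus an optimism-in-the-face-of-uncertainty bonus that
    shrinks as the state-action pair is visited more often."""
    return float(phi_sa @ w) + bonus / np.sqrt(visit_count + 1.0)

# Transitions collected on an old task are reused: only their rewards need to be
# relabelled for the new task before fitting the estimator.
phi_old = rng.random((50, 8))                        # phi(s, a) of stored transitions
r_new = phi_old @ rng.random(8) + 0.05 * rng.normal(size=50)
w_hat = fit_reward_model(phi_old, r_new)
print(optimistic_reward(w_hat, phi_old[0], visit_count=2))
```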
FeUdal Networks for Hierarchical Reinforcement Learning |
ICML 2017 |
1. Propose a novel architecture for hierarchical reinforcement learning. 2. Employs a Manager module and a Worker module. 3. The Manager operates at a lower temporal resolution and sets abstract goals which are conveyed to and enacted by the Worker. 4. The Worker generates primitive actions at every tick of the environment. 5. The FeUdal Network (FuN) facilitates very long timescale credit assignment and encourages the emergence of sub-policies associated with the different goals set by the Manager |
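The coupling between the two modules can be illustrated through the Worker's intrinsic reward, which rewards movement in the latent state space along the direction of the goals the Manager emitted over the last c steps; the cosine-similarity sketch below uses made-up variable names and toy data.

```python
import numpy as np

def worker_intrinsic_reward(latent_states, goals, t, c=10, eps=1e-8):
    """Average cosine similarity between the latent-state change over the last
    i steps and the goal the Manager emitted i steps ago, for i = 1..c."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    terms = [cos(latent_states[t] - latent_states[t - i], goals[t - i])
             for i in range(1, c + 1) if t - i >= 0]
    return sum(terms) / max(len(terms), 1)

# toy trajectory in a 3-dim latent space, with a constant "move right" goal
states = [np.array([0.0, 0.0, 0.0]), np.array([0.5, 0.1, 0.0]), np.array([1.0, 0.2, 0.1])]
goals = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])]
print(worker_intrinsic_reward(states, goals, t=2, c=2))
```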
Stochastic Neural Networks for Hierarchical Reinforcement Learning |
ICLR 2017 |
A general framework that first learns useful skills in a pre-training environment and then leverages the acquired skills to learn faster in downstream tasks, tackling sparse rewards and long horizons. |
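A structural sketch of the downstream phase: a high-level policy selects one of the pre-trained skills and commits to it for a fixed number of primitive steps, so exploration and credit assignment happen at the skill timescale. `env`, `skills`, and `select_skill` are placeholders, not the paper's API.

```python
def run_episode(env, skills, select_skill, skill_duration=100, max_steps=1000):
    """Downstream-task rollout that acts through pre-trained skills."""
    obs, total_reward, t = env.reset(), 0.0, 0
    while t < max_steps:
        skill = skills[select_skill(obs)]        # high-level choice of a skill
        for _ in range(skill_duration):          # commit to it for a fixed horizon
            obs, reward, done = env.step(skill(obs))
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                return total_reward
    return total_reward
```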