Continual learning robots form a significantly more stable invariant subnetwork than constant-task controls, and preserving it improves adaptation while damaging it hurts performance.
Learning Shared Representations in Multi-task Reinforcement Learning
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
We investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks, within this space. Since the environment does not change, there is potentially a lot of common ground amongst tasks and learning to solve them individually seems extremely wasteful. In this paper, we explicitly model and learn this shared structure as it arises in the state-action value space. We will show how one can jointly learn optimal value-functions by modifying the popular Value-Iteration and Policy-Iteration procedures to accommodate this shared representation assumption and leverage the power of multi-task supervised learning. Finally, we demonstrate that the proposed model and training procedures, are able to infer good value functions, even under low samples regimes. In addition to data efficiency, we will show in our analysis, that learning abstractions of the state space jointly across tasks leads to more robust, transferable representations with the potential for better generalization. this shared representation assumption and leverage the power of multi-task supervised learning. Finally, we demonstrate that the proposed model and training procedures, are able to infer good value functions, even under low samples regimes. In addition to data efficiency, we will show in our analysis, that learning abstractions of the state space jointly across tasks leads to more robust, transferable representations with the potential for better generalization.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Multitask offline fitted Q-iteration achieves 1/sqrt(nT) generalization rates under shared low-rank structure and reduces complexity for new tasks by reusing the upstream representation.
CMTRL recovers a shared low-rank feature matrix for T constrained linear bandit tasks in d dimensions using Safe-AltGDmin and provides regret and sample complexity bounds.
citing papers explorer
-
Evidence of an Emergent "Self" in Continual Robot Learning
Continual learning robots form a significantly more stable invariant subnetwork than constant-task controls, and preserving it improves adaptation while damaging it hurts performance.
-
Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning
Multitask offline fitted Q-iteration achieves 1/sqrt(nT) generalization rates under shared low-rank structure and reduces complexity for new tasks by reusing the upstream representation.
-
Multi-Task Representation Learning for Conservative Linear Bandits
CMTRL recovers a shared low-rank feature matrix for T constrained linear bandit tasks in d dimensions using Safe-AltGDmin and provides regret and sample complexity bounds.