pith. machine review for the scientific record.

arxiv: 1906.02736 · v1 · submitted 2019-06-06 · 💻 cs.LG · stat.ML

Recognition: unknown

DeepMDP: Learning Continuous Latent Space Models for Representation Learning

keywords deepmdp · latent · learning · space · continuous · environment · high-dimensional · model

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. We connect these results to prior work in the bisimulation literature, and explore the use of a variety of metrics. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
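
The two losses named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `deepmdp_losses`, the encoder `phi`, and the two latent models are hypothetical stand-ins, and the sketch assumes deterministic latent transitions, so the distance between next-latent-state distributions is approximated by a per-sample Euclidean distance.

```python
import numpy as np

def deepmdp_losses(phi, reward_model, transition_model, s, a, r, s_next):
    """Batch estimates of a reward-prediction and a latent-transition loss.

    All callables are hypothetical stand-ins (not from the paper's code):
      phi(s)                 -> latent states, shape (B, D)
      reward_model(z, a)     -> predicted rewards, shape (B,)
      transition_model(z, a) -> predicted next latents, shape (B, D)
    """
    z = phi(s)            # embed current observations
    z_next = phi(s_next)  # embed observed next observations

    # Reward loss: mean absolute error between predicted and observed reward.
    reward_loss = np.mean(np.abs(reward_model(z, a) - r))

    # Transition loss (assumption: deterministic latent dynamics), measured as
    # the Euclidean distance between predicted and embedded next latent state.
    diff = transition_model(z, a) - z_next
    transition_loss = np.mean(np.linalg.norm(diff, axis=1))
    return reward_loss, transition_loss


# Toy environment: 4-D observations whose first two coordinates form the latent.
rng = np.random.default_rng(0)
s = rng.normal(size=(8, 4))
a = rng.normal(size=8)
s_next = s + a[:, None]   # toy dynamics: every coordinate shifts by the action
r = s[:, 0] + a           # toy reward

phi = lambda obs: obs[:, :2]
exact_reward = lambda z, act: z[:, 0] + act
exact_transition = lambda z, act: z + act[:, None]
static_transition = lambda z, act: z  # deliberately wrong: ignores the action

good = deepmdp_losses(phi, exact_reward, exact_transition, s, a, r, s_next)
bad = deepmdp_losses(phi, exact_reward, static_transition, s, a, r, s_next)
```

In this toy setup a latent model that matches the environment drives both losses to zero, while a transition model that ignores the action leaves the reward loss at zero but incurs a positive transition loss. In practice the encoder and both models would be neural networks trained jointly by gradient descent on a weighted sum of the two terms.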

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Dream to Control: Learning Behaviors by Latent Imagination

    cs.LG 2019-12 accept novelty 7.0

    Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

  2. Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.

  3. R3M: A Universal Visual Representation for Robot Manipulation

    cs.RO 2022-03 unverdicted novelty 6.0

    A visual encoder pre-trained on diverse human videos with contrastive and language objectives improves simulated robot manipulation success by over 20% versus training from scratch and enables real Franka arm tasks fr...