arxiv: 1805.12114 · v2 · submitted 2018-05-30 · 💻 cs.LG · cs.AI · cs.RO · stat.ML

Recognition: unknown

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Authors on Pith no claims yet
Pith Number pith:SDGOT44F · state: computed
0 claims · 0 references · 0 theorem links. This is the computed registry record for this paper; it is not author-attested yet.
classification 💻 cs.LG · cs.AI · cs.RO · stat.ML
keywords algorithms · deep · dynamics · model-free · models · asymptotic · fewer · learning

Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. This is especially true with high-capacity parametric function approximators, such as deep networks. In this paper, we study how to bridge this gap, by employing uncertainty-aware dynamics models. We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation. Our comparison to state-of-the-art model-based and model-free deep RL algorithms shows that our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e.g., 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization respectively on the half-cheetah task).
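The abstract names two ingredients: an ensemble of probabilistic dynamics models, and sampling-based propagation of their uncertainty. The toy sketch below illustrates how such a combination might score a candidate action sequence. It is not the authors' implementation. The hand-written linear "models" stand in for trained deep networks, and every name (`make_toy_ensemble`, `expected_return`, `reward_fn`), the one-dimensional state, and the choice to keep one ensemble member per particle for a whole rollout are assumptions made for illustration.

```python
import random

def make_toy_ensemble(num_models, seed=0):
    """Hypothetical stand-in for an ensemble of trained probabilistic networks.

    Each member maps (state, action) to the mean and standard deviation of a
    Gaussian over the next state. Members differ slightly in their parameters,
    which mimics disagreement between independently trained models.
    """
    rng = random.Random(seed)
    members = []
    for _ in range(num_models):
        gain = 1.0 + rng.uniform(-0.05, 0.05)  # member-specific parameter

        def predict(state, action, gain=gain):
            mean = state + gain * action
            std = 0.01 + 0.05 * abs(action)  # noise grows with action size
            return mean, std

        members.append(predict)
    return members

def expected_return(ensemble, state0, actions, reward_fn,
                    num_particles=20, seed=0):
    """Estimate the return of an action sequence by sampling trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_particles):
        model = rng.choice(ensemble)  # assumption: one member per particle
        state, ret = state0, 0.0
        for action in actions:
            mean, std = model(state, action)
            state = rng.gauss(mean, std)  # sample from the predicted Gaussian
            ret += reward_fn(state, action)
        total += ret
    return total / num_particles

# Toy task (an assumption): drive a 1-D state from 0.0 towards the target 1.0.
def reward_fn(state, action):
    return -(state - 1.0) ** 2

ensemble = make_toy_ensemble(num_models=5)
good_score = expected_return(ensemble, 0.0, [0.5, 0.5], reward_fn)
bad_score = expected_return(ensemble, 0.0, [-0.5, -0.5], reward_fn)
print(good_score > bad_score)
```

A planner could call `expected_return` on many candidate action sequences and execute the first action of the best one. In this sketch, the spread across particles reflects both the noise each member predicts and the disagreement between members.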

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

    cs.RO 2024-11 unverdicted novelty 6.0

    DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.