Deep reinforcement learning in a handful of trials using probabilistic dynamics models

· 2018

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees

cs.LG · 2026-04-15 · unverdicted · novelty 8.0

RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

Advantage-guided diffusion (SAG and EAG) steers sampling in diffusion world models to higher-advantage trajectories, enabling policy improvement and better sample efficiency on MuJoCo tasks.

citing papers explorer

Showing 2 of 2 citing papers.

Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees · cs.LG · 2026-04-15 · unverdicted · none · ref 33
Advantage-Guided Diffusion for Model-Based Reinforcement Learning · cs.AI · 2026-04-10 · unverdicted · none · ref 16
