A reduction of imitation learning and structured prediction to no-regret online learning

Stéphane Ross, Geoffrey Gordon, Drew Bagnell · 2011

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering

cs.RO · 2026-02-25 · unverdicted · novelty 7.0

UPS framework uses conformal prediction to calibrate VLM verifiers for choosing between high-confidence action execution, natural language task queries, or policy interventions, then applies residual learning from interventions to continually improve the base policy with minimal feedback.

ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation

cs.RO · 2026-04-13 · unverdicted · novelty 6.0

A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

cs.RO · 2026-03-16 · conditional · novelty 6.0

ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.

Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

cs.RO · 2026-02-17 · unverdicted · novelty 6.0

A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over obstacles up to 1.25 m tall.

Vision-Language-Action Jump-Starting for Reinforcement Learning Robotic Agents

cs.LG · 2026-04-15

citing papers explorer

Showing 5 of 5 citing papers.

When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering · cs.RO · 2026-02-25 · unverdicted · none · ref 23
ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation · cs.RO · 2026-04-13 · unverdicted · none · ref 24
ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors · cs.RO · 2026-03-16 · conditional · none · ref 28
Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching · cs.RO · 2026-02-17 · unverdicted · none · ref 32
Vision-Language-Action Jump-Starting for Reinforcement Learning Robotic Agents · cs.LG · 2026-04-15 · unreviewed · ref 36
