A reduction of imitation learning and structured prediction to no-regret online learning
5 Pith papers cite this work.
Citing papers by year: 2026 (5)
citing papers explorer
- When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering
  UPS framework uses conformal prediction to calibrate VLM verifiers for choosing between high-confidence action execution, natural language task queries, or policy interventions, then applies residual learning from interventions to continually improve the base policy with minimal feedback.
- ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation
  A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.
- ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors
  ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
- Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
  A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over obstacles up to 1.25 m tall.
- Vision-Language-Action Jump-Starting for Reinforcement Learning Robotic Agents