EXPO stabilizes online RL for expressive policies by training a base policy with imitation and using a lightweight Gaussian edit policy to select higher-value actions on the fly for sampling and TD backups.
5 Pith papers cite this work.
citing papers explorer
- EXPO: Stable Reinforcement Learning with Expressive Policies
  EXPO stabilizes online RL for expressive policies by training a base policy with imitation and using a lightweight Gaussian edit policy to select higher-value actions on the fly for sampling and TD backups.
- Steering Your Diffusion Policy with Latent Space Reinforcement Learning
  DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
- RoHIL: Robust Human-in-the-Loop Robotic Reinforcement Learning Against Illumination Variations
  RoHIL adapts human-in-the-loop RL policies to new illumination conditions offline by combining world-model image relighting, illumination-retention replay, and anchored Bellman regularisation, improving shifted-light performance while preserving source performance on four real-robot tasks.
- Diffusion Policy Policy Optimization
  DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.
- HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Tasks
  HDFlow pairs a high-level diffusion planner for strategic subgoals with a low-level rectified flow planner for efficient trajectories, claiming superior performance on furniture assembly and other long-horizon robotic benchmarks.