Adaptive Q-Chunking selects optimal action chunk sizes at each state via normalized advantage comparisons to outperform fixed chunk sizes in offline-to-online RL on robot benchmarks.
Efficient online reinforcement learning with offline data
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 7roles
background 1polarities
background 1representative citing papers
FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
SimDist pretrains world models in simulation and adapts them to real-world robots by updating only the latent dynamics model, enabling rapid improvement on contact-rich tasks where prior methods fail.
Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.
SERNF achieves sample-efficient real-world fine-tuning of multimodal dexterous policies by pairing exact-likelihood normalizing flow policies with action-chunked value critics.
citing papers explorer
-
Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning
Adaptive Q-Chunking selects optimal action chunk sizes at each state via normalized advantage comparisons to outperform fixed chunk sizes in offline-to-online RL on robot benchmarks.
-
FASTER: Value-Guided Sampling for Fast RL
FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
-
Fisher Decorator: Refining Flow Policy via a Local Transport Map
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
-
FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
-
Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation
SimDist pretrains world models in simulation and adapts them to real-world robots by updating only the latent dynamics model, enabling rapid improvement on contact-rich tasks where prior methods fail.
-
What Does Flow Matching Bring To TD Learning?
Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.
-
SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
SERNF achieves sample-efficient real-world fine-tuning of multimodal dexterous policies by pairing exact-likelihood normalizing flow policies with action-chunked value critics.