Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.
hub Mixed citations
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Mixed citation behavior. Most common role is background (67%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.
PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.
A2A flow matching starts action generation from prior proprioceptive actions in latent space to enable single-step high-quality predictions in robotic policies.
Replacing Gaussian noise with a temporally grounded prior from recent actions straightens flow-matching paths and improves success rates in robotic manipulation and prior-space RL.
UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.
Diff-CAST replaces GAN discriminators with diffusion-based priors and adds symmetric command conditioning plus constrained RL to enable versatile, drift-free, and hardware-safe quadruped locomotion.
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
Q2RL extracts Q-functions from BC policies via minimal interactions and applies Q-gating to enable stable offline-to-online RL, outperforming baselines on manipulation benchmarks and achieving up to 100% success on-robot.
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.
Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.
MATT-Diff uses a diffusion model with vision transformer and attention to generate multimodal actions for active multi-target tracking from expert planner demonstrations.
A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.
citing papers explorer
-
What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions
Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.
-
You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector
Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.
-
PlayWorld: Learning Robot World Models from Autonomous Play
PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.
-
Action-to-Action Flow Matching
A2A flow matching starts action generation from prior proprioceptive actions in latent space to enable single-step high-quality predictions in robotic policies.
-
WarmPrior: Straightening Flow-Matching Policies with Temporal Priors
Replacing Gaussian noise with a temporally grounded prior from recent actions straightens flow-matching paths and improves success rates in robotic manipulation and prior-space RL.
-
Unified Noise Steering for Efficient Human-Guided VLA Adaptation
UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.
-
Constraint-Aware Diffusion Priors for High-Fidelity and Versatile Quadruped Locomotion
Diff-CAST replaces GAN discriminators with diffusion-based priors and adds symmetric command conditioning plus constrained RL to enable versatile, drift-free, and hardware-safe quadruped locomotion.
-
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
-
When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning
Q2RL extracts Q-functions from BC policies via minimal interactions and applies Q-gating to enable stable offline-to-online RL, outperforming baselines on manipulation benchmarks and achieving up to 100% success on-robot.
-
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
-
Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training
DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.
-
ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
-
What Does Flow Matching Bring To TD Learning?
Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.
-
Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control
Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.
-
MATT-Diff: Multimodal Active Target Tracking by Diffusion Policy
MATT-Diff uses a diffusion model with vision transformer and attention to generate multimodal actions for active multi-target tracking from expert planner demonstrations.
-
Towards Robotic Dexterous Hand Intelligence: A Survey
A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.
- Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies