hub Mixed citations

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

Andrew Wagenmaker, Mitsuhiko Nakamoto, Yunchu Zhang, Seohong Park, Waleed Yagoub, Anusha Nagabandi · 2025 · arXiv 2506.15799

Mixed citation behavior. Most common role is background (67%).

17 Pith papers citing it

Background 67% of classified citations

open full Pith review browse 17 citing papers arXiv PDF

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 baseline 1 method 1

citation-polarity summary

background 4 baseline 1 use method 1

representative citing papers

What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.

You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector

cs.RO · 2026-03-16 · conditional · novelty 7.0

Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.

PlayWorld: Learning Robot World Models from Autonomous Play

cs.RO · 2026-03-09 · unverdicted · novelty 7.0

PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.

Action-to-Action Flow Matching

cs.RO · 2026-02-07 · unverdicted · novelty 7.0

A2A flow matching starts action generation from prior proprioceptive actions in latent space to enable single-step high-quality predictions in robotic policies.

WarmPrior: Straightening Flow-Matching Policies with Temporal Priors

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Replacing Gaussian noise with a temporally grounded prior from recent actions straightens flow-matching paths and improves success rates in robotic manipulation and prior-space RL.

Unified Noise Steering for Efficient Human-Guided VLA Adaptation

cs.RO · 2026-05-11 · unverdicted · novelty 6.0

UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.

Constraint-Aware Diffusion Priors for High-Fidelity and Versatile Quadruped Locomotion

cs.RO · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

Diff-CAST replaces GAN discriminators with diffusion-based priors and adds symmetric command conditioning plus constrained RL to enable versatile, drift-free, and hardware-safe quadruped locomotion.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

cs.RO · 2026-05-06 · unverdicted · novelty 6.0

Q2RL extracts Q-functions from BC policies via minimal interactions and applies Q-gating to enable stable offline-to-online RL, outperforming baselines on manipulation benchmarks and achieving up to 100% success on-robot.

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.

Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training

cs.RO · 2026-04-25 · unverdicted · novelty 6.0

DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

cs.RO · 2026-03-16 · conditional · novelty 6.0

ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.

What Does Flow Matching Bring To TD Learning?

cs.LG · 2026-03-04 · conditional · novelty 6.0

Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.

Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

cs.RO · 2026-02-13 · unverdicted · novelty 6.0

Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.

MATT-Diff: Multimodal Active Target Tracking by Diffusion Policy

cs.RO · 2025-11-14 · unverdicted · novelty 6.0

MATT-Diff uses a diffusion model with vision transformer and attention to generate multimodal actions for active multi-target tracking from expert planner demonstrations.

Towards Robotic Dexterous Hand Intelligence: A Survey

cs.RO · 2026-05-13 · unverdicted · novelty 4.0

A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.

Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

cs.RO · 2026-05-01

citing papers explorer

Showing 17 of 17 citing papers.

What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions cs.LG · 2026-05-14 · unverdicted · none · ref 52 · internal anchor
Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.
You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector cs.RO · 2026-03-16 · conditional · none · ref 39 · internal anchor
Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.
PlayWorld: Learning Robot World Models from Autonomous Play cs.RO · 2026-03-09 · unverdicted · none · ref 90 · internal anchor
PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.
Action-to-Action Flow Matching cs.RO · 2026-02-07 · unverdicted · none · ref 17 · internal anchor
A2A flow matching starts action generation from prior proprioceptive actions in latent space to enable single-step high-quality predictions in robotic policies.
WarmPrior: Straightening Flow-Matching Policies with Temporal Priors cs.LG · 2026-05-13 · unverdicted · none · ref 11 · internal anchor
Replacing Gaussian noise with a temporally grounded prior from recent actions straightens flow-matching paths and improves success rates in robotic manipulation and prior-space RL.
Unified Noise Steering for Efficient Human-Guided VLA Adaptation cs.RO · 2026-05-11 · unverdicted · none · ref 26 · internal anchor
UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.
Constraint-Aware Diffusion Priors for High-Fidelity and Versatile Quadruped Locomotion cs.RO · 2026-05-09 · unverdicted · none · ref 31 · 2 links · internal anchor
Diff-CAST replaces GAN discriminators with diffusion-based priors and adds symmetric command conditioning plus constrained RL to enable versatile, drift-free, and hardware-safe quadruped locomotion.
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities cs.AI · 2026-05-07 · unverdicted · none · ref 46 · 2 links · internal anchor
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning cs.RO · 2026-05-06 · unverdicted · none · ref 56 · internal anchor
Q2RL extracts Q-functions from BC policies via minimal interactions and applies Q-gating to enable stable offline-to-online RL, outperforming baselines on manipulation benchmarks and achieving up to 100% success on-robot.
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies cs.LG · 2026-05-04 · unverdicted · none · ref 181 · internal anchor
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training cs.RO · 2026-04-25 · unverdicted · none · ref 58 · internal anchor
DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.
ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors cs.RO · 2026-03-16 · conditional · none · ref 26 · internal anchor
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
What Does Flow Matching Bring To TD Learning? cs.LG · 2026-03-04 · conditional · none · ref 59 · internal anchor
Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.
Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control cs.RO · 2026-02-13 · unverdicted · none · ref 66 · internal anchor
Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.
MATT-Diff: Multimodal Active Target Tracking by Diffusion Policy cs.RO · 2025-11-14 · unverdicted · none · ref 7 · internal anchor
MATT-Diff uses a diffusion model with vision transformer and attention to generate multimodal actions for active multi-target tracking from expert planner demonstrations.
Towards Robotic Dexterous Hand Intelligence: A Survey cs.RO · 2026-05-13 · unverdicted · none · ref 92 · internal anchor
A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.
Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies cs.RO · 2026-05-01 · unreviewed · ref 45 · internal anchor

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer