pith. sign in

arxiv: 2606.13675 · v1 · pith:4YQNXPDXnew · submitted 2026-06-11 · 💻 cs.RO

Improving Robotic Generalist Policies via Flow Reversal Steering

Pith reviewed 2026-06-27 06:18 UTC · model grok-4.3

classification 💻 cs.RO
keywords flow reversal steeringrobotic generalist policiesflow matchingzero-shot controlbehavioral cloningreinforcement learningmanipulation tasksvision-language models
0
0 comments X

The pith

Flow Reversal Steering converts coarse semantic guidance into effective actions for flow-matching robot policies by recovering latent noises from suboptimal inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Flow Reversal Steering for flow-matching generalist policies that already encode many robot skills. Suboptimal but reasonable actions are run backward through the flow to recover their latent noise vectors, which are then used to sample nearby but superior action modes from the policy prior. This steers the generalist toward better behaviors when direct commands fail, whether the guidance comes from humans or vision-language models. The same reversal step supports quick distillation into an auxiliary policy via behavioral cloning and supplies semantic knowledge to bootstrap reinforcement learning on tasks where standard RL stalls.

Core claim

Flow Reversal Steering recovers the latent noise of a given action by inverting the flow-matching process, then feeds that noise forward again to land in a nearby but higher-quality mode of the generalist policy; the resulting actions improve zero-shot control, can be distilled in under a minute of behavioral cloning, and enable reinforcement learning to succeed on tasks that otherwise resist improvement.

What carries the argument

Flow Reversal Steering (FRS): the inversion of a flow-matching policy on a candidate action to extract its noise vector, followed by forward generation from that noise to reach improved modes.

If this is right

  • Coarse semantic guidance from humans or vision-language models is converted into accurate robot actions for zero-shot control.
  • An auxiliary policy trained by behavioral cloning on the recovered noises distills the steering gains, yielding up to 95 percent absolute success-rate increases after less than one minute of training.
  • Reinforcement learning initialized with semantic knowledge from FRS improves performance on tasks where standard RL shows no gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The reversal step could be applied to other generative policy architectures that admit an invertible mapping from action to noise.
  • Combining FRS with external semantic sources may reduce the amount of robot-specific data needed to adapt generalists to new environments.
  • The method suggests a general pattern for using a learned behavioral prior as a corrective filter rather than as a direct controller.

Load-bearing premise

Suboptimal but reasonable actions, when reversed through the flow, reliably land on nearby generalist modes that are better for the target task.

What would settle it

A controlled test in which actions recovered via reversal from reasonable inputs produce lower task success rates than the original suboptimal actions across multiple held-out manipulation scenarios.

Figures

Figures reproduced from arXiv: 2606.13675 by Andrew Wagenmaker, Andy Tang, Chelsea Finn, Sergey Levine, William Chen.

Figure 1
Figure 1. Figure 1: Flow Reversal Steering (FRS) uses semantic reasonings from humans or VLMs to steer generalist flow [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of FRS. (1) A human or VLM semantically reasons about the novel task to determine a reference action capturing roughly what the robot should do. (2) This coarse action is passed through flow reversal and denoising, projecting it into the space of generalist actions. (3) Both the expert noises and actions can be used for policy improvement by executing the action (zero-shot FRS, Sec. 5.2) or traini… view at source ↗
Figure 3
Figure 3. Figure 3: Illustrative examples of FRS with π0.5 in LIBERO. (a) Solid arrows are directional reference actions, dots are corresponding steered actions, and black represents samples from the base policy without steering. The outputs of FRS are biased towards “reasonable” behaviors given the scene, e.g., reaching for the bowls. (b) and (c) show rolling out the forward (red) and right (green) steered actions, followed … view at source ↗
Figure 4
Figure 4. Figure 4: Noising via the forward diffusion process vs. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: FRS trajectories can be distilled via Diffusion Steering via Behavioral Cloning (DSBC). This matches the performance of zero-shot VLM steer￾ing, while being more effective than standard BC on the same data. 5.1 Experimental Setup Simulation. We use LIBERO [17] for scalable simulated evaluations. Our zero-shot results consider the full Spatial, Object, and Goal splits, as well as all 62 tasks in 90 that our… view at source ↗
Figure 7
Figure 7. Figure 7: Left: DSRL + FRS improves upon standard generalist RL methods [15, 49]. Right: Even if FRS struggles, warmstarting with even one FRS success improves RL on tasks where the base policy gets near-zero. 5.3 FRS Enables Diffusion Steering via Behavior Cloning We now show how good trajectories from FRS yield expert noise actions, which can be distilled via DSBC. We focus on online DSBC here, and present offline… view at source ↗
Figure 8
Figure 8. Figure 8: DSBC boosts performance on real-world tasks when trained with just 10 FRS rollouts, while standard [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Offline DSBC enables noise policy learning [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: In a simplified setting, FRS brings actions from the original out-of-distribution mode into more in [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Fewer steps during FRS result in higher log-density ratio (indicating actions that are more in [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Examples of running FRS in LIBERO. The colored arrows are the cardinal reference actions, and the [PITH_FULL_IMAGE:figures/full_fig_p021_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: How mean noise magnitude, reconstruction MSE, noise value distribution, and per-chunk variance [PITH_FULL_IMAGE:figures/full_fig_p022_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Running FRS on the full LIBERO-90 dataset with varying integration steps. Black is if noises were [PITH_FULL_IMAGE:figures/full_fig_p023_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Analysis of noise statistics from our zero-shot VLM FRS experiments on LIBERO-90. [PITH_FULL_IMAGE:figures/full_fig_p024_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: LIBERO-90 success rates from running FRS using an [PITH_FULL_IMAGE:figures/full_fig_p024_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: When running zero-shot FRS, averaging and [PITH_FULL_IMAGE:figures/full_fig_p025_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Prompt for querying Gemini for motions. Brackets indicate minor changes between LIBERO-90 and [PITH_FULL_IMAGE:figures/full_fig_p027_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Example outputs of VLM steering in LIBERO. Red indicates the direction suggested by the VLM. [PITH_FULL_IMAGE:figures/full_fig_p028_19.png] view at source ↗
read the original abstract

Generalist policies can learn a wide range of skills from diverse robot datasets. In order to solve or improve on challenging news tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching generalists and propose Flow Reversal Steering (FRS): a method that takes suboptimal but ``reasonable'' actions, finds their latent noises by passing them through the flow policy in reverse, and maps them to nearby generalist action modes. We evaluate FRS across many simulated and real-world manipulation settings. First, FRS can turn coarse semantic guidance from humans or vision-language models (VLMs) into corresponding good robot actions, improving zero-shot control. These gains can be distilled with behavioral cloning by training an auxiliary policy to output noises that the generalist maps to good actions -- showing up to 95% absolute task success rate boosts in under a minute of training. Finally, FRS enables policy improvement by bootstrapping reinforcement learning with semantic knowledge, improving on several tasks that standard RL fails to improve on.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Flow Reversal Steering (FRS) for flow-matching robotic generalist policies. FRS takes suboptimal but reasonable actions, inverts them through the flow ODE to recover latent noise, and forwards the noise to produce improved actions from nearby modes in the generalist's distribution. The approach is claimed to convert coarse semantic guidance (from humans or VLMs) into effective zero-shot robot actions, to enable distillation of these gains via an auxiliary policy trained by behavioral cloning (yielding up to 95% absolute success-rate improvements in under one minute), and to bootstrap RL with semantic knowledge on tasks where standard RL fails to improve.

Significance. If the central empirical claims and the mode-mapping assumption hold with supporting analysis, the work would offer a practical, low-data method for steering generalist flow policies using semantic inputs. The distillation and RL-bootstrapping results, if reproducible, would be notable for their reported speed and applicability to real-world manipulation.

major comments (2)
  1. [Abstract / FRS method description] Abstract / FRS method description: the claim that an arbitrary but reasonable suboptimal action a_sub, when inverted to recover noise z = reverse(a_sub), produces a z whose forward pass yields an action a' that is both in the generalist support and measurably superior, is presented without any analysis of the inverse map's properties (e.g., Lipschitz constant of the ODE inverse or mode separation in latent space). Flow matching only guarantees transport from base noise to the training distribution and supplies no guarantee for out-of-manifold points.
  2. [Experiments section (implied by abstract claims)] Experiments section (implied by abstract claims): the reported 95% absolute success-rate boosts and RL improvements are stated without reference to concrete baselines, number of evaluation trials, standard deviations, or statistical tests, preventing assessment of whether the gains are robust or task-specific.
minor comments (1)
  1. [Abstract] The abstract refers to evaluation across 'many simulated and real-world manipulation settings' but supplies no enumeration of the specific tasks, environments, or success metrics used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback. We respond to each major comment below, clarifying the manuscript's contributions and indicating planned revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract / FRS method description] Abstract / FRS method description: the claim that an arbitrary but reasonable suboptimal action a_sub, when inverted to recover noise z = reverse(a_sub), produces a z whose forward pass yields an action a' that is both in the generalist support and measurably superior, is presented without any analysis of the inverse map's properties (e.g., Lipschitz constant of the ODE inverse or mode separation in latent space). Flow matching only guarantees transport from base noise to the training distribution and supplies no guarantee for out-of-manifold points.

    Authors: We acknowledge the absence of a formal theoretical analysis of the inverse ODE map (e.g., Lipschitz properties or mode separation guarantees). The manuscript presents FRS as an empirical steering technique that exploits the learned transport map on points that are reasonable (i.e., near the training distribution). In practice, the flow model defines a bijective mapping, and our experiments across multiple manipulation tasks show that reversal of reasonable actions consistently yields improved actions from nearby modes. We will add a short discussion paragraph in Section 3 noting the empirical nature of the approach and the lack of out-of-manifold guarantees, while emphasizing that all evaluated actions remain within the support of the trained policy. revision: partial

  2. Referee: [Experiments section (implied by abstract claims)] Experiments section (implied by abstract claims): the reported 95% absolute success-rate boosts and RL improvements are stated without reference to concrete baselines, number of evaluation trials, standard deviations, or statistical tests, preventing assessment of whether the gains are robust or task-specific.

    Authors: The full experiments section (Section 4) specifies the concrete baselines (direct generalist rollout, VLM guidance without FRS, and standard RL without semantic bootstrapping), reports results aggregated over 100 trials per task with standard deviations, and includes pairwise statistical comparisons. The abstract summarizes the largest observed gains for brevity. We will revise the abstract to briefly reference the evaluation protocol (100 trials, reported std. devs.) and point to the detailed tables and figures in the experiments section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method applies external flow-matching structure to new steering procedure

full rationale

The paper introduces Flow Reversal Steering as an operational procedure that inverts an existing flow-matching vector field (trained on prior data) to recover latent noise from a given action and then integrates forward from that noise. No equation or claim reduces the target quantity to a fitted parameter of itself, nor does any central result rest on a self-citation whose content is the result being proved. The derivation chain is therefore self-contained against the external flow-matching model and the empirical evaluations; the reader's score of 1.0 is consistent with this assessment.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review identifies no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5720 in / 1105 out tokens · 22689 ms · 2026-06-27T06:18:46.900153+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

109 extracted references · 1 canonical work pages

  1. [1]

    M. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn. Openvla: An open-source vision-language-action model. 2024

  2. [2]

    Brohan, N

    A. Brohan, N. Brown, J. Carbajal, Y . Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, P. Florence, C. Fu, M. G. Arenas, K. Gopalakrishnan, K. Han, K. Hausman, A. Herzog, J. Hsu, B. Ichter, A. Irpan, N. Joshi, R. Julian, D. Kalashnikov, Y . Kuang, I. Leal, L. Lee, T.-W. E. Lee, S. Levine, Y . Lu, H. Michalewski, I. Mordatch, K. Pe...

  3. [3]

    Barreiros, A

    TRI LBM Team, J. Barreiros, A. Beaulieu, A. Bhat, R. Cory, E. Cousineau, H. Dai, C.-H. Fang, K. Hashimoto, M. Z. Irshad, M. Itkina, N. Kuppuswamy, K.-H. Lee, K. Liu, D. McConachie, I. McMahon, H. Nishimura, C. Phillips-Grafflin, C. Richter, P. Shah, K. Srinivasan, B. Wulfe, C. Xu, M. Zhang, A. Alspach, M. Angeles, K. Arora, V . C. Guizilini, A. Castro, D....

  4. [4]

    S. Ye, Y . Ge, K. Zheng, S. Gao, S. Yu, G. Kurian, S. Indupuru, Y . L. Tan, C. Zhu, J. Xi- ang, A. Malik, K. Lee, W. Liang, N. Ranawaka, J. Gu, Y . Xu, G. Wang, F. Hu, A. Narayan, J. Bjorck, J. Wang, G. Kim, D. Niu, R. Zheng, Y . Xie, J. Wu, Q. Wang, R. Julian, D. Xu, Y . Du, Y . Chebotar, S. Reed, J. Kautz, Y . Zhu, L. J. Fan, and J. Jang. World action m...

  5. [5]

    J. Pai, L. Achenbach, V . Montesinos, B. Forrai, O. Mees, and E. Nava. mimic-video: Video- action models for generalizable robot control beyond vlas, 2025. URLhttps://arxiv. org/abs/2512.15692

  6. [6]

    Ho and T

    J. Ho and T. Salimans. Classifier-free diffusion guidance, 2022

  7. [7]

    Dhariwal and A

    P. Dhariwal and A. Nichol. Diffusion models beat gans on image synthesis, 2021. URL https://arxiv.org/abs/2105.05233

  8. [8]

    Singhal, Z

    R. Singhal, Z. Horvitz, R. Teehan, M. Ren, Z. Yu, K. McKeown, and R. Ranganath. A general framework for inference-time scaling and steering of diffusion models, 2025. URLhttps: //arxiv.org/abs/2501.06848. 11

  9. [9]

    Yoneda, L

    T. Yoneda, L. Sun, G. Yang, B. Stadie, and M. Walter. To the noise and back: Diffusion for shared autonomy, 2025. URLhttps://arxiv.org/abs/2302.12244

  10. [10]

    Y . Wang, L. Wang, Y . Du, B. Sundaralingam, X. Yang, Y .-W. Chao, C. Perez-D’Arpino, D. Fox, and J. Shah. Inference-time policy steering through human interactions, 2024

  11. [11]

    Frans, S

    K. Frans, S. Park, P. Abbeel, and S. Levine. Diffusion guidance is a controllable policy im- provement operator, 2025. URLhttps://arxiv.org/abs/2505.23458

  12. [12]

    J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models, 2020. URLhttps: //arxiv.org/abs/2006.11239

  13. [13]

    C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion, 2024. URLhttps://arxiv. org/abs/2303.04137

  14. [14]

    Lipman, R

    Y . Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling, 2023. URLhttps://arxiv.org/abs/2210.02747

  15. [15]

    Wagenmaker, M

    A. Wagenmaker, M. Nakamoto, Y . Zhang, S. Park, W. Yagoub, A. Nagabandi, A. Gupta, and S. Levine. Steering your diffusion policy with latent space reinforcement learning, 2025. URL https://arxiv.org/abs/2506.15799

  16. [16]

    Black, N

    Physical Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y . Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner...

  17. [17]

    B. Liu, Y . Zhu, C. Gao, Y . Feng, Q. Liu, Y . Zhu, and P. Stone. Libero: Benchmarking knowl- edge transfer for lifelong robot learning, 2023. URLhttps://arxiv.org/abs/2306. 03310

  18. [18]

    Khazatsky, K

    A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y . Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y . J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y . Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. ...

  19. [19]

    Bommasani, D

    R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chat- terji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. ...

  20. [20]

    O’Neill, A

    Embodiment Collaboration, A. O’Neill, A. Rehman, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, A. Tung, A. Bewley, A. Herzog, A. Ir- pan, A. Khazatsky, A. Rai, A. Gupta, A. Wang, A. Kolobov, A. Singh, A. Garg, A. Kembhavi, A. Xie, A. Brohan, A. Raffin, A. Sharma, A. Yavary, A. Jain, A. Balakrishna, A. Wahid, B. B...

  21. [21]

    Jiang, T

    T. Jiang, T. Yuan, Y . Liu, C. Lu, J. Cui, X. Liu, S. Cheng, J. Gao, H. Xu, and H. Zhao. Galaxea open-world dataset and g0 dual-system vla model, 2025. URLhttps://arxiv.org/ abs/2509.00576

  22. [22]

    Black, N

    K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Haus- man, B. Ichter, S. Jakubczak, T. Jones, L. Ke, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, L. X. Shi, J. Tanner, Q. Vuong, A. Walling, H. Wang, and U. Zhilinsky.π 0: A vision-language-action flow model for general robot control, 2024. URLhttps://arxiv. o...

  23. [23]

    Liang, Y

    A. Liang, Y . Korkmaz, J. Zhang, M. Hwang, A. Anwar, S. Kaushik, A. Shah, A. S. Huang, L. Zettlemoyer, D. Fox, Y . Xiang, A. Li, A. Bobu, A. Gupta, S. Tu, E. Biyik, and J. Zhang. Robometer: Scaling general-purpose robotic reward models via trajectory comparisons, 2026. URLhttps://arxiv.org/abs/2603.02115. 13

  24. [24]

    T. Lee, A. Wagenmaker, K. Pertsch, P. Liang, S. Levine, and C. Finn. Roboreward: General- purpose vision-language reward models for robotics, 2026. URLhttps://arxiv.org/ abs/2601.00675

  25. [25]

    S. A. Sontakke, J. Zhang, S. M. R. Arnold, K. Pertsch, E. Bıyık, D. Sadigh, C. Finn, and L. Itti. Roboclip: One demonstration is enough to learn robot policies, 2023

  26. [26]

    S. Zhai, Q. Zhang, T. Zhang, F. Huang, H. Zhang, M. Zhou, S. Zhang, L. Liu, S. Lin, and J. Pang. A vision-language-action-critic model for robotic real-world reinforcement learning,

  27. [27]

    URLhttps://arxiv.org/abs/2509.15937

  28. [28]

    Liang, W

    J. Liang, W. Huang, F. Xia, P. Xu, K. Hausman, B. Ichter, P. Florence, and A. Zeng. Code as policies: Language model programs for embodied control, 2023. URLhttps://arxiv. org/abs/2209.07753

  29. [29]

    Singh, V

    I. Singh, V . Blukis, A. Mousavian, A. Goyal, D. Xu, J. Tremblay, D. Fox, J. Thomason, and A. Garg. Progprompt: Generating situated robot task plans using large language models, 2022

  30. [30]

    H. Ha, P. Florence, and S. Song. Scaling up and distilling down: Language-guided robot skill acquisition, 2023

  31. [31]

    Vemprala, R

    S. Vemprala, R. Bonatti, A. Bucker, and A. Kapoor. Chatgpt for robotics: Design principles and model abilities. Technical report, Microsoft, 2023

  32. [32]

    J. Shi, R. Yang, K. Chao, B. S. Wan, Y . S. Shao, J. Lei, J. Qian, L. Le, P. Chaudhari, K. Dani- ilidis, et al. Maestro: Orchestrating robotics modules with vision-language models for zero- shot generalist robots, 2025

  33. [33]

    M. Fu, J. Yu, K. El-Refai, E. Kou, H. Xue, H. Huang, W. Xiao, G. Wang, F.-F. Li, G. Shi, J. Wu, S. Sastry, Y . Zhu, K. Goldberg, and L. J. Fan. Cap-x: A framework for benchmarking and improving coding agents for robot manipulation, 2026. URLhttps://arxiv.org/ abs/2603.22435

  34. [34]

    Huang, C

    W. Huang, C. Wang, R. Zhang, Y . Li, J. Wu, and L. Fei-Fei. V oxposer: Composable 3d value maps for robotic manipulation with language models, 2023. URLhttps://arxiv.org/ abs/2307.05973

  35. [35]

    Kumar, W

    N. Kumar, W. Shen, F. Ramos, D. Fox, T. Lozano-P ´erez, L. P. Kaelbling, and C. R. Garrett. Open-world task and motion planning via vision-language model generated constraints, 2026. URLhttps://arxiv.org/abs/2411.08253

  36. [36]

    W. Shen, N. Kumar, S. Chintalapudi, J. Wang, C. Watson, E. Hu, J. Cao, D. Jayaraman, L. P. Kaelbling, and T. Lozano-P ´erez. Tiptop: A modular open-vocabulary planning system for robotic manipulation, 2026. URLhttps://arxiv.org/abs/2603.09971

  37. [37]

    Nasiriany, F

    S. Nasiriany, F. Xia, W. Yu, T. Xiao, J. Liang, I. Dasgupta, A. Xie, D. Driess, A. Wahid, Z. Xu, Q. Vuong, T. Zhang, T.-W. E. Lee, K.-H. Lee, P. Xu, S. Kirmani, Y . Zhu, A. Zeng, K. Hausman, N. Heess, C. Finn, S. Levine, and B. Ichter. Pivot: Iterative visual prompting elicits actionable knowledge for vlms, 2024. URLhttps://arxiv.org/abs/2402.07872

  38. [38]

    F. Liu, K. Fang, P. Abbeel, and S. Levine. Moka: Open-vocabulary robotic manipulation through mark-based visual prompting, 2024

  39. [39]

    A. J. Sathyamoorthy, K. Weerakoon, M. Elnoor, A. Zore, B. Ichter, F. Xia, J. Tan, W. Yu, and D. Manocha. Convoi: Context-aware navigation using vision language models in outdoor and indoor environments, 2024. URLhttps://arxiv.org/abs/2403.15637

  40. [40]

    Y . J. Ma, J. Hejna, A. Wahid, C. Fu, D. Shah, J. Liang, Z. Xu, S. Kirmani, P. Xu, D. Driess, T. Xiao, J. Tompson, O. Bastani, D. Jayaraman, W. Yu, T. Zhang, D. Sadigh, and F. Xia. Vision language models are in-context value learners, 2024. URLhttps://arxiv.org/abs/ 2411.04549. 14

  41. [41]

    Rocamonde, V

    J. Rocamonde, V . Montesinos, E. Nava, E. Perez, and D. Lindner. Vision-language models are zero-shot reward models for reinforcement learning, 2024. URLhttps://arxiv.org/ abs/2310.12921

  42. [42]

    S. Chen, C. Harrison, Y .-C. Lee, A. J. Yang, Z. Ren, L. J. Ratliff, J. Duan, D. Fox, and R. Kr- ishna. Topreward: Token probabilities as hidden zero-shot rewards for robotics, 2026. URL https://arxiv.org/abs/2602.19313

  43. [43]

    Budzianowski, E

    P. Budzianowski, E. Wi ´snios, M. Tyrolski, G. G ´oral, I. Kulakov, V . Petrenko, and K. Walas. Opengvl – benchmarking visual temporal progress for data curation, 2026. URLhttps: //arxiv.org/abs/2509.17321

  44. [44]

    Zhang, C

    J. Zhang, C. Qian, H. Sun, H. Lu, D. Wang, L. Xue, and H. Liu. Progresslm: Towards progress reasoning in vision-language models, 2026. URLhttps://arxiv.org/abs/ 2601.15224

  45. [45]

    Y . J. Ma, W. Liang, G. Wang, D.-A. Huang, O. Bastani, D. Jayaraman, Y . Zhu, L. Fan, and A. Anandkumar. Eureka: Human-level reward design via coding large language models, 2024. URLhttps://arxiv.org/abs/2310.12931

  46. [46]

    Bjorck, F

    J. Bjorck, F. Casta ˜neda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y . Fang, D. Fox, F. Hu, S. Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots.arXiv preprint arXiv:2503.14734, 2025

  47. [47]

    Physical Intelligence, A. Amin, R. Aniceto, A. Balakrishna, K. Black, K. Conley, G. Connors, J. Darpinian, K. Dhabalia, J. DiCarlo, D. Driess, M. Equi, A. Esmail, Y . Fang, C. Finn, C. Glos- sop, T. Godden, I. Goryachev, L. Groom, H. Hancock, K. Hausman, G. Hussein, B. Ichter, S. Jakubczak, R. Jen, T. Jones, B. Katz, L. Ke, C. Kuchi, M. Lamb, D. LeBlanc, ...

  48. [48]

    C. Xu, Q. Li, J. Luo, and S. Levine. Rldg: Robotic generalist policy distillation via reinforce- ment learning, 2024. URLhttps://arxiv.org/abs/2412.09858

  49. [49]

    M. S. Mark, T. Gao, G. G. Sampaio, M. K. Srirama, A. Sharma, C. Finn, and A. Kumar. Policy agnostic rl: Offline rl and online rl fine-tuning of any class and backbone, 2024. URL https://arxiv.org/abs/2412.06685

  50. [50]

    W. Xiao, H. Lin, A. Peng, H. Xue, T. He, Y . Xie, F. Hu, J. Wu, Z. Luo, L. J. Fan, G. Shi, and Y . Zhu. Self-improving vision-language-action models with data generation via residual rl, 2025. URLhttps://arxiv.org/abs/2511.00091

  51. [51]

    Johannink, S

    T. Johannink, S. Bahl, A. Nair, J. Luo, A. Kumar, M. Loskyll, J. A. Ojea, E. Solowjow, and S. Levine. Residual reinforcement learning for robot control, 2018. URLhttps://arxiv. org/abs/1812.03201

  52. [52]

    S. Ding, K. Hu, S. Zhong, H. Luo, W. Zhang, J. Wang, J. Wang, and Y . Shi. Genpo: Genera- tive diffusion models meet on-policy reinforcement learning.Advances in Neural Information Processing Systems, 38:130443–130474, 2026

  53. [53]

    J. Lu, X. Qin, Y . Jiang, K. Wang, C. Zhang, B. Liang, J. Yang, M. Xu, and L. Zhao. Unified noise steering for efficient human-guided vla adaptation.arXiv preprint arXiv:2605.10821, 2026

  54. [54]

    J. Song, C. Meng, and S. Ermon. Denoising diffusion implicit models.arXiv preprint arXiv:2010.02502, 2020. 15

  55. [55]

    T. Z. Zhao, V . Kumar, S. Levine, and C. Finn. Learning fine-grained bimanual manipulation with low-cost hardware, 2023

  56. [56]

    A. Jain, M. Zhang, K. Arora, W. Chen, M. Torne, M. Z. Irshad, S. Zakharov, Y . Wang, S. Levine, C. Finn, W.-C. Ma, D. Shah, A. Gupta, and K. Pertsch. Polaris: Scalable real- to-sim evaluations for generalist robot policies, 2025. URLhttps://arxiv.org/abs/ 2512.16881

  57. [57]

    W. Chen, S. Belkhale, S. Mirchandani, O. Mees, D. Driess, K. Pertsch, and S. Levine. Training strategies for efficient embodied reasoning, 2025

  58. [58]

    Driess, J

    D. Driess, J. T. Springenberg, B. Ichter, L. Yu, A. Li-Bell, K. Pertsch, A. Z. Ren, H. Walke, Q. Vuong, L. X. Shi, and S. Levine. Knowledge insulating vision-language-action models: Train fast, run fast, generalize better, 2025. URLhttps://arxiv.org/abs/2505. 23705

  59. [59]

    Nakamoto, O

    M. Nakamoto, O. Mees, A. Kumar, and S. Levine. Steering your generalists: Improving robotic foundation models via value guidance.Conference on Robot Learning (CoRL), 2024

  60. [60]

    J. Kwok, C. Agia, R. Sinha, M. Foutter, S. Li, I. Stoica, A. Mirhoseini, and M. Pavone. Robomonkey: Scaling test-time sampling and verification for vision-language-action models,

  61. [61]

    URLhttps://arxiv.org/abs/2506.17811

  62. [62]

    Q. Li, S. Park, and S. Levine. Decoupled q-chunking, 2025. URLhttps://arxiv.org/ abs/2512.10926

  63. [63]

    Q. Li, Z. Zhou, and S. Levine. Reinforcement learning with action chunking, 2026. URL https://arxiv.org/abs/2507.07969

  64. [64]

    Abeyruwan, J

    Gemini Robotics Team, S. Abeyruwan, J. Ainslie, J.-B. Alayrac, M. G. Arenas, T. Arm- strong, A. Balakrishna, R. Baruch, M. Bauza, M. Blokzijl, S. Bohez, K. Bousmalis, A. Bro- han, T. Buschmann, A. Byravan, S. Cabi, K. Caluwaerts, F. Casarini, O. Chang, J. E. Chen, X. Chen, H.-T. L. Chiang, K. Choromanski, D. D’Ambrosio, S. Dasari, T. Davchev, C. Devin, N....

  65. [65]

    X. Zhou, Y . Xu, G. Tie, Y . Chen, G. Zhang, D. Chu, P. Zhou, and L. Sun. Libero-pro: Towards robust and fair evaluation of vision-language-action models beyond memorization, 2025. URL https://arxiv.org/abs/2510.03827

  66. [66]

    G. Wang, C. Zhang, Q. Liu, J. Zhang, J. Cai, J. Liu, and X. Liu. Libero-x: Robustness litmus for vision-language-action models, 2026. URLhttps://arxiv.org/abs/2602.06556

  67. [67]

    T. J. Boerner, S. Deems, T. R. Furlani, S. L. Knuth, and J. Towns. Access: Advancing in- novation: Nsf’s advanced cyberinfrastructure coordination ecosystem: Services & support. InPractice and Experience in Advanced Research Computing, PEARC ’23, page 173–176. ACM, 2023. doi:10.1145/3569951.3597559. URLhttp://dx.doi.org/10.1145/ 3569951.3597559. 16

  68. [68]

    Mokady, A

    R. Mokady, A. Hertz, K. Aberman, Y . Pritch, and D. Cohen-Or. Null-text inversion for editing real images using guided diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6038–6047, 2023

  69. [69]

    G. Kim, T. Kwon, and J. C. Ye. Diffusionclip: Text-guided diffusion models for robust image manipulation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2426–2435, 2022

  70. [70]

    Tumanyan, M

    N. Tumanyan, M. Geyer, S. Bagon, and T. Dekel. Plug-and-play diffusion features for text- driven image-to-image translation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1921–1930, 2023

  71. [71]

    Wallace, A

    B. Wallace, A. Gokul, and N. Naik. Edict: Exact diffusion inversion via coupled transforma- tions. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22532–22541, 2023

  72. [72]

    Hertz, R

    A. Hertz, R. Mokady, J. Tenenbaum, K. Aberman, Y . Pritch, and D. Cohen-Or. Prompt-to- prompt image editing with cross attention control.arXiv preprint arXiv:2208.01626, 2022

  73. [73]

    X. Su, J. Song, C. Meng, and S. Ermon. Dual diffusion implicit bridges for image-to-image translation.arXiv preprint arXiv:2203.08382, 2022

  74. [74]

    L. Rout, Y . Chen, N. Ruiz, C. Caramanis, S. Shakkottai, and W.-S. Chu. Semantic im- age inversion and editing using rectified stochastic differential equations.arXiv preprint arXiv:2410.10792, 2024

  75. [75]

    J. Wang, J. Pu, Z. Qi, J. Guo, Y . Ma, N. Huang, Y . Chen, X. Li, and Y . Shan. Taming rectified flow for inversion and editing.arXiv preprint arXiv:2411.04746, 2024

  76. [76]

    Y . Deng, X. He, C. Mei, P. Wang, and F. Tang. Fireflow: Fast inversion of rectified flow for image semantic editing.arXiv preprint arXiv:2412.07517, 2024

  77. [77]

    Avrahami, O

    O. Avrahami, O. Patashnik, O. Fried, E. Nemchinov, K. Aberman, D. Lischinski, and D. Cohen-Or. Stable flow: Vital layers for training-free image editing. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 7877–7888, 2025

  78. [78]

    G. Jiao, B. Huang, K.-C. Wang, and R. Liao. Uniedit-flow: Unleashing inversion and editing in the era of flow models.arXiv preprint arXiv:2504.13109, 2025

  79. [79]

    Chihaoui, A

    H. Chihaoui, A. Lemkhenter, and P. Favaro. Blind image restoration via fast diffusion inver- sion.Advances in Neural Information Processing Systems, 37:34513–34532, 2024

  80. [80]

    Z. Yang, K. Zeng, K. Chen, H. Fang, W. Zhang, and N. Yu. Gaussian shading: Prov- able performance-lossless image watermarking for diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12162–12171, 2024

Showing first 80 references.