arxiv: 2605.08434 · v2 · submitted 2026-05-08 · 💻 cs.RO

Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models

Meng Zheng, Samhita Marri, Anwesa Choudhuri, Benjamin Planche, Zhongpai Gao, Van Nguyen Nguyen, Terrence Chen, Girish Chowdhary, Ziyan Wu

Pith reviewed 2026-05-13 06:33 UTC · model grok-4.3

classification 💻 cs.RO

keywords vision-language-action · failure-informed learning · robotic manipulation · diffusion policies · adaptive guidance · negative examples

The pith

Training VLA models on online failure trajectories as negative guidance boosts robotic manipulation success rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision-language-action models often fail because they only learn from successes and cannot correct small errors that lead to bigger problems. The paper introduces Adaptive Failure-Informed Learning to generate failure examples on the fly from the model itself and use them to guide the policy away from bad actions. By training separate generators for success and failure that share most parameters, the approach adds little overhead while making policies more robust. This matters because it turns the model's own mistakes into training data for better performance without extra human effort or custom failure designs.

Core claim

AFIL generates failure rollouts online using a pretrained VLA, then jointly trains dual action generators for successful and failed behaviors sharing a vision-language backbone. During inference, the failure generator provides adaptive negative guidance scaled by the distance between success and failure distributions at each step, steering actions toward reliable success modes.
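As a rough illustration of the sampling-time rule, the sketch below combines the two generators' per-step noise predictions using the form quoted in this page's theorem-link excerpt, ϵ*_FI = ϵ_succ − λ̂ (ϵ_fail − ϵ_succ), with λ̂ proportional to cosine distance. The function names, the `scale` constant, and the handling of zero-norm predictions are assumptions made for illustration; the paper's exact scaling is not given on this page.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(u, v): 0 for parallel vectors, 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: apply no guidance to a degenerate prediction
    return 1.0 - dot / (norm_u * norm_v)

def guided_noise(eps_succ, eps_fail, scale=1.0):
    """Push the success prediction away from the failure prediction.

    `scale` is a hypothetical proportionality constant for lambda.
    Returns the guided prediction and the guidance strength used.
    """
    lam = scale * cosine_distance(eps_succ, eps_fail)
    guided = [s - lam * (f - s) for s, f in zip(eps_succ, eps_fail)]
    return guided, lam
```

When the two predictions point the same way, the cosine distance is zero and the success prediction passes through unchanged; the further apart they point, the harder the step is pushed away from the failure direction.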

What carries the argument

Dual Action Generators (DAGs) trained on success and failure trajectories, with adaptive guidance strength based on per-step distribution distance between the two.
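To make the shared-backbone idea concrete, here is a toy sketch in plain Python: one feature extractor feeds two small heads, so the failure head is the only added set of parameters. The layer sizes, the class name `DualActionGenerator`, and the single-layer heads are all hypothetical; real diffusion or flow heads also take a noisy action and a timestep, which this toy omits.

```python
import random

def make_linear(n_in, n_out, rng):
    """Toy dense layer: (weights, bias) with small random weights."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def apply_linear(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def count_params(layer):
    w, b = layer
    return sum(len(row) for row in w) + len(b)

class DualActionGenerator:
    """Shared backbone with separate success and failure heads (toy sizes)."""

    def __init__(self, obs_dim=64, feat_dim=64, act_dim=7, seed=0):
        rng = random.Random(seed)
        self.backbone = make_linear(obs_dim, feat_dim, rng)
        self.success_head = make_linear(feat_dim, act_dim, rng)
        self.failure_head = make_linear(feat_dim, act_dim, rng)

    def forward(self, obs):
        # Features are computed once and reused by both heads.
        feat = [max(0.0, h) for h in apply_linear(self.backbone, obs)]
        return apply_linear(self.success_head, feat), apply_linear(self.failure_head, feat)

    def overhead_fraction(self):
        # Extra parameters from the failure head, relative to a single-head model.
        base = count_params(self.backbone) + count_params(self.success_head)
        return count_params(self.failure_head) / base
```

The overhead fraction shrinks as the backbone grows relative to the heads, which is the intuition behind the limited-overhead claim; the toy numbers say nothing about the paper's actual model sizes.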

Load-bearing premise

The failure trajectories produced online by the pretrained VLA are informative enough to provide useful negative guidance without introducing harmful biases.

What would settle it

Running the same robotic tasks with and without AFIL and finding no consistent improvement in success rates or robustness.
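One way such a comparison could be scored is a two-proportion z-test on episode success counts. This is a generic statistical check swapped in for illustration, not the paper's evaluation protocol, and the function name and arguments are hypothetical.

```python
import math

def success_rate_gap(successes_base, n_base, successes_afil, n_afil):
    """Return (rate difference, z statistic) for AFIL minus baseline.

    Uses the pooled two-proportion z-test; z is 0.0 when the pooled
    rate is 0 or 1 and the standard error vanishes.
    """
    p_base = successes_base / n_base
    p_afil = successes_afil / n_afil
    pooled = (successes_base + successes_afil) / (n_base + n_afil)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_base + 1.0 / n_afil))
    diff = p_afil - p_base
    z = diff / se if se > 0.0 else 0.0
    return diff, z
```

A z statistic that stays near zero across tasks and seeds would be the "no consistent improvement" outcome described above; a large positive value on most tasks would point the other way.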

Figures

Figures reproduced from arXiv: 2605.08434 by Anwesa Choudhuri, Benjamin Planche, Girish Chowdhary, Meng Zheng, Samhita Marri, Terrence Chen, Van Nguyen Nguyen, Zhongpai Gao, Ziyan Wu.

**Figure 2.** Dual Action Generator (DAG)-VLA with Adaptive Failure-Informed Learning (AFIL).

**Figure 3.** Visualization of in-domain (left) and out-of-domain (right) manipulation task setup. Row 1: …

Original abstract

Vision-language-action (VLA) models provide a promising paradigm for scalable robotic manipulation, yet their reliance on success-only behavioral cloning leaves them brittle; lacking corrective training signals, minor execution errors rapidly compound into unrecoverable, out-of-distribution failures. To address this limitation, we propose Adaptive Failure-Informed Learning (AFIL), an end-to-end framework that leverages failure trajectories as adaptive negative guidance for diffusion- and flow-based VLA policies. AFIL uses a pretrained VLA to generate failure rollouts online, avoiding the need for handcrafted failure-mode design or human-in-the-loop recovery. It then jointly trains Dual Action Generators (DAGs) for successful and failed behaviors while sharing a common vision-language backbone, enabling efficient failure-aware policy learning with limited parameter overhead. During sampling, the failure generator adaptively steers action generation away from failure-prone regions and toward more reliable success modes, with guidance strength determined by the per-diffusion-step distance between success and failure distributions. Experiments across in-domain and out-of-domain robotic manipulation tasks, covering both short- and long-horizon settings, show that AFIL consistently improves task success rates and robustness over existing VLA baselines, demonstrating its effectiveness, efficiency, and generality.
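The online failure-collection step can be pictured as a plain rollout loop that sorts episodes by outcome. The sketch below uses a hypothetical one-dimensional toy environment and a callable `policy`; none of these names come from the paper, and a real setup would roll out the pretrained VLA in a simulator or on hardware.

```python
class ToyReachEnv:
    """Hypothetical 1-D task: reach position >= 1.0 within the step budget."""

    def __init__(self):
        self.pos = 0.0

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action
        return self.pos, self.pos >= 1.0  # (observation, success flag)

def collect_rollouts(policy, env, n_episodes, max_steps):
    """Roll out `policy` and split trajectories into success and failure buffers."""
    successes, failures = [], []
    for _ in range(n_episodes):
        obs = env.reset()
        trajectory, done = [], False
        for _ in range(max_steps):
            action = policy(obs)
            trajectory.append((obs, action))
            obs, done = env.step(action)
            if done:
                break
        # Episodes that never reach the goal become negative examples.
        (successes if done else failures).append(trajectory)
    return successes, failures
```

For example, `collect_rollouts(lambda obs: 0.5, ToyReachEnv(), 3, 8)` puts every episode in the success buffer, while a policy that always outputs `0.0` fills only the failure buffer. The two buffers would then feed the success and failure generators respectively.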

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AFIL adds online failure rollouts and dual generators to steer VLA diffusion policies, but the gains rest on unshown numbers and untested assumptions about failure quality.

The core idea here is straightforward: take a pretrained VLA, let it generate its own failure trajectories online, train a second action head on those failures while sharing the vision-language backbone, and then use the failure head at sampling time to push the diffusion process away from bad regions with strength set by the per-step distribution distance. That combination for diffusion-based VLAs is new enough to be worth checking, and the no-extra-annotation angle is practical for scaling.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the assumption that a pretrained VLA can generate useful failure trajectories without external supervision and that the distance between success and failure distributions provides a reliable steering signal.

free parameters (1)

guidance strength schedule
Determined by per-step distance between success and failure distributions; exact functional form and scaling constants are not specified in the abstract.

axioms (1)

Domain assumption: Failure trajectories generated by the current policy are representative of the failure modes that matter at deployment.
Invoked when the method uses online rollouts to train the failure generator.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "AFIL uses a pretrained VLA to generate failure rollouts online... jointly trains Dual Action Generators (DAGs) for successful and failed behaviors... guidance strength determined by the per-diffusion-step distance between success and failure distributions"

IndisputableMonolith/Foundation/AbsoluteFloorClosure · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Adaptive failure-informed sampling... ϵ*_FI = ϵ_succ − λ̂_η (ϵ_fail − ϵ_succ) with λ̂_η ∝ cosine distance"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
