arxiv: 2606.23617 · v1 · pith:6S2VBIPCnew · submitted 2026-06-22 · 💻 cs.RO · cs.AI · cs.LG

RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models

Pith reviewed 2026-06-26 08:12 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG
keywords vision-language-action models · active learning · continual learning · uncertainty estimation · catastrophic forgetting · imitation learning · robot policies · recovery behaviors

The pith

Active uncertainty-guided data collection improves fine-tuning efficiency for vision-language-action models but requires methods to avoid forgetting earlier skills.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that vision-language-action models improve more efficiently when new demonstrations are collected only in states where the current policy shows high uncertainty rather than after any failure or on a fixed schedule. A sympathetic reader would care because passive collection wastes human effort on already-mastered parts of a task and waits for mistakes to happen. The authors further show that training solely on these targeted recovery demonstrations causes the model to lose performance on previously learned behaviors. They therefore compare replay-based mixing of old and new data with elastic weight consolidation and document the resulting tradeoffs in plasticity versus retention. The overall result is an empirical case that uncertainty can guide more economical lifelong adaptation in large autoregressive robot policies.
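The selection step described above can be illustrated with a minimal sketch. It assumes mean token entropy over a timestep's action tokens as the uncertainty score and a fixed threshold; both are stand-ins chosen for illustration, since this summary does not say how the paper's estimator scores states. All function names are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def step_uncertainty(token_dists):
    """Assumed score: mean entropy over the action tokens emitted at one timestep."""
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)

def flag_states(rollout, threshold):
    """Indices of all timesteps whose uncertainty exceeds the threshold."""
    return [t for t, dists in enumerate(rollout) if step_uncertainty(dists) > threshold]

def online_trigger(rollout, threshold):
    """First flagged timestep, or None if no timestep is flagged."""
    flagged = flag_states(rollout, threshold)
    return flagged[0] if flagged else None

# Toy rollout: 4 timesteps, each with 2 action-token distributions over 3 tokens.
rollout = [
    [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # confident
    [[0.34, 0.33, 0.33], [0.40, 0.30, 0.30]],  # uncertain
    [[0.90, 0.05, 0.05], [0.95, 0.03, 0.02]],  # confident
    [[0.50, 0.25, 0.25], [0.34, 0.33, 0.33]],  # uncertain
]
offline_states = flag_states(rollout, threshold=0.9)   # all flagged states
first_state = online_trigger(rollout, threshold=0.9)   # first flagged state only
```

Requesting a demonstration at `first_state` corresponds to collecting at the first high-uncertainty state in a rollout, while `offline_states` corresponds to collecting at every flagged state.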

Core claim

The central claim is that an active, uncertainty-guided paradigm for collecting recovery experiences in vision-language-action models enables more efficient adaptation than passive imitation learning, but requires continual learning methods such as replay-based data mixing or elastic weight consolidation to mitigate catastrophic forgetting of prior behaviors.

What carries the argument

Uncertainty-guided selection of states for recovery demonstrations, combined with replay-based data mixing or elastic weight consolidation to balance new learning against retention in autoregressive VLA policies.
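Replay-based data mixing can be sketched as follows. This is a minimal illustration, assuming a fixed per-batch replay fraction and sampling with replacement; the summary does not state the mixing ratio or sampler the authors used, and `mixed_batches` and its parameters are hypothetical names.

```python
import random

def mixed_batches(old_data, new_data, batch_size, replay_fraction, num_batches, seed=0):
    """Yield batches containing round(batch_size * replay_fraction) old samples
    and the remainder drawn from the new recovery data."""
    rng = random.Random(seed)
    n_old = round(batch_size * replay_fraction)
    n_new = batch_size - n_old
    for _ in range(num_batches):
        # Sample with replacement so a small recovery set can still fill its share.
        batch = rng.choices(old_data, k=n_old) + rng.choices(new_data, k=n_new)
        rng.shuffle(batch)
        yield batch

# Toy datasets: tagged tuples stand in for (observation, action) samples.
old = [("old", i) for i in range(100)]
new = [("new", i) for i in range(10)]
batches = list(mixed_batches(old, new, batch_size=8, replay_fraction=0.5, num_batches=3))
```

Setting `replay_fraction=0.0` reduces this to training on the new recovery data alone, the setting the paper associates with forgetting.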

If this is right

  • Uncertainty-guided collection requires fewer demonstrations to achieve equivalent fine-tuning gains than passive collection after failures.
  • Training exclusively on the new recovery data produces measurable drops in performance on tasks learned earlier.
  • Replay-based mixing of old demonstrations with new recovery data preserves prior behaviors while allowing adaptation.
  • Elastic weight consolidation offers an alternative way to retain old skills at the cost of slower incorporation of uncertainty-guided data.
  • Tradeoffs exist between the rate of adaptation to new recovery data and the degree of retention of previously learned behaviors.
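For reference, the commonly used form of the elastic weight consolidation objective (Kirkpatrick et al., 2017) is shown below. Whether the paper uses exactly this form is not stated in this summary, and the symbols are assumptions for illustration: $\theta^{*}$ denotes the parameters after training on prior data, $F_i$ the diagonal Fisher information estimated on prior data, and $\lambda$ the regularization strength.

```latex
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\text{new}}(\theta) \;+\; \frac{\lambda}{2} \sum_i F_i \left(\theta_i - \theta_i^{*}\right)^2
```

With $\lambda = 0$ the objective reduces to fine-tuning on new data only; larger $\lambda$ holds parameters with high Fisher information closer to $\theta^{*}$, which is one way to read the retention-versus-adaptation tradeoff listed above.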

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same uncertainty signal could be used to decide when a deployed robot should request human intervention rather than continuing to act.
  • The approach might generalize to other large policy classes where data collection cost is the main bottleneck.
  • Better calibration of uncertainty could reduce the risk that recovery data over-represents rare failure modes.
  • Longer-horizon tasks might require uncertainty estimates that account for future states rather than only the immediate one.

Load-bearing premise

Uncertainty estimates from the vision-language-action policy accurately identify states where new demonstrations will improve performance without introducing bias into the collected recovery data.

What would settle it

If experiments that replace uncertainty-guided state selection with random or failure-triggered collection show no reduction in the number of demonstrations needed to reach the same performance level, the efficiency advantage claim would be falsified.
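The comparison this test calls for can be sketched as a demonstrations-to-target calculation. The helper name and the curve format are hypothetical, and the numbers below are placeholders invented for illustration; they are not results from the paper.

```python
def demos_to_reach(curve, target):
    """Smallest demonstration count whose success rate meets the target, else None.

    `curve` is a list of (num_demos, success_rate) pairs.
    """
    reached = [n for n, rate in sorted(curve) if rate >= target]
    return reached[0] if reached else None

# Placeholder learning curves for illustration only; NOT data from the paper.
curves = {
    "uncertainty_guided": [(10, 0.55), (20, 0.70), (40, 0.80)],
    "failure_triggered": [(10, 0.50), (20, 0.60), (40, 0.72), (80, 0.80)],
    "random": [(10, 0.48), (20, 0.58), (40, 0.66), (80, 0.74)],
}
target = 0.80
needed = {name: demos_to_reach(curve, target) for name, curve in curves.items()}
```

Under this reading, the efficiency claim would be undermined if the uncertainty-guided entry in `needed` were not smaller than the entries for the random and failure-triggered baselines on real evaluation curves.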

Figures

Figures reproduced from arXiv: 2606.23617 by Tesca Fitzgerald, Ulas Berk Karli.

Figure 1: Overview of uncertainty-guided active continual learning. An initial π0-FAST policy is rolled out on LIBERO-10 while INSIGHT identifies high-uncertainty states. We compare passive collection with online recovery from the first high-uncertainty state and offline recovery from all high-uncertainty states, then integrate the resulting data using replay or EWC regularization.
Figure 2: Online recovery collection improves over passive start-state collection. We compare fine-tuned policy performance after adding matched passive or active data. Active collection is guided by Strong INSIGHT (left) or Weak INSIGHT (right).
Figure 3: Online recovery collection is as effective as offline collection despite using fewer demonstrations. Left: training curves for online and offline Strong INSIGHT recovery datasets mixed with prior demonstrations. Right: best-checkpoint comparison across overall, collected-task, and retained-task success.
Figure 4: New-only fine-tuning causes catastrophic forgetting. We compare training on new-only data (solid lines) against replay-based training that combines old and new data (dashed lines).
Figure 5: Low learning rates (α) and EWC reduce forgetting but limit adaptation to new data. Compared to full replay, low learning-rate fine-tuning and EWC better preserve retained-task performance than standard new-only fine-tuning, but produce smaller gains on collected tasks.
Figure 6: Replay coverage controls the tradeoff between adaptation and retention. Left: results for Strong INSIGHT online recovery data mixed with different replay subsets. Right: best-checkpoint comparison.
Figure 7: Regularization does not consistently prevent degradation, indicating that replay coverage is critical. We evaluate EWC and filtered-EWC variants when using targeted replay data.
Figure 8: Strong INSIGHT active-versus-passive comparison with old normalization statistics.
Figure 9: Weak INSIGHT active-versus-passive comparison with old normalization statistics. This comparison complements the Strong INSIGHT old-normalization ablation and shows that old normalization statistics similarly constrain adaptation under the Weak INSIGHT recovery dataset.
Figure 10: Full EWC coefficient sweep with low learning rate. Stronger regularization stabilizes retained-task performance, but does not provide enough plasticity to fully exploit uncertainty-guided recovery data.
Figure 11: Filtered Fisher EWC sweep. Computing Fisher information on filtered prior data changes the stability-plasticity behavior, but regularization alone still does not match replay-based adaptation.
Original abstract

Vision-Language-Action (VLA) models are commonly fine-tuned through passive imitation learning, where additional demonstrations are collected for tasks where the policy performs poorly. This approach incurs several downsides: it requires the robot to fail before data collection is triggered, provides little guidance about which states require supervision, and wastes demonstrator effort on redundant parts of the task where the policy already performs well. In this paper, we propose an active, continual learning paradigm for VLAs. We demonstrate that active, uncertainty-guided data collection leads to more efficient fine-tuning than when using passively-collected demonstrations. However, we also find that fine-tuning only on actively-collected recovery data leads to catastrophic forgetting. We evaluate techniques for continual learning, including replay-based data mixing and elastic weight consolidation, and identify tradeoffs between plasticity to uncertainty-guided recovery data and retention of previously learned behaviors. Overall, our work contributes an empirical study of active continual learning for autoregressive VLAs, establishing that uncertainty-guided recovery demonstrations can improve adaptation efficiency while also revealing open challenges when targeted new data is incorporated into large robot policies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes RECALL, an active continual learning approach for Vision-Language-Action (VLA) models. It argues that passive imitation learning requires failures before data collection and wastes effort on well-learned states; instead, uncertainty-guided active collection of recovery demonstrations improves fine-tuning efficiency. The work also reports that fine-tuning solely on such recovery data induces catastrophic forgetting and evaluates mitigation via replay-based mixing and elastic weight consolidation, identifying plasticity-retention tradeoffs in an empirical study of autoregressive VLAs.

Significance. If the empirical claims hold with proper quantification and validation, the work would be a useful empirical contribution to lifelong robot learning by showing how targeted active recovery data can reduce demonstrator effort compared with passive collection and by surfacing concrete tradeoffs when incorporating such data into large VLAs. The paper does not claim new theoretical results, parameter-free derivations, or machine-checked proofs.

major comments (3)
  1. [Abstract] Abstract: the headline claim that 'active, uncertainty-guided data collection leads to more efficient fine-tuning' is presented without any quantitative metrics, baselines, error bars, dataset sizes, or statistical controls. This is load-bearing for the central empirical result; the abstract states an outcome but supplies no numbers that would allow verification of the efficiency advantage.
  2. [Method] Method (uncertainty estimator): the paper does not define or ablate the uncertainty signal used to select recovery states (token entropy, predictive variance, ensemble disagreement, etc.) nor demonstrate that states flagged by this signal causally produce net performance gains rather than distribution shift or bias. This link is required for the weakest assumption identified in the stress-test note and remains unaddressed in the provided text.
  3. [Experiments] Experiments: no tables or figures are referenced that report success rates, sample efficiency curves, or forgetting metrics with controls for the passive vs. active comparison; without these, the reported tradeoffs between plasticity and retention cannot be assessed.
minor comments (1)
  1. [Abstract] The abstract uses the term 'catastrophic forgetting' without a precise operational definition or reference to the metric employed.
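One way such an operational definition could look is sketched below, offered as an assumption rather than the paper's metric: forgetting is measured as the mean drop in success rate on retained tasks (tasks that received no new data), reported alongside the mean gain on collected tasks and overall success. The function name, task names, and numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def summarize(before, after, collected_tasks):
    """Compare per-task success rates before and after fine-tuning.

    `before` and `after` map task name -> success rate in [0, 1].
    `collected_tasks` is the set of tasks that received new demonstrations.
    """
    collected = [t for t in before if t in collected_tasks]
    retained = [t for t in before if t not in collected_tasks]
    return {
        "overall": mean([after[t] for t in before]),
        "collected_gain": mean([after[t] - before[t] for t in collected]),
        "forgetting": mean([before[t] - after[t] for t in retained]),
    }

# Placeholder success rates for illustration only; NOT data from the paper.
before = {"task_a": 0.4, "task_b": 0.8, "task_c": 0.9}
after = {"task_a": 0.7, "task_b": 0.6, "task_c": 0.7}
report = summarize(before, after, collected_tasks={"task_a"})
```

A positive `forgetting` value means retained tasks got worse after fine-tuning; reporting it next to `collected_gain` makes the retention-versus-adaptation tradeoff explicit.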

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate planned revisions.

  1. Referee: [Abstract] Abstract: the headline claim that 'active, uncertainty-guided data collection leads to more efficient fine-tuning' is presented without any quantitative metrics, baselines, error bars, dataset sizes, or statistical controls. This is load-bearing for the central empirical result; the abstract states an outcome but supplies no numbers that would allow verification of the efficiency advantage.

    Authors: We agree that the abstract would be strengthened by including quantitative support for the efficiency claim. The experimental results in the paper quantify the gains from active collection relative to passive baselines. We will revise the abstract to incorporate these key metrics, baselines, error bars, and dataset details to enable verification.
    Revision: yes

  2. Referee: [Method] Method (uncertainty estimator): the paper does not define or ablate the uncertainty signal used to select recovery states (token entropy, predictive variance, ensemble disagreement, etc.) nor demonstrate that states flagged by this signal causally produce net performance gains rather than distribution shift or bias. This link is required for the weakest assumption identified in the stress-test note and remains unaddressed in the provided text.

    Authors: The current manuscript describes uncertainty-guided selection at a high level but does not provide an explicit definition of the signal or an ablation. We will expand the method section to define the uncertainty estimator in detail and add an ablation comparing it against random selection to demonstrate net gains while controlling for distribution shift.
    Revision: yes

  3. Referee: [Experiments] Experiments: no tables or figures are referenced that report success rates, sample efficiency curves, or forgetting metrics with controls for the passive vs. active comparison; without these, the reported tradeoffs between plasticity and retention cannot be assessed.

    Authors: The paper contains figures and tables presenting success rates, sample efficiency curves, and forgetting metrics for the passive versus active comparisons. We will add explicit in-text references to these results when discussing the plasticity-retention tradeoffs to improve clarity and verifiability.
    Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical study with independent experimental comparisons

The paper is an empirical study of active vs. passive data collection for VLA fine-tuning. It reports experimental outcomes on efficiency and forgetting without any equations, parameter fits presented as predictions, self-definitional constructs, or load-bearing self-citations. The central claim rests on direct comparisons of collection strategies and continual-learning techniques, which are externally falsifiable via the described robot evaluations and do not reduce to the inputs by construction. No derivation chain exists that could exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on abstract; no free parameters, axioms, or invented entities are identifiable or extractable.

pith-pipeline@v0.9.1-grok · 5728 in / 999 out tokens · 20140 ms · 2026-06-26T08:12:19.634592+00:00 · methodology
