RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models

Tesca Fitzgerald; Ulas Berk Karli

arXiv: 2606.23617 · v1 · pith:6S2VBIPCnew · submitted 2026-06-22 · 💻 cs.RO · cs.AI · cs.LG

Pith reviewed 2026-06-26 08:12 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG

keywords vision-language-action models · active learning · continual learning · uncertainty estimation · catastrophic forgetting · imitation learning · robot policies · recovery behaviors

The pith

Active uncertainty-guided data collection improves fine-tuning efficiency for vision-language-action models but requires methods to avoid forgetting earlier skills.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that vision-language-action models improve more efficiently when new demonstrations are collected only in states where the current policy shows high uncertainty rather than after any failure or on a fixed schedule. A sympathetic reader would care because passive collection wastes human effort on already-mastered parts of a task and waits for mistakes to happen. The authors further show that training solely on these targeted recovery demonstrations causes the model to lose performance on previously learned behaviors. They therefore compare replay-based mixing of old and new data with elastic weight consolidation and document the resulting tradeoffs in plasticity versus retention. The overall result is an empirical case that uncertainty can guide more economical lifelong adaptation in large autoregressive robot policies.

Core claim

The central claim is that an active, uncertainty-guided paradigm for collecting recovery experiences in vision-language-action models enables more efficient adaptation than passive imitation learning, but requires continual learning methods such as replay-based data mixing or elastic weight consolidation to mitigate catastrophic forgetting of prior behaviors.

What carries the argument

Uncertainty-guided selection of states for recovery demonstrations, combined with replay-based data mixing or elastic weight consolidation to balance new learning against retention in autoregressive VLA policies.
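The Figure 1 caption on this page distinguishes online recovery (from the first high-uncertainty state) from offline recovery (from all high-uncertainty states). A minimal sketch of that distinction follows. It assumes a per-timestep scalar uncertainty score and a fixed threshold; the function names, the threshold, and the score values are hypothetical and are not taken from the paper or from INSIGHT.

```python
from typing import List, Optional


def first_trigger(scores: List[float], threshold: float) -> Optional[int]:
    """Online variant: index of the first timestep whose score exceeds the threshold."""
    for t, s in enumerate(scores):
        if s > threshold:
            return t
    return None  # the rollout never crossed the threshold


def all_triggers(scores: List[float], threshold: float) -> List[int]:
    """Offline variant: every timestep whose score exceeds the threshold."""
    return [t for t, s in enumerate(scores) if s > threshold]


# Hypothetical per-step uncertainty scores for one rollout (illustrative values only).
rollout_scores = [0.10, 0.15, 0.72, 0.30, 0.81, 0.05]
online_state = first_trigger(rollout_scores, threshold=0.5)
offline_states = all_triggers(rollout_scores, threshold=0.5)
```

Under these assumptions the online variant asks for one recovery demonstration per rollout, while the offline variant can ask for several, which is one way to read the demonstration-count difference noted in the Figure 3 caption.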

If this is right

Uncertainty-guided collection requires fewer demonstrations to achieve equivalent fine-tuning gains than passive collection after failures.
Training exclusively on the new recovery data produces measurable drops in performance on tasks learned earlier.
Replay-based mixing of old demonstrations with new recovery data preserves prior behaviors while allowing adaptation.
Elastic weight consolidation offers an alternative way to retain old skills at the cost of slower incorporation of uncertainty-guided data.
Tradeoffs exist between the rate of adaptation to new recovery data and the degree of retention of previously learned behaviors.
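Replay-based mixing, as described in the list above, can be sketched as batch construction that draws from both old demonstrations and new recovery data. This is an illustration under stated assumptions, not the authors' data loader: `mix_with_replay`, `replay_fraction`, and the toy datasets are hypothetical names and values.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_with_replay(new_data: Sequence[T], old_data: Sequence[T],
                    replay_fraction: float, batch_size: int,
                    rng: random.Random) -> List[T]:
    """Draw one batch in which roughly `replay_fraction` of samples come from old data."""
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must lie in [0, 1]")
    n_old = round(batch_size * replay_fraction)
    n_new = batch_size - n_old
    # Sample with replacement so a small recovery set can still fill its share.
    batch = [rng.choice(old_data) for _ in range(n_old)]
    batch += [rng.choice(new_data) for _ in range(n_new)]
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
old = [("old", i) for i in range(100)]   # stand-in for prior demonstrations
new = [("new", i) for i in range(10)]    # stand-in for recovery demonstrations
batch = mix_with_replay(new, old, replay_fraction=0.5, batch_size=8, rng=rng)
```

Setting `replay_fraction` to 0 corresponds to new-only fine-tuning, the setting in which the paper reports forgetting.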

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same uncertainty signal could be used to decide when a deployed robot should request human intervention rather than continuing to act.
The approach might generalize to other large policy classes where data collection cost is the main bottleneck.
Better calibration of uncertainty could reduce the risk that recovery data over-represents rare failure modes.
Longer-horizon tasks might require uncertainty estimates that account for future states rather than only the immediate one.

Load-bearing premise

Uncertainty estimates from the vision-language-action policy accurately identify states where new demonstrations will improve performance without introducing bias into the collected recovery data.

What would settle it

If experiments that replace uncertainty-guided state selection with random or failure-triggered collection show no reduction in the number of demonstrations needed to reach the same performance level, the efficiency advantage claim would be falsified.
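The falsification test above compares how many demonstrations each collection strategy needs to reach the same performance level. One way to compute that quantity is sketched below. The curve format, the strategy names, and every number are made up for illustration and are not results from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Each curve is a list of (num_demonstrations, success_rate) points.
Curve = List[Tuple[int, float]]


def demos_to_reach(curve: Curve, target: float) -> Optional[int]:
    """Smallest demonstration count at which success rate meets the target, else None."""
    reached = [n for n, success in curve if success >= target]
    return min(reached) if reached else None


# Made-up numbers for illustration only; they are NOT results from the paper.
curves: Dict[str, Curve] = {
    "uncertainty_guided": [(10, 0.55), (20, 0.70), (40, 0.78)],
    "failure_triggered": [(10, 0.50), (20, 0.60), (40, 0.71)],
    "random_states": [(10, 0.48), (20, 0.58), (40, 0.69)],
}
budget = {name: demos_to_reach(c, target=0.70) for name, c in curves.items()}
```

If the uncertainty-guided budget were not smaller than the other two on real data, the efficiency claim would not hold in that setting.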

Figures

Figures reproduced from arXiv: 2606.23617 by Tesca Fitzgerald, Ulas Berk Karli.

**Figure 1.** Overview of uncertainty-guided active continual learning. An initial π0-FAST policy is rolled out on LIBERO-10 while INSIGHT identifies high-uncertainty states. We compare passive collection with online recovery from the first high-uncertainty state and offline recovery from all high-uncertainty states, then integrate the resulting data using replay or EWC regularization.

**Figure 2.** Online recovery collection improves over passive start-state collection. We compare fine-tuned policy performance after adding matched passive or active data. Active collection is guided by Strong INSIGHT (left) or Weak INSIGHT (right).

**Figure 3.** Online recovery collection is as effective as offline collection despite using fewer demonstrations. Left: training curves for online and offline Strong INSIGHT recovery datasets mixed with prior demonstrations. Right: best-checkpoint comparison across overall, collected-task, and retained-task success.

**Figure 4.** New-only fine-tuning causes catastrophic forgetting. We compare training on new-only data (solid lines) against replay-based training that combines old and new data (dashed lines).

**Figure 5.** Low learning rates (α) and EWC reduce forgetting but limit adaptation to new data. Compared to full replay, low learning-rate fine-tuning and EWC better preserve retained-task performance than standard new-only fine-tuning, but produce smaller gains on collected tasks.

**Figure 6.** Replay coverage controls the tradeoff between adaptation and retention. Left: results for Strong INSIGHT online recovery data mixed with different replay subsets. Right: best-checkpoint comparison.

**Figure 7.** Regularization does not consistently prevent degradation, indicating that replay coverage is critical. We evaluate EWC and filtered-EWC variants when using targeted replay data.

**Figure 8.** Strong INSIGHT active-versus-passive comparison with old normalization statistics.

**Figure 9.** Weak INSIGHT active-versus-passive comparison with old normalization statistics. This comparison complements the Strong INSIGHT old-normalization ablation and shows that old normalization statistics similarly constrain adaptation under the Weak INSIGHT recovery dataset.

**Figure 10.** Full EWC coefficient sweep with low learning rate. Stronger regularization stabilizes retained-task performance, but does not provide enough plasticity to fully exploit uncertainty-guided recovery data.

**Figure 11.** Filtered Fisher EWC sweep. Computing Fisher information on filtered prior data changes the stability-plasticity behavior, but regularization alone still does not match replay-based adaptation.

Original abstract

Vision-Language-Action (VLA) models are commonly fine-tuned through passive imitation learning, where additional demonstrations are collected for tasks where the policy performs poorly. This approach incurs several downsides: it requires the robot to fail before data collection is triggered, provides little guidance about which states require supervision, and wastes demonstrator effort on redundant parts of the task where the policy already performs well. In this paper, we propose an active, continual learning paradigm for VLAs. We demonstrate that active, uncertainty-guided data collection leads to more efficient fine-tuning than when using passively-collected demonstrations. However, we also find that fine-tuning only on actively-collected recovery data leads to catastrophic forgetting. We evaluate techniques for continual learning, including replay-based data mixing and elastic weight consolidation, and identify tradeoffs between plasticity to uncertainty-guided recovery data and retention of previously learned behaviors. Overall, our work contributes an empirical study of active continual learning for autoregressive VLAs, establishing that uncertainty-guided recovery demonstrations can improve adaptation efficiency while also revealing open challenges when targeted new data is incorporated into large robot policies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Active uncertainty-guided recovery collection improves VLA fine-tuning efficiency over passive demos but triggers forgetting that replay and EWC only partially address.

The main point is that switching to uncertainty-guided active collection of recovery data makes fine-tuning VLAs more efficient than waiting for failures and collecting passive demonstrations, but it also produces catastrophic forgetting of earlier behaviors. The authors then compare replay-based mixing and elastic weight consolidation as ways to keep plasticity without losing prior skills.
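For readers unfamiliar with elastic weight consolidation, the standard form of its regularizer (Kirkpatrick et al., 2017) adds a quadratic penalty that anchors each weight to its earlier value in proportion to a diagonal Fisher information estimate. The sketch below uses plain Python lists and toy numbers; how the paper computes Fisher information or sets the coefficient is not stated on this page, so `lam` and all values are assumptions.

```python
from typing import Sequence


def ewc_penalty(params: Sequence[float], anchor: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """Quadratic EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    if not (len(params) == len(anchor) == len(fisher)):
        raise ValueError("params, anchor, and fisher must have equal length")
    return 0.5 * lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher))


def total_loss(new_task_loss: float, params: Sequence[float],
               anchor: Sequence[float], fisher: Sequence[float], lam: float) -> float:
    """Loss on new recovery data plus the EWC penalty that discourages drift."""
    return new_task_loss + ewc_penalty(params, anchor, fisher, lam)


theta_star = [1.0, -2.0, 0.5]   # weights after the earlier training stage (toy values)
theta = [1.5, -2.0, 0.0]        # weights during fine-tuning on new data (toy values)
fisher_diag = [4.0, 1.0, 0.0]   # diagonal Fisher estimate (toy values)
penalty = ewc_penalty(theta, theta_star, fisher_diag, lam=2.0)
```

In this toy case only the first weight is penalized: the second has not moved, and the third has zero Fisher weight, so it is free to change. A larger `lam` makes movement more expensive, which matches the stability-versus-plasticity tension the letter describes.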

The work applies active and continual learning ideas to autoregressive VLAs in a targeted way. It spells out the concrete drawbacks of passive collection (failure-triggered demos, no state guidance, and redundant effort on easy segments) and shows the resulting efficiency-forgetting tradeoff in this model class.

The soft spots are clear from the abstract. No description of the uncertainty estimator appears, no quantitative results or controls are given, and there is no ablation linking the selected states to downstream gains rather than just harder regions. That leaves the central efficiency claim hard to evaluate and makes the stress-test concern about validation of the uncertainty signal a real one.

This is aimed at researchers already working on data-efficient adaptation and lifelong learning for large robot policies. Someone in that niche could extract the practical tradeoff analysis, but the paper does not introduce new mechanisms or first-principles results.

It deserves peer review because the problem is practical and the empirical framing is direct, provided the full manuscript supplies the missing method details and numbers.

Referee Report

3 major / 1 minor

Summary. The paper proposes RECALL, an active continual learning approach for Vision-Language-Action (VLA) models. It argues that passive imitation learning requires failures before data collection and wastes effort on well-learned states; instead, uncertainty-guided active collection of recovery demonstrations improves fine-tuning efficiency. The work also reports that fine-tuning solely on such recovery data induces catastrophic forgetting and evaluates mitigation via replay-based mixing and elastic weight consolidation, identifying plasticity-retention tradeoffs in an empirical study of autoregressive VLAs.

Significance. If the empirical claims hold with proper quantification and validation, the work would be a useful empirical contribution to lifelong robot learning by showing how targeted active recovery data can reduce demonstrator effort compared with passive collection and by surfacing concrete tradeoffs when incorporating such data into large VLAs. The paper does not claim new theoretical results, parameter-free derivations, or machine-checked proofs.

major comments (3)

[Abstract] Abstract: the headline claim that 'active, uncertainty-guided data collection leads to more efficient fine-tuning' is presented without any quantitative metrics, baselines, error bars, dataset sizes, or statistical controls. This is load-bearing for the central empirical result; the abstract states an outcome but supplies no numbers that would allow verification of the efficiency advantage.
[Method] Method (uncertainty estimator): the paper does not define or ablate the uncertainty signal used to select recovery states (token entropy, predictive variance, ensemble disagreement, etc.) nor demonstrate that states flagged by this signal causally produce net performance gains rather than distribution shift or bias. This link is required for the weakest assumption identified in the stress-test note and remains unaddressed in the provided text.
[Experiments] Experiments: no tables or figures are referenced that report success rates, sample efficiency curves, or forgetting metrics with controls for the passive vs. active comparison; without these, the reported tradeoffs between plasticity and retention cannot be assessed.
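The second major comment names token entropy and ensemble disagreement as candidate uncertainty signals. The sketch below shows generic versions of both so the distinction is concrete. These are textbook definitions, not the INSIGHT estimator referenced in the figure captions, and the function names and example distributions are hypothetical.

```python
import math
from statistics import pvariance
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_sequence_entropy(step_probs: Sequence[Sequence[float]]) -> float:
    """Average token entropy over the tokens of one decoded action chunk."""
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)


def ensemble_disagreement(predictions: Sequence[Sequence[float]]) -> float:
    """Mean per-dimension variance of action predictions across ensemble members."""
    dims = list(zip(*predictions))  # regroup values by action dimension
    return sum(pvariance(d) for d in dims) / len(dims)


# Toy distributions over a four-token vocabulary (illustrative values only).
confident = [[0.97, 0.01, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25]]
```

Token entropy needs only a single forward pass of an autoregressive policy, whereas ensemble disagreement requires several models or stochastic passes. That cost difference is one reason an ablation across signals, as the referee requests, would be informative.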

minor comments (1)

[Abstract] The abstract uses the term 'catastrophic forgetting' without a precise operational definition or reference to the metric employed.
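One possible operational definition, offered only as an illustration of what the minor comment asks for, is the mean drop in success rate on retained tasks between the pre-fine-tuning and post-fine-tuning checkpoints. The metric the paper actually uses is not stated on this page; the function name and the toy success rates below are assumptions.

```python
from typing import Dict


def forgetting(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Mean drop in success rate over tasks evaluated both before and after fine-tuning."""
    shared = sorted(set(before) & set(after))
    if not shared:
        raise ValueError("no tasks in common")
    return sum(before[t] - after[t] for t in shared) / len(shared)


# Toy success rates on retained tasks (illustrative values, not from the paper).
before_ft = {"task_a": 0.90, "task_b": 0.80}
after_ft = {"task_a": 0.60, "task_b": 0.70}
drop = forgetting(before_ft, after_ft)
```

A positive value indicates forgetting, and a negative value would indicate that retained-task performance improved after fine-tuning.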

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate planned revisions.

Referee: [Abstract] Abstract: the headline claim that 'active, uncertainty-guided data collection leads to more efficient fine-tuning' is presented without any quantitative metrics, baselines, error bars, dataset sizes, or statistical controls. This is load-bearing for the central empirical result; the abstract states an outcome but supplies no numbers that would allow verification of the efficiency advantage.

Authors: We agree that the abstract would be strengthened by including quantitative support for the efficiency claim. The experimental results in the paper quantify the gains from active collection relative to passive baselines. We will revise the abstract to incorporate these key metrics, baselines, error bars, and dataset details to enable verification. revision: yes
Referee: [Method] Method (uncertainty estimator): the paper does not define or ablate the uncertainty signal used to select recovery states (token entropy, predictive variance, ensemble disagreement, etc.) nor demonstrate that states flagged by this signal causally produce net performance gains rather than distribution shift or bias. This link is required for the weakest assumption identified in the stress-test note and remains unaddressed in the provided text.

Authors: The current manuscript describes uncertainty-guided selection at a high level but does not provide an explicit definition of the signal or an ablation. We will expand the method section to define the uncertainty estimator in detail and add an ablation comparing it against random selection to demonstrate net gains while controlling for distribution shift. revision: yes
Referee: [Experiments] Experiments: no tables or figures are referenced that report success rates, sample efficiency curves, or forgetting metrics with controls for the passive vs. active comparison; without these, the reported tradeoffs between plasticity and retention cannot be assessed.

Authors: The paper contains figures and tables presenting success rates, sample efficiency curves, and forgetting metrics for the passive versus active comparisons. We will add explicit in-text references to these results when discussing the plasticity-retention tradeoffs to improve clarity and verifiability. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical study with independent experimental comparisons

The paper is an empirical study of active vs. passive data collection for VLA fine-tuning. It reports experimental outcomes on efficiency and forgetting without any equations, parameter fits presented as predictions, self-definitional constructs, or load-bearing self-citations. The central claim rests on direct comparisons of collection strategies and continual-learning techniques, which are externally falsifiable via the described robot evaluations and do not reduce to the inputs by construction. No derivation chain exists that could exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on abstract; no free parameters, axioms, or invented entities are identifiable or extractable.

pith-pipeline@v0.9.1-grok · 5728 in / 999 out tokens · 20140 ms · 2026-06-26T08:12:19.634592+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 4 canonical work pages

[1]

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, K.-H. Lee, S. Levine, Y. Lu, U. Malla, D. Manjunath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsc...

Pith/arXiv arXiv 2023
[2]

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, P. Florence, C. Fu, M. G. Arenas, K. Gopalakrishnan, K. Han, K. Hausman, A. Herzog, J. Hsu, B. Ichter, A. Irpan, N. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, L. Lee, T.-W. E. Lee, S. Levine, Y. Lu, H. Michalewski, I. Mordatch, K. Pe...

Pith/arXiv arXiv 2023
[3]

G. R. Team, S. Abeyruwan, J. Ainslie, J.-B. Alayrac, M. G. Arenas, T. Armstrong, A. Balakrishna, R. Baruch, M. Bauza, M. Blokzijl, S. Bohez, K. Bousmalis, A. Brohan, T. Buschmann, A. Byravan, S. Cabi, K. Caluwaerts, F. Casarini, O. Chang, J. E. Chen, X. Chen, H.-T. L. Chiang, K. Choromanski, D. D’Ambrosio, S. Dasari, T. Davchev, C. Devin, N. D. Palo, ...

Pith/arXiv arXiv 2025
[4]

K. Pertsch, K. Stachowicz, B. Ichter, D. Driess, S. Nair, Q. Vuong, O. Mees, C. Finn, and S. Levine. Fast: Efficient action tokenization for vision-language-action models, 2025. URL https://arxiv.org/abs/2501.09747

Pith/arXiv arXiv 2025
[5]

P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y. Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. V...

Pith/arXiv arXiv 2025
[6]

U. B. Karli, Z. Shangguan, and T. Fitzgerald. Insight: Inference-time sequence introspection for generating help triggers in vision-language-action models, 2025. URL https://arxiv.org/abs/2510.01389

Pith/arXiv arXiv 2025
[7]

S. Ross and D. Bagnell. Efficient reductions for imitation learning. In Y. W. Teh and M. Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 661–668, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. PMLR. URL https://procee...

2010
[8]

M. Kelly, C. Sidrane, K. Driggs-Campbell, and M. J. Kochenderfer. Hg-dagger: Interactive imitation learning with human experts. In 2019 International Conference on Robotics and Automation (ICRA), pages 8077–8083. IEEE Press, 2019. doi:10.1109/ICRA.2019.8793698. URL https://doi.org/10.1109/ICRA.2019.8793698

work page doi:10.1109/icra.2019.8793698 2019
[9]

Y. Cui, D. Isele, S. Niekum, and K. Fujimura. Uncertainty-aware data aggregation for deep imitation learning, 2019. URL https://arxiv.org/abs/1905.02780

Pith/arXiv arXiv 2019
[10]

K. Menda, K. Driggs-Campbell, and M. J. Kochenderfer. Ensembledagger: A bayesian approach to safe imitation learning, 2019. URL https://arxiv.org/abs/1807.08364

Pith/arXiv arXiv 2019
[11]

M. Zhao, R. Simmons, H. Admoni, A. Ramdas, and A. Bajcsy. Conformalized interactive imitation learning: Handling expert shift and intermittent feedback, 2025. URL https://arxiv.org/abs/2410.08852

arXiv 2025
[12]

C. Wang and Y. Wang. Uncertainty-driven data aggregation for imitation learning in autonomous vehicles. Information, 15(6), 2024. ISSN 2078-2489. doi:10.3390/info15060336. URL https://www.mdpi.com/2078-2489/15/6/336

work page doi:10.3390/info15060336 2024
[13]

S.-W. Lee, X. Kang, and Y.-L. Kuo. Diff-dagger: Uncertainty estimation with diffusion policy for robotic manipulation, 2025. URL https://arxiv.org/abs/2410.14868

arXiv 2025
[14]

B. Lakshminarayanan, A. Pritzel, and C. Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles, 2017. URL https://arxiv.org/abs/1612.01474

Pith/arXiv arXiv 2017
[15]

Y. Gal and Z. Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In M. F. Balcan and K. Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1050–1059, New York, New York, USA, 20–22 Jun 2016. PMLR. URL https...

2016
[16]

A. N. Angelopoulos and S. Bates. A gentle introduction to conformal prediction and distribution-free uncertainty quantification, 2022. URL https://arxiv.org/abs/2107.07511

2022
[17]

A. Z. Ren, A. Dixit, A. Bodrova, S. Singh, S. Tu, N. Brown, P. Xu, L. Takayama, F. Xia, J. Varley, Z. Xu, D. Sadigh, A. Zeng, and A. Majumdar. Robots that ask for help: Uncertainty alignment for large language model planners. In Proceedings of the Conference on Robot Learning (CoRL), 2023.

2023
[18]

M. McCloskey and N. J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. Volume 24 of Psychology of Learning and Motivation, pages 109–165. Academic Press, 1989. doi:10.1016/S0079-7421(08)60536-8. URL https://www.sciencedirect.com/science/article/pii/S0079742108605368

work page doi:10.1016/s0079-7421(08)60536-8 1989
[19]

Y. Luo, Z. Yang, F. Meng, Y. Li, J. Zhou, and Y. Zhang. An empirical study of catastrophic forgetting in large language models during continual fine-tuning, 2025. URL https://arxiv.org/abs/2308.08747.

Pith/arXiv arXiv 2025
[20]

S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert. icarl: Incremental classifier and representation learning, 2017. URL https://arxiv.org/abs/1611.07725

Pith/arXiv arXiv 2017
[21]

D. Lopez-Paz and M. A. Ranzato. Gradient episodic memory for continual learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/f87522788a2b...

2017
[22]

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, Mar. 2017. ISSN 1091-6490. doi: 10.1073/pnas.1...

work page doi:10.1073/pnas.1611835114 2017
[23]

B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone. Libero: Benchmarking knowledge transfer for lifelong robot learning, 2023. URL https://arxiv.org/abs/2306.03310.

2023

A Compute Infrastructure

All models were trained on an institutional high-performance computing cluster using NVIDIA H200 GPUs. Most evaluations were also run on the same clus...
