pith. machine review for the scientific record.

arxiv: 2604.18107 · v1 · submitted 2026-04-20 · 💻 cs.CV

Recognition: unknown

Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:51 UTC · model grok-4.3

classification 💻 cs.CV
keywords vision-language-action models · test-time adaptation · perturbation learning · delayed feedback · trajectory overfitting · action logits · multimodal decision making · robustness to shifts

The pith

PDF improves Vision-Language-Action models at test time by using delayed feedback to adjust action predictions and reduce overfitting to spurious correlations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision-Language-Action models often fail when objects shift slightly because they have memorized specific action sequences tied to training trajectories. The paper introduces PDF as a test-time method that avoids fine-tuning the base model or using an external verifier. It applies uncertainty-based data augmentation, aggregates actions by voting, and uses an adaptive scheduler to control the amount of extra computation. A small perturbation module then learns to revise the model's output scores once delayed outcome feedback arrives, which reduces overconfident errors. Experiments report higher success on manipulation benchmarks and game environments, indicating a practical route to more stable sequential decision agents.
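The test-time loop this summary describes can be sketched in a few lines. Everything below is illustrative: the augmentation, the stand-in policy, and the budget rule are placeholders, since the paper's exact operators are not reproduced on this page.

```python
import random
from collections import Counter

def perturb(obs, strength=0.05):
    # Hypothetical augmentation: jitter each observation value slightly.
    return [x + random.uniform(-strength, strength) for x in obs]

def vla_action(obs):
    # Stand-in for the frozen VLA's argmax action (4 discrete actions here).
    return int(sum(obs) / len(obs) * 10) % 4

def pdf_decide(obs, uncertainty, max_budget=8):
    # Adaptive scheduler: spend more of the augmentation budget when the
    # model is uncertain, so long episodes don't pay a flat overhead.
    budget = max(1, round(max_budget * uncertainty))
    # Action voting: query the frozen model on augmented views, take the mode.
    votes = Counter(vla_action(perturb(obs)) for _ in range(budget))
    action, _ = votes.most_common(1)[0]
    return action, budget

random.seed(0)
action, used = pdf_decide([0.2, 0.4, 0.6, 0.8], uncertainty=0.9)
print(action, used)  # budget scales with the uncertainty estimate
```

Note how the scheduler ties compute to uncertainty: a confident step costs a single forward pass, an uncertain one costs up to `max_budget`.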

Core claim

PDF is a verifier-free test-time adaptation framework that mitigates trajectory overfitting in frozen Vision-Language-Action models. It combines uncertainty-based data augmentation with action voting and an adaptive budget scheduler, while a lightweight perturbation module retrospectively corrects action logits from delayed feedback signals to improve decision stability.

What carries the argument

The lightweight perturbation module that learns to adjust the base model's action logits retrospectively from delayed feedback.
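As a concrete, purely illustrative reading of that mechanism: the head below keeps a per-action logit offset and nudges it only when the delayed outcome arrives. The parameterization and update rule are assumptions for this sketch, not the paper's.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

class PerturbationHead:
    """Hypothetical lightweight head: a learned per-action logit offset.

    The base VLA stays frozen; only this offset is trained, and only
    retrospectively, once a delayed outcome signal is observed.
    """
    def __init__(self, n_actions, lr=0.1):
        self.offset = [0.0] * n_actions
        self.lr = lr

    def adjust(self, logits):
        return [l + o for l, o in zip(logits, self.offset)]

    def feedback(self, taken_action, logits, reward):
        # Delayed feedback arrives after execution: push the taken action's
        # logit toward the observed outcome. A failed episode (reward 0)
        # with a high predicted probability means overconfidence.
        probs = softmax(self.adjust(logits))
        grad = reward - probs[taken_action]
        self.offset[taken_action] += self.lr * grad

head = PerturbationHead(n_actions=3)
logits = [2.0, 0.1, 0.1]              # base VLA is confident in action 0
head.feedback(0, logits, reward=0.0)  # episode later fails
print(head.adjust(logits))            # action 0's logit is damped
```

No ground-truth action labels enter the update, which is the property the verifier-free claim rests on.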

If this is right

  • Vision-Language-Action models reach higher task success without retraining the original weights.
  • The same gains appear across both robotic manipulation and game-playing domains.
  • An adaptive scheduler keeps the added computation from growing unbounded during long episodes.
  • The approach works without a separate verifier or ground-truth labels at test time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The delayed-feedback correction could be useful in real-world settings where action outcomes are observed only after a delay.
  • Similar retrospective adjustment might help other sequence models that suffer from training-trajectory bias.
  • Accurate uncertainty estimates are required for the augmentation step to target the right predictions.

Load-bearing premise

Trajectory overfitting to spurious action-entity correlations is the dominant source of fragility to environmental shifts, and uncertainty augmentation plus delayed-feedback correction can fix it without introducing new instabilities.

What would settle it

Running PDF on the LIBERO or Atari benchmarks and observing no gain or a drop in success rate relative to the vanilla Vision-Language-Action model on tasks that include small object-pose changes would falsify the claimed benefit.

Figures

Figures reproduced from arXiv: 2604.18107 by Fuchun Sun, Jiahuan Zhou, Jiangmeng Li, Lixiang Lium, Xiao Xu, Xi Wang, Zehua Zang.

Figure 1: Evidence of trajectory overfitting and the effectiveness
Figure 2: Comparison between traditional self-supervised test
Figure 3: The overall framework of our proposed PDF. At test time, the VLA receives pixel observation
Figure 4: Human normalized score changes across 57 Atari games. Blue bars show performance improvements, orange bars indicate
Figure 5: Performance degradation under increasing data augmen
Figure 6: Performance comparison across five benchmarks shows
Figure 7: Visual comparison of OpenVLA and PDF on three tasks. The green thumb indicates that the agent performed the correct action,
read the original abstract

Vision-Language-Action models (VLAs) achieve remarkable performance in sequential decision-making but remain fragile to subtle environmental shifts, such as small changes in object pose. We attribute this brittleness to trajectory overfitting, where VLAs over-attend to the spurious correlation between actions and entities, then reproduce memorized action patterns. We propose Perturbation learning with Delayed Feedback (PDF), a verifier-free test-time adaptation framework that improves decision performance without fine-tuning the base model. PDF mitigates the spurious correlation through uncertainty-based data augmentation and action voting, while an adaptive scheduler allocates augmentation budgets to balance performance and efficiency. To further improve stability, PDF learns a lightweight perturbation module that retrospectively adjusts action logits guided by delayed feedback, correcting overconfidence issue. Experiments on LIBERO (+7.4% success rate) and Atari (+10.3 human normalized score) demonstrate consistent gains of PDF in task success over vanilla VLA and VLA with test-time adaptation, establishing a practical path toward reliable test-time adaptation in multimodal decision-making agents. The code is available at https://github.com/zhoujiahuan1991/CVPR2026-PDF.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes Perturbation learning with Delayed Feedback (PDF), a verifier-free test-time adaptation framework for Vision-Language-Action (VLA) models. It attributes model brittleness to trajectory overfitting that produces spurious correlations and overconfident actions. PDF combines uncertainty-based data augmentation with action voting, an adaptive scheduler to allocate augmentation budgets, and a lightweight perturbation module trained retrospectively on delayed feedback to adjust action logits. Experiments report gains of +7.4% success rate on LIBERO and +10.3 human-normalized score on Atari over vanilla VLA and other test-time adaptation baselines, with public code released.

Significance. If the empirical results hold under rigorous validation, the work provides a practical, training-free route to improving robustness of multimodal decision-making agents. The public code release is a clear strength that supports reproducibility and follow-on research in test-time adaptation.

major comments (3)
  1. [Experiments] Experiments section: the central performance claims (+7.4% success rate on LIBERO, +10.3 HNS on Atari) are reported without error bars, number of evaluation seeds, ablation tables, or statistical significance tests. This directly undermines assessment of whether the gains are reliable and load-bearing for the paper's empirical contribution.
  2. [Introduction and §3] Introduction and §3 (Method): the attribution of brittleness primarily to trajectory overfitting is presented as the motivating assumption, yet no direct evidence, diagnostic experiments, or comparison against alternative failure modes (e.g., visual encoder limitations or policy architecture) is provided to establish that this is the dominant factor.
  3. [§3.3] §3.3 (Perturbation module): the description of how delayed feedback is used to train the lightweight module and whether it requires ground-truth signals at test time is insufficient to verify the verifier-free claim and to confirm that the module does not introduce new instabilities.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'correcting overconfidence issue' should read 'correcting the overconfidence issue' for grammatical consistency.
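The evaluation statistics the first major comment calls for (per-seed means, error bars, a paired significance test) reduce to a few lines. The per-seed success rates below are invented purely for illustration.

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t-statistic over per-seed scores of two methods on the
    same seeds. A minimal sketch of the requested protocol, not the
    paper's actual evaluation code."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

pdf_runs     = [0.81, 0.79, 0.83, 0.80, 0.82]  # hypothetical per-seed success
vanilla_runs = [0.73, 0.74, 0.72, 0.75, 0.73]
print(f"PDF {mean(pdf_runs):.3f} +/- {stdev(pdf_runs):.3f}, "
      f"t = {paired_t(pdf_runs, vanilla_runs):.2f} (df = 4)")
```

With five seeds, a t-statistic above 2.776 (the 5% two-sided critical value at df = 4) would support the claim that the gain is not seed noise.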

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: the central performance claims (+7.4% success rate on LIBERO, +10.3 HNS on Atari) are reported without error bars, number of evaluation seeds, ablation tables, or statistical significance tests. This directly undermines assessment of whether the gains are reliable and load-bearing for the paper's empirical contribution.

    Authors: We agree that the absence of error bars, details on the number of evaluation seeds, ablation tables, and statistical significance tests weakens the empirical claims. In the revised manuscript, we will include results averaged over multiple random seeds (at least 5), report standard deviations as error bars, expand the ablation studies, and include statistical significance tests (e.g., paired t-tests) to demonstrate that the reported improvements are reliable. We will also make the evaluation protocol clearer. revision: yes

  2. Referee: [Introduction and §3] Introduction and §3 (Method): the attribution of brittleness primarily to trajectory overfitting is presented as the motivating assumption, yet no direct evidence, diagnostic experiments, or comparison against alternative failure modes (e.g., visual encoder limitations or policy architecture) is provided to establish that this is the dominant factor.

    Authors: The attribution to trajectory overfitting stems from our empirical observations that VLAs often replicate training trajectories under minor environmental changes, leading to spurious correlations. However, we acknowledge the lack of direct diagnostic evidence in the current manuscript. We will add diagnostic experiments in the revised version, including attention visualization, controlled perturbation tests, and comparisons to isolate trajectory overfitting from other potential issues such as visual encoder limitations. This will provide stronger support for our motivating assumption. revision: yes

  3. Referee: [§3.3] §3.3 (Perturbation module): the description of how delayed feedback is used to train the lightweight module and whether it requires ground-truth signals at test time is insufficient to verify the verifier-free claim and to confirm that the module does not introduce new instabilities.

    Authors: We apologize for the lack of clarity in §3.3. The lightweight perturbation module is trained retrospectively using delayed feedback obtained from the environment after action execution (e.g., task completion signals or reward signals), without any ground-truth action labels or external verifiers. This preserves the verifier-free property as it relies solely on standard environment interactions. We will revise the section to include a detailed explanation, pseudocode for the training process, and an analysis of potential instabilities with corresponding mitigation strategies to ensure stability. revision: yes
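A minimal sketch of the bookkeeping this last answer implies: decisions are buffered at execution time and only consumed for updates once the environment's outcome signal arrives. The class and its names are hypothetical, not the paper's API.

```python
from collections import deque

class DelayedFeedbackBuffer:
    """Holds (step, action, logits) tuples until their outcome signal,
    assumed to arrive a fixed number of steps later, becomes available."""
    def __init__(self, delay):
        self.delay = delay
        self.pending = deque()

    def record(self, step, action, logits):
        # Called at execution time, before any outcome is known.
        self.pending.append((step, action, logits))

    def pop_ready(self, current_step):
        # Return only the decisions whose delayed feedback has arrived;
        # these are what the perturbation head would be trained on.
        ready = []
        while self.pending and current_step - self.pending[0][0] >= self.delay:
            ready.append(self.pending.popleft())
        return ready

buf = DelayedFeedbackBuffer(delay=3)
buf.record(0, action=1, logits=[0.2, 1.5])
buf.record(1, action=0, logits=[1.1, 0.3])
print(len(buf.pop_ready(current_step=3)))  # → 1: only the step-0 decision
```

Because the buffer only ever consumes environment signals (task completion, reward), nothing here requires ground-truth action labels or an external verifier.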

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes an empirical test-time adaptation method (PDF) for VLAs that combines uncertainty-based augmentation, action voting, an adaptive scheduler, and a lightweight perturbation module trained on delayed feedback. No equations, derivations, or parameter-fitting steps are presented that reduce the reported performance gains to quantities defined by construction from the method's own inputs. The central claims rest on experimental results from LIBERO and Atari rather than self-referential definitions, fitted-input predictions, or load-bearing self-citations. The approach is self-contained as an engineering contribution with public code.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that trajectory overfitting causes brittleness and that the proposed perturbation and feedback mechanisms can correct it; no free parameters or invented entities are explicitly quantified in the abstract.

axioms (1)
  • domain assumption VLAs achieve performance but remain fragile to subtle environmental shifts due to trajectory overfitting and spurious correlations between actions and entities.
    Directly stated in the opening of the abstract as the motivation for the work.

pith-pipeline@v0.9.0 · 5539 in / 1242 out tokens · 42982 ms · 2026-05-10T05:51:28.258084+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

37 extracted references · 12 canonical work pages · 5 internal anchors

  1. [1]

    Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling

    Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res., 47:253–279, 2013

  2. [2]

    Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Lucy Xiaoyang Shi, James Tanner, Quan Vuong, Anna Walling, Haohuan Wang, and Ury Zhilinsky.π0: A visio...

  3. [3]

    Dokania, Philip H

    Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K. Dokania, Philip H. S. Torr, and Marc’Aurelio Ranzato. On tiny episodic memories in continual learning, 2019

  4. [4]

    arXiv preprint arXiv:2506.08440 (2025)

    Zengjue Chen, Runliang Niu, He Kong, and Qi Wang. TGRPO: fine-tuning vision-language-action model via trajectory-wise group relative policy optimization. CoRR, abs/2506.08440, 2025

  5. [5]

    Diffusion policy: Visuomotor policy learning via action diffusion

    Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. Int. J. Robotics Res., 44(10-11):1684–1704, 2025

  6. [6]

    LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

    Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, et al. Libero-plus: In-depth robustness analysis of vision-language-action models. arXiv preprint arXiv:2510.13626, 2025

  7. [7]

    Jack of all trades, master of some, a multi-purpose transformer agent. CoRR, abs/2402.09844, 2024

    Quentin Gallouédec, Edward Beeching, Clément Romac, and Emmanuel Dellandréa. Jack of all trades, master of some, a multi-purpose transformer agent. CoRR, abs/2402.09844, 2024

  8. [8]

    Sanketi, Dorsa Sadigh, Chelsea Finn, and Sergey Levine

    Dibya Ghosh, Homer Rich Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Quan Vuong, Ted Xiao, Pannag R. Sanketi, Dorsa Sadigh, Chelsea Finn, and Sergey Levine. Octo: An open-source generalist robot policy. In Robotics: Science and Systems XX, Delft, The ...

  9. [9]

    An embodied generalist agent in 3d world

    Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, and Siyuan Huang. An embodied generalist agent in 3d world. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27,

  10. [10]

    OpenReview.net, 2024

  11. [11]

    Verifier-free test-time sampling for vision language action models. arXiv preprint arXiv:2510.05681, 2025

    Suhyeok Jang, Dongyoung Kim, Changyeon Kim, Youngsuk Kim, and Jinwoo Shin. Verifier-free test-time sampling for vision language action models. CoRR, abs/2510.05681, 2025

  12. [12]

    Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn

    Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Paul Foster, Pannag R. Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. Openvla: An open-source vision-language-action model. In Conference on Robot Learning, 6-9 ...

  13. [13]

    Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

    Moo Jin Kim, Chelsea Finn, and Percy Liang. Fine-tuning vision-language-action models: Optimizing speed and success. CoRR, abs/2502.19645, 2025

  14. [14]

    Test-time adaptation for online vision-language navigation with feedback-based reinforcement learning

    Sungjune Kim, Gyeongrok Oh, Heeju Ko, Daehyun Ji, Dongwook Lee, Byung-Jun Lee, Sujin Jang, and Sangpil Kim. Test-time adaptation for online vision-language navigation with feedback-based reinforcement learning. In Forty-second International Conference on Machine Learning, 2025

  15. [15]

    Robomonkey: Scaling test-time sampling and verification for vision-language-action models. arXiv preprint arXiv:2506.17811, 2025

    Jacky Kwok, Christopher Agia, Rohan Sinha, Matthew Foutter, Shulu Li, Ion Stoica, Azalia Mirhoseini, and Marco Pavone. Robomonkey: Scaling test-time sampling and verification for vision-language-action models. CoRR, abs/2506.17811, 2025

  16. [16]

    Test-time adaptation with binary feedback. CoRR, abs/2505.18514, 2025

    Taeckyung Lee, Sorn Chottananurak, Junsu Kim, Jinwoo Shin, Taesik Gong, and Sung-Ju Lee. Test-time adaptation with binary feedback. CoRR, abs/2505.18514, 2025

  17. [17]

    Metavla: Unified meta co-training for efficient embodied adaption. CoRR, abs/2510.05580, 2025

    Chen Li, Zhantao Yang, Han Zhang, Fangyi Chen, Chenchen Zhu, Anudeepsekhar Bolimera, and Marios Savvides. Metavla: Unified meta co-training for efficient embodied adaption. CoRR, abs/2510.05580, 2025

  18. [18]

    SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

    Haozhan Li, Yuxin Zuo, Jiale Yu, Yuhao Zhang, Zhaohui Yang, Kaiyan Zhang, Xuekai Zhu, Yuchen Zhang, Tianxing Chen, Ganqu Cui, Dehui Wang, Dingxiang Luo, Yuchen Fan, Youbang Sun, Jia Zeng, Jiangmiao Pang, Shanghang Zhang, Yu Wang, Yao Mu, Bowen Zhou, and Ning Ding. SimpleVLA-RL: Scaling VLA training via reinforcement learning. CoRR, abs/2509.09674, 2025

  19. [19]

    JARVIS-VLA: post-training large-scale vision language models to play visual games with keyboards and mouse

    Muyao Li, Zihao Wang, Kaichen He, Xiaojian Ma, and Yitao Liang. JARVIS-VLA: post-training large-scale vision language models to play visual games with keyboards and mouse. In Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025, pages 17878–17899. Association for Computational Linguistics, 2025

  20. [20]

    Vision-language foundation models as effective robot imitators

    Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, and Tao Kong. Vision-language foundation models as effective robot imitators. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  21. [21]

    LIBERO: benchmarking knowledge transfer for lifelong robot learning

    Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. LIBERO: benchmarking knowledge transfer for lifelong robot learning. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, 2023

  22. [22]

    Packnet: Adding multiple tasks to a single network by iterative pruning

    Arun Mallya and Svetlana Lazebnik. Packnet: Adding multiple tasks to a single network by iterative pruning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 7765–7773. Computer Vision Foundation / IEEE Computer Society, 2018

  23. [23]

    Steering your generalists: Improving robotic foundation models via value guidance

    Mitsuhiko Nakamoto, Oier Mees, Aviral Kumar, and Sergey Levine. Steering your generalists: Improving robotic foundation models via value guidance. In Conference on Robot Learning, 6-9 November 2024, Munich, Germany, pages 4996–5013. PMLR, 2024

  24. [24]

    SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model

    Delin Qu, Haoming Song, Qizhi Chen, Yuanqi Yao, Xinyi Ye, Yan Ding, Zhigang Wang, JiaYuan Gu, Bin Zhao, Dong Wang, and Xuelong Li. Spatialvla: Exploring spatial representations for visual-language-action model. CoRR, abs/2501.15830, 2025

  25. [25]

    Scott E. Reed, Konrad Zolna, Emilio Parisotto, Sergio Gómez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, and Nando de Freitas. A generalist agent. Trans. M...

  26. [26]

    Introducing rfm-1: Giving robots human-like reasoning capabilities, 2024

    A Sohn, A Nagabandi, C Florensa, D Adelberg, D Wu, H Farooq, I Clavera, J Welborn, J Chen, N Mishra, et al. Introducing rfm-1: Giving robots human-like reasoning capabilities, 2024

  27. [27]

    Lingo-2: Driving with natural language, 2024

    Wayve Research Team et al. Lingo-2: Driving with natural language, 2024

  28. [28]

    Olshausen, and Trevor Darrell

    Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno A. Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021

  29. [29]

    Any-point trajectory modeling for policy learning

    Chuan Wen, Xingyu Lin, John Ian Reyes So, Kai Chen, Qi Dou, Yang Gao, and Pieter Abbeel. Any-point trajectory modeling for policy learning. In Robotics: Science and Systems XX, Delft, The Netherlands, July 15-19, 2024, 2024

  30. [30]

    SCAP: transductive test-time adaptation via supportive clique-based attribute prompting

    Chenyu Zhang, Kunlun Xu, Zichen Liu, Yuxin Peng, and Jiahuan Zhou. SCAP: transductive test-time adaptation via supportive clique-based attribute prompting. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025, pages 30032–30041. Computer Vision Foundation / IEEE, 2025

  31. [31]

    Cot-vla: Visual chain-of-thought reasoning for vision-language-action models

    Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Tsung-Yi Lin, Gordon Wetzstein, Ming-Yu Liu, and Donglai Xiang. Cot-vla: Visual chain-of-thought reasoning for vision-language-action models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 20...

  32. [32]

    3d-vla: A 3d vision-language-action generative world model

    Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, and Chuang Gan. 3d-vla: A 3d vision-language-action generative world model. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024

  33. [33]

    Tracevla: Visual trace prompting enhances spatial-temporal awareness for generalist robotic policies

    Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, and Jianwei Yang. Tracevla: Visual trace prompting enhances spatial-temporal awareness for generalist robotic policies. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025

  34. [34]

    Class-aware domain knowledge fusion and fission for continual test-time adaptation. CoRR, abs/2510.12150, 2025

    Jiahuan Zhou, Chao Zhu, Zhenyu Cui, Zichen Liu, Xu Zou, and Gang Hua. Class-aware domain knowledge fusion and fission for continual test-time adaptation. CoRR, abs/2510.12150, 2025

  35. [35]

    Tran, Radu Soricut, Anikait Singh, Jaspiar Singh, Pierre Sermanet, Pannag R

    Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, Quan Vuong, Vincent Vanhoucke, Huong T. Tran, Radu Soricut, Anikait Singh, Jaspiar Singh, Pierre Sermanet, Pannag R. Sanketi, Grecia Salazar, Michael S. Ryoo, Krista Reymann, Kanishka Rao, Karl Pertsch, Igor Mordatch, Henryk Michalew...

  36. [36]

    Pseudo-Code. Overview of the training pipeline outlined in Algorithm 1, which implements PDF. Algorithm 1: Perturbation Learning with Delayed Feedback (PDF). Require: pretrained VLA parameters ϕ (frozen); perturbation head parameters θ (trainable); maximum augmentation budget N_max; buffer D. 1: for each episode do 2: for each timestep t do 3: Observ...

  37. [37]

    Additional Experimental Results on Atari 57. Table 4 presents the detailed results of PDF and JAT on the full Atari 57 benchmark (per-game raw scores and human-normalized scores), e.g. ALIEN (ID 22): JAT 1427.9 ± 540.28 raw, 0.17 HNS vs. PDF 2034.4 ± 560.47 raw, 0.26 HNS; AMIDAR (ID 34): JAT 105 ± 76.93, 0.06 vs. PDF 150.2 ± 57.05, 0.08; ASSAULT (ID 4): JAT 1627.57 ± 799.09, 2.7 vs. PDF 186...