arxiv: 2606.31382 · v1 · pith:QULBAS52new · submitted 2026-06-30 · 💻 cs.RO

Revisiting Parameter Redundancy in Vision-Language-Action Models: Insights from VLM-to-VLA Adaptation

Pith reviewed 2026-07-01 05:05 UTC · model grok-4.3

classification 💻 cs.RO
keywords parameter pruning · vision-language-action models · VLM-to-VLA adaptation · model compression · robotic policies · LIBERO benchmark · parameter redundancy

The pith

Pruning VLA models by adaptation divergence removes 12-30% of parameters while retaining roughly 90% of the original performance, with no post-pruning recovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates parameter changes that occur when pre-trained vision-language models are adapted into vision-language-action models for controlling robots. It measures how parameters diverge across modules during this adaptation and treats direct performance impact from removal, without any recovery steps, as evidence of which parameters matter functionally. From the resulting modular patterns, it constructs a joint pruning method that targets subsets differently by module. On the LIBERO benchmark this yields 12-30% fewer parameters in OpenVLA and π0.5 while keeping roughly 90% of original task success, whereas magnitude-based or other standard criteria cause complete failure under the same no-recovery rule. The work therefore questions whether observed pruning fragility reflects true redundancy or simply the removal of adaptation-critical weights.

Core claim

Quantifying spatial patterns of parameter divergence during VLM-to-VLA adaptation reveals structured modular heterogeneities; controlled pruning without fine-tuning then serves as a diagnostic that links these divergence signals to functional importance, enabling a multi-module joint pruning scheme that reduces parameters of OpenVLA and π0.5 by 12-30% while preserving approximately 90% of original LIBERO performance.

What carries the argument

Multi-module joint pruning scheme that selects parameter subsets according to their measured divergence signals across modules during adaptation.
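
A minimal sketch of how a per-unit divergence score and a per-module selection might be computed. The ratio form of the score (L2 norm of the weight delta divided by the L2 norm of the VLM weight) is an assumption: this page only speaks of "relative parameter divergence" per attention head or FFN channel and, in the simulated rebuttal, of an L2 norm on parameter deltas. The function names, the `eps` term, the `prune_highest` switch, and the toy weights are all hypothetical.

```python
import math

def l2(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))

def relative_divergence(w_vlm, w_vla, eps=1e-12):
    """Assumed score for one head or channel: ||W_vla - W_vlm|| / ||W_vlm||."""
    delta = [a - b for a, b in zip(w_vla, w_vlm)]
    return l2(delta) / (l2(w_vlm) + eps)

def select_units_to_prune(scores, ratio, prune_highest):
    """Return sorted indices of the units to drop within one module.

    prune_highest is a per-module choice. It is left as an input because
    the informative direction is described as differing across modules.
    """
    k = int(len(scores) * ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=prune_highest)
    return sorted(order[:k])

# Toy module with four units; each row is one unit's flattened weights.
vlm = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [1.0, 1.0]]
vla = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.5], [2.0, 2.0]]

scores = [relative_divergence(b, a) for b, a in zip(vlm, vla)]
pruned_high = select_units_to_prune(scores, ratio=0.5, prune_highest=True)
pruned_low = select_units_to_prune(scores, ratio=0.5, prune_highest=False)
print(pruned_high, pruned_low)
```

Keeping the direction as an explicit argument reflects the modular heterogeneity the paper reports; a single global rule would hide it.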

If this is right

  • VLA performance degradation after pruning is not an inevitable consequence of parameter reduction if subsets are chosen according to adaptation divergence.
  • Standard pruning criteria that ignore adaptation history discard functionally critical weights under recovery-free evaluation.
  • Efficient robotic policies can be obtained directly from adapted models without subsequent fine-tuning or low-rank corrections.
  • Parameter redundancy in VLAs is structured by module rather than uniformly distributed.
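
Because redundancy is described as module-specific, the headline reduction is a parameter-weighted combination of per-module ratios. The sketch below shows that arithmetic only. The module names, sizes, and ratios are placeholders chosen for illustration and are not taken from the paper.

```python
def overall_reduction(module_params, module_ratios):
    """Share of all parameters removed when each module has its own ratio."""
    total = sum(module_params.values())
    removed = sum(count * module_ratios.get(name, 0.0)
                  for name, count in module_params.items())
    return removed / total

# Placeholder sizes in millions of parameters (hypothetical, not from the paper).
params = {"vision_encoder": 400, "lm_attention": 2000,
          "lm_ffn": 4000, "action_head": 100}
# Placeholder per-module pruning ratios; unlisted modules are left intact.
ratios = {"lm_attention": 0.25, "lm_ffn": 0.20}

share = overall_reduction(params, ratios)
print(f"{share:.1%} of parameters removed")
```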

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same divergence-probe method could be applied to diagnose redundancy in other adaptation settings such as language-to-vision or multi-task fine-tuning.
  • Tracking divergence during training might allow early stopping or regularization choices that reduce redundancy before full adaptation completes.
  • If the modular patterns prove stable across different base VLMs, the pruning ratios could be predicted from adaptation statistics alone without running the full diagnostic.

Load-bearing premise

That the immediate performance change after removing a parameter subset without any fine-tuning directly reveals whether those parameters carry functional importance induced by adaptation.

What would settle it

Running the same controlled pruning experiment on a new VLA model or task where the proposed divergence-based subsets produce performance collapse comparable to random or magnitude-based removal.
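
The test described above can be sketched as a comparison of retention under two removal rules, with no retraining between masking and evaluation. Everything here is a stand-in: `toy_evaluate` is not a LIBERO rollout, the `margin` threshold is arbitrary, and the helper names are hypothetical.

```python
def mask_units(weights, pruned):
    """Zero out the selected rows; no fine-tuning or correction follows."""
    drop = set(pruned)
    return [[0.0] * len(row) if i in drop else list(row)
            for i, row in enumerate(weights)]

def retention(evaluate, weights, pruned):
    """Score of the masked model as a fraction of the unpruned baseline."""
    return evaluate(mask_units(weights, pruned)) / evaluate(weights)

def falsified(ret_divergence, ret_control, margin=0.05):
    """True when the divergence-chosen subset does no better than the control."""
    return ret_divergence <= ret_control + margin

def toy_evaluate(weights, critical=(0, 2)):
    """Stand-in evaluator: share of 'critical' rows that are still non-zero."""
    intact = sum(1 for i in critical if any(v != 0.0 for v in weights[i]))
    return intact / len(critical)

weights = [[1.0, 1.0] for _ in range(4)]
ret_div = retention(toy_evaluate, weights, pruned=[1, 3])   # divergence-chosen
ret_ctl = retention(toy_evaluate, weights, pruned=[0, 1])   # control subset
print(ret_div, ret_ctl, falsified(ret_div, ret_ctl))
```

In a real run the evaluator would be task success on a new model or benchmark, and the control would be random or magnitude-based removal at the same ratio.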

Figures

Figures reproduced from arXiv: 2606.31382 by Chang Xu, Fengnian Zhang, Siyu Xu, Tao Huang, Zhong Jin.

Figure 1: Visualizing the relative parameter divergence ∆Wrel between the Prismatic (VLM) and OpenVLA (VLA) model pair. The color intensity indicates the magnitude of divergence: darker blue denotes significant parameter shifts, while brighter yellow represents minimal change. Subfigures (a)–(g) display the divergence across different modules, calculated at the granularity of individual attention heads or FFN channe…
Figure 2: Visualizing the relative parameter divergence ∆Wrel for the PaLI-Gemma (VLM) and π0.5 (VLA) model pair. The color intensity indicates the magnitude of divergence: darker blue denotes significant parameter shifts, while brighter yellow represents minimal change. Subfigures (a)–(d) display the divergence across different modules, calculated at the granularity of individual attention heads or FFN channels. Th…
Figure 3: Causal analysis of recovery difficulty: convergence steps vs. pruning ratio.
… (∼95.0% SR), while removing lowest-diff channels causes significant drops. Language model attention also collapses (0.0%) only when the lowest-diff heads are removed. These cross-model observations confirm Hypothesis II: VLM-to-VLA parameter divergence contains structured signals that effectively distinguish vital from redundant…
Original abstract

Vision-Language-Action (VLA) models have made significant strides in embodied intelligence by integrating the powerful representations of pre-trained Vision-Language Models (VLMs). However, the massive parameter scale of VLAs imposes a heavy computational burden, and these models exhibit extreme sensitivity to parameter pruning. Current paradigms often treat the resulting performance degradation as inevitable, relying on fine-tuning or low-rank corrections to recover efficacy. We challenge this convention by questioning whether the removed parameters are truly redundant if VLA pruning necessitates performance recovery to be effective, or if this paradigm masks the indiscriminate pruning of critical parameters. We revisit parameter redundancy through the lens of VLM-to-VLA adaptation, first quantifying the spatial distribution of parameter divergence during adaptation to reveal structured patterns across different modules. Subsequently, we introduce controlled pruning as a diagnostic probe: by comparing the direct impact of removing different parameter subsets on VLA performance without any fine-tuning, we establish a causal link between adaptation-induced divergence signals and functional contributions. Based on the discovered modular heterogeneities, we design a multi-module joint pruning scheme. Evaluations on the LIBERO benchmark demonstrate that our approach reduces the parameters of OpenVLA and π0.5 by 12–30% while maintaining approximately 90% of the original performance without any post-pruning recovery. In contrast, existing parameter pruning criteria result in total performance collapse when evaluated under the same recovery-free constraints. Our study reveals the parameter evolution mechanism in VLA adaptation and provides a new path for deploying efficient, robust robotic policies in resource-constrained environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that quantifying parameter divergence during VLM-to-VLA adaptation reveals modular heterogeneities; a controlled no-recovery pruning probe then establishes that divergence signals mark functionally important parameters; and a resulting multi-module joint pruning scheme reduces parameters in OpenVLA and π0.5 by 12–30% while retaining ~90% of original LIBERO performance, whereas standard criteria cause total collapse under identical recovery-free conditions.

Significance. If the central empirical result holds after addressing the causality concern, the work supplies a concrete, recovery-free route to smaller VLAs and new evidence on how adaptation redistributes parameter importance. The use of a public benchmark (LIBERO) and the explicit no-fine-tuning diagnostic are strengths that support reproducibility and falsifiability.

major comments (2)
  1. [Abstract and controlled pruning experiment] Abstract (controlled pruning paragraph) and the corresponding experimental section: the claim that the performance gap demonstrates a causal link between adaptation-induced divergence and functional contribution is load-bearing yet rests on an unablated comparison. Divergence is known to correlate with layer depth and weight magnitude in VLM-to-VLA transfer; without explicit matching or regression controls on these covariates, the observed superiority over baselines could be explained by module choice rather than the divergence metric itself.
  2. [Experimental results on LIBERO] Experimental results section (LIBERO tables/figures): the reported 12–30% reduction and ~90% retention figures lack accompanying details on exact divergence quantification (e.g., which norm or distance), per-module pruning thresholds, baseline implementations, number of seeds, and statistical tests. These omissions prevent verification that the performance retention is robust and not an artifact of particular hyper-parameter choices.
minor comments (2)
  1. [Throughout] Notation: the symbol π0.5 appears inconsistently (sometimes with subscript, sometimes without); standardize throughout.
  2. [Figures] Figure clarity: ensure that divergence heat-maps or module-wise plots include axis labels, color-bar scales, and error bars where applicable.
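
One way to realize the "explicit matching" that major comment 1 asks for is to pair high- and low-divergence units that share a layer and a weight-magnitude bin, then prune each group separately. The sketch below is an illustration of that idea, not the authors' procedure; the dictionary keys, the equal-width binning, and the toy units are assumptions.

```python
from collections import defaultdict

def matched_pairs(units, n_mag_bins=2):
    """Pair high- and low-divergence units within (layer, magnitude bin).

    units: list of dicts with hypothetical keys
           'id', 'layer', 'magnitude', 'divergence'.
    Returns (high_ids, low_ids), equal in length and balanced on both
    covariates, so a gap in retention cannot be blamed on depth or magnitude.
    """
    mags = [u["magnitude"] for u in units]
    lo, hi = min(mags), max(mags)
    width = (hi - lo) / n_mag_bins or 1.0  # fall back if all magnitudes match

    strata = defaultdict(list)
    for u in units:
        b = min(int((u["magnitude"] - lo) / width), n_mag_bins - 1)
        strata[(u["layer"], b)].append(u)

    high_ids, low_ids = [], []
    for key in sorted(strata):
        group = sorted(strata[key], key=lambda u: u["divergence"])
        # Pair from the extremes inward; an unpaired middle unit is skipped.
        for i in range(len(group) // 2):
            low_ids.append(group[i]["id"])
            high_ids.append(group[-1 - i]["id"])
    return high_ids, low_ids

units = [
    {"id": 0, "layer": 0, "magnitude": 1.0, "divergence": 0.1},
    {"id": 1, "layer": 0, "magnitude": 1.1, "divergence": 0.9},
    {"id": 2, "layer": 1, "magnitude": 3.0, "divergence": 0.2},
    {"id": 3, "layer": 1, "magnitude": 2.9, "divergence": 0.8},
    {"id": 4, "layer": 1, "magnitude": 1.0, "divergence": 0.5},
]
high_ids, low_ids = matched_pairs(units)
print(high_ids, low_ids)
```

If pruning `high_ids` and pruning `low_ids` still produce different retention, the difference is harder to attribute to module choice alone.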

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important areas for strengthening the causal interpretation and improving experimental transparency. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Abstract and controlled pruning experiment] Abstract (controlled pruning paragraph) and the corresponding experimental section: the claim that the performance gap demonstrates a causal link between adaptation-induced divergence and functional contribution is load-bearing yet rests on an unablated comparison. Divergence is known to correlate with layer depth and weight magnitude in VLM-to-VLA transfer; without explicit matching or regression controls on these covariates, the observed superiority over baselines could be explained by module choice rather than the divergence metric itself.

    Authors: We thank the referee for pointing out this potential confound. While our experiments compare the performance impact of pruning based on divergence versus standard criteria like magnitude, we did not explicitly control for correlations with layer depth and weight magnitude through matching or regression. We agree this is a valid concern and will incorporate additional controls and ablations in the revised manuscript to better isolate the effect of the divergence metric. revision: yes

  2. Referee: [Experimental results on LIBERO] Experimental results section (LIBERO tables/figures): the reported 12–30% reduction and ~90% retention figures lack accompanying details on exact divergence quantification (e.g., which norm or distance), per-module pruning thresholds, baseline implementations, number of seeds, and statistical tests. These omissions prevent verification that the performance retention is robust and not an artifact of particular hyper-parameter choices.

    Authors: We agree with the need for more details to ensure reproducibility. The revised manuscript will include: the specific divergence measure (L2 norm on parameter deltas), the exact per-module pruning thresholds, full descriptions of baseline methods, the number of evaluation seeds, and appropriate statistical tests for the reported results. revision: yes
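
The seed-level reporting promised in this response could look like the following sketch: retention per seed plus a percentile bootstrap interval on its mean. The success rates are placeholders and not results from the paper, and the choice of a percentile bootstrap is an assumption, since the rebuttal does not name a specific test.

```python
import random
import statistics

def retention_per_seed(pruned_sr, base_sr):
    """Pruned success rate as a fraction of the baseline, one value per seed."""
    return [p / b for p, b in zip(pruned_sr, base_sr)]

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(statistics.fmean(rng.choices(values, k=n))
                   for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Placeholder success rates for five evaluation seeds (hypothetical values).
base_sr = [0.80, 0.78, 0.82, 0.79, 0.81]
pruned_sr = [0.72, 0.70, 0.75, 0.71, 0.73]

values = retention_per_seed(pruned_sr, base_sr)
mean_ret = statistics.fmean(values)
lo, hi = bootstrap_ci(values)
print(f"retention {mean_ret:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With only a handful of seeds the interval is coarse, which is itself useful to report alongside the point estimate.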

Circularity Check

0 steps flagged

Empirical ablation on public benchmark exhibits no circularity

full rationale

The paper conducts an empirical study: it quantifies parameter divergence across modules during VLM-to-VLA adaptation, then applies controlled pruning (no recovery fine-tuning) on the LIBERO benchmark to measure performance retention. Reported results (12-30% parameter reduction at ~90% performance for OpenVLA and π0.5) are direct benchmark outcomes, not quantities defined by or fitted to the divergence metric itself. No equations, self-citations, or ansatzes reduce the performance claims to inputs by construction; the work remains self-contained against an external public benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claim rests on empirical observations of parameter divergence and recovery-free pruning tests.

pith-pipeline@v0.9.1-grok · 5826 in / 1064 out tokens · 33581 ms · 2026-07-01T05:05:03.157112+00:00 · methodology

