{"total":14,"items":[{"citing_arxiv_id":"2606.31382","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Revisiting Parameter Redundancy in Vision-Language-Action Models: Insights from VLM-to-VLA Adaptation","primary_cat":"cs.RO","submitted_at":"2026-06-30T09:10:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VLA models from VLM adaptation can be pruned 12-30% via multi-module joint scheme based on divergence signals while keeping ~90% performance on LIBERO without post-pruning recovery, unlike standard criteria that collapse.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.27755","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?","primary_cat":"cs.RO","submitted_at":"2026-06-26T06:22:17+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VLA language backbones show high redundancy on manipulation benchmarks, with half the LLM blocks removable and even two blocks sufficient to recover baseline performance after fine-tuning, unlike vision and action pathways.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08094","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"vla.cpp: A Unified Inference Runtime for Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2026-06-06T10:45:40+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"vla.cpp is a unified C++ runtime that serves multiple VLA architectures with flow-matching and diffusion patterns, matching SOTA performance on LIBERO while running on low-memory embedded 
hardware.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00966","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Threading Optimization for Vision-Language-Action Model Inference in Low-Cost Smart Agricultural Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-31T02:49:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Threading optimization of RTAC for VLA models reduces end-to-end latency and improves stability on low-cost agricultural robotic arms without changing the policy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31256","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Before Parc Ferm\\'e: RL-Time Pruning for Efficient Embodied LLMs in Autonomous Driving","primary_cat":"cs.RO","submitted_at":"2026-05-29T12:53:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BPF prunes embodied LLM controllers iteratively during RL (and optionally SFT) to achieve superior size-performance-throughput trade-offs compared to post-training pruning or smaller dense models on the RobotxR1 autonomous driving pipeline.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29438","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2026-05-28T06:33:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ElegantVLA accelerates VLA models up to 3.77x by dynamically scheduling compute across vision, language, and action 
components without retraining the base model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11567","ref_index":4,"ref_count":3,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dynamic Execution Commitment of Vision-Language-Action Models","primary_cat":"cs.CV","submitted_at":"2026-05-12T05:52:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A3 reframes dynamic action chunk commitment in VLA models as self-speculative prefix verification, accepting the longest continuous sequence of actions that satisfies consensus-ordered conditional invariance and prefix-closed sequential consistency.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Experiments across diverse VLA models and benchmarks demonstrate that A3 eliminates the need for manual horizon tuning while achieving a superior trade-off between execution robustness and inference throughput. 1 Introduction Vision-Language-Action (VLA) models [1, 2, 3] have emerged as a primary paradigm for generalizable embodied intelligence [4, 5], mapping high-dimensional visual observations and natural language instructions directly to precise motor sequences. To mitigate the high computational overhead of large-scale vision-language backbones [6], modern VLA architectures increasingly adopt dual-system designs [1] that decouple deliberative reasoning from reactive execution. 
A core efficiency strategy"},{"citing_arxiv_id":"2604.24622","ref_index":52,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies","primary_cat":"cs.CV","submitted_at":"2026-04-27T15:51:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24447","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment","primary_cat":"cs.RO","submitted_at":"2026-04-27T13:12:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24182","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"$M^2$-VLA: Boosting Vision-Language Models for Generalizable Manipulation via Layer Mixture and Meta-Skills","primary_cat":"cs.RO","submitted_at":"2026-04-27T08:44:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"M²-VLA shows that generalized VLMs can serve as direct backbones for robotic manipulation by selectively extracting task-critical 
features via Mixture of Layers and adding Meta Skill Modules for efficient trajectory learning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Current VLA frameworks typically utilize pre-trained VLM backbones, fine-tuning their internal parameters on large-scale robotic datasets to repurpose the model's generation capabilities from language tokens to action tokens. However, this paradigm brings catastrophic forgetting [6], degrading the VLM's inherent semantic understanding and consequently limiting the VLA's generalizability [7], [8]. As illustrated in Fig. 1, while state-of-the-art VLA models perform well on in-domain tasks, they struggle with novel instructions or objects and frequently lose their Visual Question Answering (VQA) capabilities. We argue that preserving semantic understanding is a fundamental prerequisite for robust manipulation [9], [3]. *Corresponding author."},{"citing_arxiv_id":"2604.23775","ref_index":87,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms","primary_cat":"cs.RO","submitted_at":"2026-04-26T15:58:19+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Agentsafe: Benchmarking the safety of embodied agents on hazardous instructions. arXiv preprint arXiv:2506.14697, 2025. 40 [86] Dian Yu, Qingchuan Zhou, Bingkun Huang, Majid Khadiv, and Zewen Yang. Safe-night vla: Seeing the unseen via thermal-perceptive vision-language-action models for safety-critical manipulation, 2026. https://arxiv.org/abs/2603.05754. 
[87] Zhaoshu Yu, Bo Wang, Pengpeng Zeng, Haonan Zhang, Ji Zhang, Zheng Wang, Lianli Gao, Jingkuan Song, Nicu Sebe, and Heng Tao Shen. A survey on efficient vision-language-action models. arXiv preprint arXiv:2510.24795, 2025. [88] Lu Yue, Dongliang Zhou, Liang Xie, Feitian Zhang, Ye Yan, and Erwei Yin. Safe-vln: Collision avoidance for vision-and-language navigation of autonomous robots operating in continuous environments."},{"citing_arxiv_id":"2604.19710","ref_index":76,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model","primary_cat":"cs.CV","submitted_at":"2026-04-21T17:34:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SpanVLA reduces action generation latency via flow-matching conditioned on history and improves robustness by training on negative-recovery samples with GRPO and a dedicated reasoning dataset.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"Table 1: Comparison with SOTA methods on the NAVSIM v1 (navtest). PDMS (Predictive Driver Model Score), NC (No Collision), DAC (Drivable Area Compliance), EP (Ego Process), TTC (Time-To-Collision), Comf. (Comfort), Methods Cam. Lid. 
PDMS↑ NC↑DAC↑EP↑TTC↑Comf.↑ Conventional End-to-end-based Methods TransFuser [6] ✓ ✓ 84.0 97.8 92.6 78.9 92.9100.0 DRAMA [76] ✓ ✓ 86.9 98.2 95.2 81.3 94.2100.0 Hydra-MDP [41] ✓ ✓ 86.5 98.3 96.0 78.7 94.6100.0 DiffusionDrive [42] ✓ ✓ 88.1 98.2 96.2 82.2 94.7100.0 WoTE [39] ✓ ✓ 88.3 98.5 96.8 81.9 94.4 99.9 VLA-based Methods ReCogDrive [38] ✓- 89.6 98.2 97.8 83.5 95.2 99.8 DriveVLA-W0 [38] ✓- 90.2 98.799.183.3 95.3 99.3 AutoVLA [38] ✓- 89.1 98.4 95.6 81.998.099.9 Ours SpanVLA (One-shot) ✓- 82."},{"citing_arxiv_id":"2604.02965","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA","primary_cat":"cs.RO","submitted_at":"2026-04-03T10:55:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SV-VLA uses infrequent heavy VLA planning of action chunks plus a lightweight closed-loop verifier to achieve both efficiency and robustness in dynamic robot control.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.19199","ref_index":103,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FASTER: Rethinking Real-Time Flow VLAs","primary_cat":"cs.RO","submitted_at":"2026-03-19T17:51:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FASTER adds a Horizon-Aware Schedule to flow VLAs that compresses immediate-action denoising to one step while keeping long-horizon trajectory quality, lowering real-robot reaction latency.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"proaches incorporate a dedicated action expert alongside the VLM backbone, generating high-quality actions conditioned on vision-language features. 
Real-Time VLAs. In contrast to VLMs operating purely in cyberspace, VLAs interact with the physical world and are therefore highly sensitive to real-time interaction [43,95]. Consequently, improving the efficiency of VLAs has become an active research focus [25,103]. A straightforward strategy is to shorten the model inference latency. Existing approaches include adopting smaller VLM backbones [12,45,68,91], compressing LLM layers [13,99,104,107], accelerating action decoding [38,67,76,87], distilling diffusion models [50], pruning visual tokens [23,55,66,97], and applying optimization or quantization [16,57,64,86,92]."}],"limit":50,"offset":0}