{"total":14,"items":[{"citing_arxiv_id":"2606.31382","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Revisiting Parameter Redundancy in Vision-Language-Action Models: Insights from VLM-to-VLA Adaptation","primary_cat":"cs.RO","submitted_at":"2026-06-30T09:10:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VLA models from VLM adaptation can be pruned 12-30% via multi-module joint scheme based on divergence signals while keeping ~90% performance on LIBERO without post-pruning recovery, unlike standard criteria that collapse.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28529","ref_index":20,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Speedup Paradox: Rethinking Inference Speed-Quality Trade-off in Embodied Tasks","primary_cat":"cs.RO","submitted_at":"2026-06-26T18:28:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TISED decomposes inference optimization effects on embodied tasks and identifies paradoxical outcomes where faster per-step inference can increase task completion time on static tasks or raise success rates on dynamic tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.22794","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"UniFS: Unified Fast-to-Slow Hierarchical Architecture for Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2026-06-22T03:10:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"UniFS achieves 98.3% success on LIBERO with 2.1x lower latency than prior fast-slow VLA models by stratifying VLM layer update frequencies, inverting 
latent interactions, and applying multi-level supervision.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.22540","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models","primary_cat":"cs.CV","submitted_at":"2026-06-21T14:54:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PolicyTrim is an RL post-training framework that boosts VLA policy efficiency by 3x chunk utilization and 51.4% fewer steps, yielding up to 5.83x speedup.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03188","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GeoSem-WAM: Geometry- and Semantic-Aware World Action Models","primary_cat":"cs.RO","submitted_at":"2026-06-02T05:48:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"GeoSem-WAM adds geometric and semantic auxiliary prediction tasks to World Action Models during training to improve latent representations and action prediction accuracy while keeping inference efficient by avoiding explicit future rollouts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29438","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2026-05-28T06:33:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ElegantVLA accelerates VLA models up to 3.77x by dynamically scheduling compute across vision, language, and action components 
without retraining the base model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24203","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Afford-VLA: Action-Aligned Visual Planning via Internalized Affordance","primary_cat":"cs.RO","submitted_at":"2026-05-22T20:43:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Afford-VLA internalizes task-conditioned affordance as an explicit visual planning interface within VLA models via learnable <AFF> tokens, achieving SOTA on LIBERO and SimplerEnv benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13548","ref_index":23,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AttenA+: Rectifying Action Inequality in Robotic Foundation Models","primary_cat":"cs.RO","submitted_at":"2026-05-13T13:55:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"AttenA+ reweights action training objectives in VLA and WAM models via inverse velocity attention to prioritize kinematically critical segments, yielding small benchmark gains.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"to push state-of-the-art (SOTA) performance on LIBERO tasks. The π model series, including π0 [2], π0 + FAST [18], and π0.5 [10], advances generative VLA capabilities through flow matching for strong generalization. Other representative VLA models and optimizations include UniVLA [7], VLA-ADP [19], CogACT [20], SmolVLA [21], NORA and NORA-Long [22], WorldVLA and WorldVLA* [8], SP-VLA [23], FlashVLA [24], VLA-Cache [25], FastV and FastV(+OFT) [26], SparseVLM [27], and CSP [28]. 
Parallel efforts emerging as WAMs include Motus [13], LingBot-VA [14], and Fast-WAM [29]. Despite consistent progress across benchmarks, nearly all existing action models share a core limitation: treating all action timesteps equally during training, neglecting the"},{"citing_arxiv_id":"2605.13316","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Test-time Sparsity for Extreme Fast Action Diffusion","primary_cat":"cs.CV","submitted_at":"2026-05-13T10:28:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Test-time sparsity with a parallel pipeline and omnidirectional feature reuse accelerates action diffusion by 5x to 47.5 Hz while cutting FLOPs 92% with no performance loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10432","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement","primary_cat":"cs.RO","submitted_at":"2026-04-12T03:09:44+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.19199","ref_index":42,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FASTER: Rethinking Real-Time Flow VLAs","primary_cat":"cs.RO","submitted_at":"2026-03-19T17:51:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FASTER adds a Horizon-Aware Schedule to flow VLAs that compresses immediate-action denoising to one step while keeping long-horizon trajectory quality, lowering real-robot reaction 
latency.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"- Compressing the LLM backbone through mechanisms such as layer selection and early exiting [2,13,33,34,73,76,99,100,102,104,107]; - Accelerating action decoding, mainly for auto-regressive VLAs [38,67,76,77,87]; - Pruning visual tokens, as multi-view image inputs account for a large proportion of tokens while often introducing perceptual redundancy [20,23,30,36,41,42,55,66,84,90,97,101,102]; - Applying low-level inference optimizations or quantization techniques [13,16,20,57,59,64,65,79,86,92,94]. A line of work closely related to ours aims to distill multi-step diffusion or flow matching models into one-step models, or to train one-step models directly. In the VLA context, the only existing work in diffusion distillation is RDT2 [50], while other studies focus on conventional diffusion policies [11,17,35,56,71,74,89,106]."},{"citing_arxiv_id":"2603.14371","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism","primary_cat":"cs.RO","submitted_at":"2026-03-15T13:23:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OxyGen unifies KV cache management in MoT VLAs to enable cross-task KV sharing and cross-frame continuous batching, delivering up to 3.7x speedup with 200+ tokens/s language and 70 Hz action on on-device platforms.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.18960","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AVA-VLA: Improving Vision-Language-Action models with Active Visual 
Attention","primary_cat":"cs.LG","submitted_at":"2025-11-24T10:22:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AVA-VLA reformulates VLA learning as a POMDP using recurrent states and active visual attention to achieve state-of-the-art results on LIBERO, CALVIN, and real dual-arm tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.13073","ref_index":192,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey","primary_cat":"cs.RO","submitted_at":"2025-08-18T16:45:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"cies through interaction or pre-collected trajectories. VLA-RL [181], ReWiND [39], Grape [177], TGRPO [182], HIL-SERL [186], ConRFT [187], RLDG [188], iRe-VLA [189] Training-Free Methods Improve VLA models via architectural or computational optimizations without retraining. FlashVLA [115], EfficientVLA [190], VLA-Cache [191], PD-VLA [113], SP-VLA [192], BAC [193], FAST [124], RTC [41] Learning from Human Videos Leverage human videos to adapt robot policies, enabling cross-domain transfer. Human-Robot Semantic Alignment [42], UniVLA [194], LAPA [195], VPDD [196], 3D-VLA [197], Humanoid-VLA [198] World Model-based VLA Integrate predictive world models into VLA to model environment dynamics. World-VLA [38], World4Omni [43], 3D-"}],"limit":50,"offset":0}