arXiv preprint arXiv:2506.12723 (2025)

Li, Y · 2025 · arXiv 2506.12723

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Revisiting Parameter Redundancy in Vision-Language-Action Models: Insights from VLM-to-VLA Adaptation

cs.RO · 2026-06-30 · unverdicted · novelty 7.0

VLA models adapted from VLMs can be pruned by 12-30% via a multi-module joint scheme based on divergence signals while keeping ~90% of their performance on LIBERO without post-pruning recovery, whereas standard pruning criteria collapse.

Test-time Sparsity for Extreme Fast Action Diffusion

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

Test-time sparsity with a parallel pipeline and omnidirectional feature reuse accelerates action diffusion by 5x to 47.5 Hz while cutting FLOPs by 92% with no performance loss.

The Speedup Paradox: Rethinking Inference Speed-Quality Trade-off in Embodied Tasks

cs.RO · 2026-06-26 · unverdicted · novelty 6.0 · 2 refs

TISED decomposes inference optimization effects on embodied tasks and identifies paradoxical outcomes where faster per-step inference can increase task completion time on static tasks or raise success rates on dynamic tasks.

UniFS: Unified Fast-to-Slow Hierarchical Architecture for Vision-Language-Action Models

cs.RO · 2026-06-22 · unverdicted · novelty 6.0

UniFS achieves 98.3% success on LIBERO with 2.1x lower latency than prior fast-slow VLA models by stratifying VLM layer update frequencies, inverting latent interactions, and applying multi-level supervision.

PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models

cs.CV · 2026-06-21 · unverdicted · novelty 6.0

PolicyTrim is an RL post-training framework that boosts VLA policy efficiency with 3x chunk utilization and 51.4% fewer steps, yielding up to 5.83x speedup.

Afford-VLA: Action-Aligned Visual Planning via Internalized Affordance

cs.RO · 2026-05-22 · unverdicted · novelty 6.0

Afford-VLA internalizes task-conditioned affordance as an explicit visual planning interface within VLA models via learnable <AFF> tokens, achieving SOTA on LIBERO and SimplerEnv benchmarks.

FASTER: Rethinking Real-Time Flow VLAs

cs.RO · 2026-03-19 · unverdicted · novelty 6.0 · 2 refs

FASTER adds a Horizon-Aware Schedule to flow VLAs that compresses immediate-action denoising to one step while keeping long-horizon trajectory quality, lowering real-robot reaction latency.

OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism

cs.RO · 2026-03-15 · unverdicted · novelty 6.0

OxyGen unifies KV cache management in MoT VLAs to enable cross-task KV sharing and cross-frame continuous batching, delivering up to 3.7x speedup with 200+ tokens/s language and 70 Hz action on on-device platforms.

GeoSem-WAM: Geometry- and Semantic-Aware World Action Models

cs.RO · 2026-06-02 · unverdicted · novelty 5.0

GeoSem-WAM adds geometric and semantic auxiliary prediction tasks to World Action Models during training to improve latent representations and action prediction accuracy while keeping inference efficient by avoiding explicit future rollouts.

ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models

cs.RO · 2026-05-28 · unverdicted · novelty 5.0

ElegantVLA accelerates VLA models up to 3.77x by dynamically scheduling compute across vision, language, and action components without retraining the base model.

AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

cs.LG · 2025-11-24 · unverdicted · novelty 5.0

AVA-VLA reformulates VLA learning as a POMDP using recurrent states and active visual attention to achieve state-of-the-art results on LIBERO, CALVIN, and real dual-arm tasks.

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

cs.RO · 2025-08-18 · unverdicted · novelty 5.0

This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.

AttenA+: Rectifying Action Inequality in Robotic Foundation Models

cs.RO · 2026-05-13 · unverdicted · novelty 4.0 · 2 refs

AttenA+ reweights action training objectives in VLA and WAM models via inverse velocity attention to prioritize kinematically critical segments, yielding small benchmark gains.

AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement

cs.RO · 2026-04-12

citing papers explorer

Revisiting Parameter Redundancy in Vision-Language-Action Models: Insights from VLM-to-VLA Adaptation · cs.RO · 2026-06-30 · unverdicted · none · ref 16
Test-time Sparsity for Extreme Fast Action Diffusion · cs.CV · 2026-05-13 · unverdicted · none · ref 12
The Speedup Paradox: Rethinking Inference Speed-Quality Trade-off in Embodied Tasks · cs.RO · 2026-06-26 · unverdicted · none · ref 20 · 2 links
UniFS: Unified Fast-to-Slow Hierarchical Architecture for Vision-Language-Action Models · cs.RO · 2026-06-22 · unverdicted · none · ref 23
PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models · cs.CV · 2026-06-21 · unverdicted · none · ref 24
Afford-VLA: Action-Aligned Visual Planning via Internalized Affordance · cs.RO · 2026-05-22 · unverdicted · none · ref 18
FASTER: Rethinking Real-Time Flow VLAs · cs.RO · 2026-03-19 · unverdicted · none · ref 42 · 2 links
OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism · cs.RO · 2026-03-15 · unverdicted · none · ref 23
GeoSem-WAM: Geometry- and Semantic-Aware World Action Models · cs.RO · 2026-06-02 · unverdicted · none · ref 23
ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models · cs.RO · 2026-05-28 · unverdicted · none · ref 17
AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention · cs.LG · 2025-11-24 · unverdicted · none · ref 22
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey · cs.RO · 2025-08-18 · unverdicted · none · ref 192
AttenA+: Rectifying Action Inequality in Robotic Foundation Models · cs.RO · 2026-05-13 · unverdicted · none · ref 23 · 2 links
AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement · cs.RO · 2026-04-12 · unreviewed · ref 16
