arXiv preprint arXiv:2505.21200 (2025)

Think Twice, Act Once: Token-Aware Compression · 2025 · arXiv 2505.21200

9 Pith papers cite this work.

citation-role summary: background 3

citation-polarity summary: background 3

representative citing papers

Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?

cs.RO · 2026-06-26 · accept · novelty 7.0

VLA language backbones show high redundancy on manipulation benchmarks, with half the LLM blocks removable and even two blocks sufficient to recover baseline performance after fine-tuning, unlike vision and action pathways.

Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.

KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models

cs.RO · 2026-03-02 · unverdicted · novelty 7.0

KERV integrates kinematic Kalman Filter predictions with speculative decoding in VLA models to achieve 27-37% faster inference while maintaining nearly the same task success rates.

VLA-InfoEntropy: A Training-Free Vision-Attention Information Entropy Approach for Vision-Language-Action Models Inference Acceleration and Success

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

VLA-InfoEntropy accelerates Vision-Language-Action model inference by using visual entropy, attention entropy, and timestep cues to prune redundant tokens while preserving task-critical content.

OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism

cs.RO · 2026-03-15 · unverdicted · novelty 6.0

OxyGen unifies KV cache management in MoT VLAs to enable cross-task KV sharing and cross-frame continuous batching, delivering up to a 3.7x speedup with over 200 tokens/s of language generation and 70 Hz action output on on-device platforms.

ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models

cs.CV · 2025-11-22 · conditional · novelty 6.0

ActDistill transfers action knowledge from heavy VLA teacher models to lightweight students via graph-encapsulated hierarchies and action-guided dynamic routing, delivering over 50% computation reduction and 1.67x speedup with comparable or better performance on embodied tasks.

ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models

cs.RO · 2026-05-28 · unverdicted · novelty 5.0

ElegantVLA accelerates VLA models up to 3.77x by dynamically scheduling compute across vision, language, and action components without retraining the base model.

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

cs.RO · 2025-08-18 · unverdicted · novelty 5.0

This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.

AttenA+: Rectifying Action Inequality in Robotic Foundation Models

cs.RO · 2026-05-13 · unverdicted · novelty 4.0

AttenA+ reweights action training objectives in VLA and WAM models via inverse velocity attention to prioritize kinematically critical segments, yielding small benchmark gains.

citing papers explorer

ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models

cs.CV · 2025-11-22 · conditional · none · ref 23

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

cs.RO · 2025-08-18 · unverdicted · none · ref 115