VGAS: Value-guided action-chunk selection for few-shot vision-language-action adaptation

· 2026 · cs.AI · arXiv 2602.07399

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Vision-Language-Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonstrations remains unreliable. While fine-tuned VLA policies often produce semantically plausible trajectories, failures frequently arise from unresolved geometric ambiguities, where near-miss actions lead to divergent execution outcomes under limited supervision. We study few-shot VLA adaptation from a generation-selection perspective and propose a novel framework VGAS (Value-Guided Action-chunk Selection). It performs inference-time best-of-N selection to identify action chunks that are both semantically faithful and geometrically precise. Specifically, VGAS employs a fine-tuned VLA as a high-recall proposal generator and introduces the Q-Chunk-Former, a geometrically grounded Transformer critic to resolve fine-grained geometric ambiguities. In addition, we propose Explicit Geometric Regularization (EGR), which shapes a discriminative value landscape to preserve action ranking resolution among near-miss candidates while mitigating value instability under scarce supervision. Experiments and theoretical analysis demonstrate that VGAS consistently improves success rates and robustness under limited demonstrations and distribution shifts. Our code is available at https://github.com/Jyugo-15/VGAS.
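
The inference-time best-of-N selection described in the abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not the authors' code: `make_toy_proposer` plays the role of the fine-tuned VLA proposal generator, `toy_critic` plays the role of the Q-Chunk-Former, and the observation is simply treated as a target position that each action in the chunk should match. Only the generic pattern is shown: sample N candidate chunks, score each with a critic, keep the highest-scoring one.

```python
import random
from typing import Callable, List, Tuple

# An action chunk is a short sequence of actions (horizon x action_dim).
ActionChunk = List[List[float]]


def best_of_n(
    propose: Callable[[List[float]], ActionChunk],
    score: Callable[[List[float], ActionChunk], float],
    observation: List[float],
    n: int,
) -> Tuple[ActionChunk, float]:
    """Sample n candidate chunks and return the one the critic scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [propose(observation) for _ in range(n)]
    scores = [score(observation, chunk) for chunk in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]


def make_toy_proposer(rng: random.Random, horizon: int = 4, noise: float = 0.1):
    """Hypothetical stand-in for a proposal policy: target plus Gaussian noise."""
    def propose(observation: List[float]) -> ActionChunk:
        return [[x + rng.gauss(0.0, noise) for x in observation]
                for _ in range(horizon)]
    return propose


def toy_critic(observation: List[float], chunk: ActionChunk) -> float:
    """Hypothetical stand-in for a learned critic: negative squared error,
    so chunks that stay closer to the target receive a higher value."""
    return -sum((a - x) ** 2 for step in chunk for a, x in zip(step, observation))


# Toy usage: the observation doubles as the 2-D target in this sketch.
obs = [0.5, -0.2]
chunk, value = best_of_n(make_toy_proposer(random.Random(0)), toy_critic, obs, n=16)
```

In this toy setting, raising `n` can only keep or improve the selected value for a fixed random seed, since the larger candidate set contains the smaller one. How well selection works in practice depends on how accurately the critic ranks near-miss candidates, which is the problem the abstract attributes to the Q-Chunk-Former and EGR.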

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Autonomous Drift Learning in Data Streams: A Unified Perspective

cs.LG · 2026-05-02 · unverdicted · novelty 7.0

A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.

VLA-Corrector: Lightweight Detect-and-Correct Inference for Adaptive Action Horizon

cs.RO · 2026-07-02 · unverdicted · novelty 6.0

VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.

See-and-Reach: Precise Vision-Language Navigation for UAVs within the Field of View

cs.CV · 2026-06-18 · unverdicted · novelty 6.0

Introduces UAV-VLN-FOV task and 3DG-VLN framework for precise target-visible UAV navigation, reporting 13.82% success rate gain on a new 2,717-trajectory benchmark with code released.

citing papers explorer

Showing 1 of 1 citing paper after filters.

VLA-Corrector: Lightweight Detect-and-Correct Inference for Adaptive Action Horizon

cs.RO · 2026-07-02 · unverdicted · none · ref 32 · internal anchor

VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.
