Vgas: Value-guided action- chunk selection for few-shot vision-language-action adaptation

· 2026 · cs.AI · arXiv 2602.07399

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

Vision--Language--Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonstrations remains unreliable. While fine-tuned VLA policies often produce semantically plausible trajectories, failures often arise from unresolved geometric ambiguities, where near-miss actions lead to divergent execution outcomes under limited supervision. We study few-shot VLA adaptation from a \emph{generation--selection} perspective and propose a novel framework \textbf{VGAS} (\textbf{V}alue-\textbf{G}uided \textbf{A}ction-chunk \textbf{S}election). It performs inference-time best-of-$N$ selection to identify action chunks that are both semantically faithful and geometrically precise. Specifically, \textbf{VGAS} employs a finetuned VLA as a high-recall proposal generator and introduces the \textrm{Q-Chunk-Former}, a geometrically grounded Transformer critic to resolve fine-grained geometric ambiguities. In addition, we propose \textit{Explicit Geometric Regularization} (\texttt{EGR}), which shapes a discriminative value landscape to preserve action ranking resolution among near-miss candidates while mitigating value instability under scarce supervision. Experiments and theoretical analysis demonstrate that \textbf{VGAS} consistently improves success rates and robustness under limited demonstrations and distribution shifts. Our code is available at https://github.com/Jyugo-15/VGAS.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Autonomous Drift Learning in Data Streams: A Unified Perspective

cs.LG · 2026-05-02 · unverdicted · novelty 7.0

A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.

VLA-Corrector: Lightweight Detect-and-Correct Inference for Adaptive Action Horizon

cs.RO · 2026-07-02 · unverdicted · novelty 6.0

VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.

citing papers explorer

Showing 2 of 2 citing papers.

Autonomous Drift Learning in Data Streams: A Unified Perspective cs.LG · 2026-05-02 · unverdicted · none · ref 194 · internal anchor
A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.
VLA-Corrector: Lightweight Detect-and-Correct Inference for Adaptive Action Horizon cs.RO · 2026-07-02 · unverdicted · none · ref 32 · internal anchor
VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.

Vgas: Value-guided action- chunk selection for few-shot vision-language-action adaptation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer