Inverse dynamics prediction is added as an auxiliary task to reduce state aliasing in VLA models by directly supervising the vision encoder on action-relevant visual distinctions using only standard observation-action pairs.
MoTVLA: A vision-language-action model with unified fast-slow reasoning
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
AsyncVLA adds asynchronous flow matching and a confidence rater to VLA models so they can generate actions on flexible schedules and selectively refine low-confidence tokens before execution.
A visuo-tactile policy learning method that exploits tactile motion correlation for contact state distinction and Mixture-of-Transformers for cross-modal fusion.
citing papers explorer
- AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models