Inverse dynamics prediction is added as an auxiliary task to reduce state aliasing in VLA models by directly supervising the vision encoder on action-relevant visual distinctions using only standard observation-action pairs.
MoTVLA: A vision-language-action model with unified fast-slow reasoning
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
AsyncVLA adds asynchronous flow matching and a confidence rater to VLA models so they can generate actions on flexible schedules and selectively refine low-confidence tokens before execution.
citing papers explorer
-
AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models
AsyncVLA adds asynchronous flow matching and a confidence rater to VLA models so they can generate actions on flexible schedules and selectively refine low-confidence tokens before execution.