FASTER: Rethinking Real-Time Flow VLAs

· 2026 · cs.RO · arXiv 2603.19199

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction by tenfold (e.g., in $\pi_{0.5}$ and X-VLA) into a single step, while preserving the quality of long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks substantially improved real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

cs.RO · 2026-04-27 · unverdicted · novelty 7.0

Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks and real pick tasks.

DEFLECT: Delay-Robust Execution via Flow-matching Likelihood-Estimated Counterfactual Tuning for VLA Policies

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

DEFLECT is an offline post-training method that improves async VLA policy success rates under high inference delays by using flow-matching likelihood ratios on counterfactual fresh/stale action pairs from a frozen reference policy.

LiteVLA-H: Dual-Rate Vision-Language-Action Inference for Onboard Aerial Guidance and Semantic Perception

cs.CV · 2026-04-27 · unverdicted · novelty 5.0 · 2 refs

LiteVLA-H delivers 19.74 Hz action tokens and 6 Hz semantic outputs on Jetson Orin via dual-rate scheduling and mixed fine-tuning, outperforming recent VLA baselines in edge action rate while preserving descriptive competence.

citing papers explorer

Showing 3 of 3 citing papers.

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors cs.RO · 2026-04-27 · unverdicted · none · ref 24 · internal anchor
Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks and real pick tasks.
DEFLECT: Delay-Robust Execution via Flow-matching Likelihood-Estimated Counterfactual Tuning for VLA Policies cs.RO · 2026-05-19 · unverdicted · none · ref 20 · internal anchor
DEFLECT is an offline post-training method that improves async VLA policy success rates under high inference delays by using flow-matching likelihood ratios on counterfactual fresh/stale action pairs from a frozen reference policy.
LiteVLA-H: Dual-Rate Vision-Language-Action Inference for Onboard Aerial Guidance and Semantic Perception cs.CV · 2026-04-27 · unverdicted · none · ref 6 · 2 links · internal anchor
LiteVLA-H delivers 19.74 Hz action tokens and 6 Hz semantic outputs on Jetson Orin via dual-rate scheduling and mixed fine-tuning, outperforming recent VLA baselines in edge action rate while preserving descriptive competence.

FASTER: Rethinking Real-Time Flow VLAs

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer