FLASH: Efficient Visuomotor Policy via Sparse Sampling
Pith reviewed 2026-05-19 16:09 UTC · model grok-4.3
The pith
A visuomotor policy using Legendre polynomials and history-anchored flow matching generates long robot action sequences in a single fast step.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FLASH replaces discrete action-chunk generation with continuous Legendre polynomial trajectory representation by fitting expert demonstrations under sparse temporal sampling and initiating flow matching from history polynomial coefficients rather than uninformative Gaussian noise, enabling accurate single-step inference over extended action horizons with direct analytic velocity feed-forward.
What carries the argument
The Legendre polynomial trajectory representation combined with sparse history-anchored flow matching initialization, which reduces the generation to a single step while maintaining trajectory smoothness and accuracy.
If this is right
- State-of-the-art success rates of at least 92% across all tested tasks.
- Per-episode inference time of 31.40 ms, up to 175 times faster than diffusion policies and 18 times faster than prior flow matching policies.
- Up to 4 times faster training convergence compared to ACT.
- 5 to 7 times reduction in controller tracking error compared to discrete-action baselines.
Where Pith is reading between the lines
- Such history-anchored initialization might be applicable to other iterative generative methods to speed them up without changing the model architecture.
- This continuous representation could improve robustness in contact-rich tasks by providing smoother velocity commands.
- Deployment on edge devices becomes more feasible due to the low computational requirement per inference.
Load-bearing premise
That fitting expert demonstrations under sparse temporal sampling combined with initialization from history polynomial coefficients enables accurate single-step flow matching that preserves performance over extended action horizons without post-hoc tuning or task-specific adjustments.
What would settle it
Observing that on a held-out task or longer horizon, the single-step FLASH policy exhibits substantially lower success rates or higher tracking errors than a multi-step version of the same model would falsify the accuracy of the single-step inference claim.
read the original abstract
Generative models such as diffusion and flow matching have become dominant paradigms for visuomotor policy learning, yet their reliance on iterative denoising incurs high inference latency incompatible with real-time robotic control. We present Fast Legendre-polynomial Action policy via Sparse History-anchored flow (FLASH Policy), which replaces discrete action-chunk generation with continuous Legendre polynomial trajectory representation. Specifically, by fitting expert demonstrations under sparse temporal sampling, FLASH enables a single inference to cover a significantly extended action horizon. To further accelerate generation, FLASH initiates the flow matching process from history polynomial coefficients rather than uninformative Gaussian noise, shortening the transport distance and enabling accurate single-step inference. Moreover, analytic polynomial differentiation directly provides desired velocity feed-forward signals to the torque controller without numerical approximation. Extensive experiments on five simulated and two real-world manipulation tasks demonstrate that FLASH achieves state-of-the-art success rates ($\ge 92\%$ across all tasks), a per-episode inference time of $31.40\,ms$ (up to $175\times$ faster than diffusion policies and $18\times$ faster than prior flow matching policies), up to $4\times$ faster training convergence than ACT, and $5\times$ to $7\times$ reduction in controller tracking error compared to discrete-action baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FLASH, a visuomotor policy for robotic manipulation that represents continuous action trajectories via Legendre polynomials fitted to sparsely sampled expert demonstrations. By anchoring flow-matching initialization to the resulting polynomial coefficients rather than Gaussian noise, the method enables single-step inference over extended horizons while supplying analytic velocity commands via polynomial differentiation. Experiments across five simulated and two real-world tasks report state-of-the-art success rates (>=92%), 31.40 ms per-episode inference (up to 175x faster than diffusion policies), up to 4x faster training than ACT, and 5-7x lower controller tracking error than discrete baselines.
Significance. If the single-step approximation proves robust, the work offers a practical route to real-time generative visuomotor control by removing iterative sampling latency while preserving performance. The analytic differentiation for feed-forward signals and the sparse-history initialization are concrete engineering advances. The multi-task empirical evaluation (simulation plus real hardware) is a strength that supports the speed and accuracy claims when properly ablated.
major comments (2)
- [Abstract] Abstract (method description): The load-bearing assumption that fitting Legendre polynomials to sparsely sampled trajectories and initializing flow matching from those coefficients yields accurate single-step generation over long horizons without bias or loss of high-frequency content is not supported by any analysis, ablation on sampling interval, or single-step vs. multi-step comparison. If the polynomial approximation is coarse on contact-rich or rapidly changing tasks, the reported >=92% success rates and tracking-error reductions could be undermined.
- [Abstract] Abstract (results): The specific performance numbers (31.40 ms inference, 175x/18x speedups, 4x faster convergence, 5-7x tracking error reduction) are presented without reference to tables, figures, run counts, variance, or statistical tests. This gap prevents verification that the empirical results robustly support the central claims of superiority over diffusion, flow-matching, and ACT baselines.
minor comments (3)
- [Abstract] Abstract: The free parameters (Legendre degree and sparse sampling interval) are mentioned but not characterized; a short statement on how they are selected or their sensitivity would improve clarity.
- [Abstract] Abstract: Clarify whether 'per-episode inference time' refers to a single action chunk or the full episode rollout, as this affects interpretation of the real-time claims.
- [Abstract] Title and abstract: The emphasis on 'Sparse Sampling' could be balanced with the history-anchored initialization, which appears to be the primary mechanism for shortening transport distance.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address the two major comments point by point below, indicating where revisions will be incorporated to strengthen the manuscript.
read point-by-point responses
-
Referee: The load-bearing assumption that fitting Legendre polynomials to sparsely sampled trajectories and initializing flow matching from those coefficients yields accurate single-step generation over long horizons without bias or loss of high-frequency content is not supported by any analysis, ablation on sampling interval, or single-step vs. multi-step comparison. If the polynomial approximation is coarse on contact-rich or rapidly changing tasks, the reported >=92% success rates and tracking-error reductions could be undermined.
Authors: We agree that additional explicit analysis and ablations would strengthen the presentation of the single-step approximation. While the multi-task empirical results (including contact-rich real-world tasks) already provide supporting evidence for robustness, we will revise the manuscript to include a new ablation subsection. This will vary the sparse sampling interval for Legendre fitting, report success rates and tracking errors across intervals, and directly compare single-step versus multi-step flow-matching inference on the same tasks. We will also add a brief discussion of the history-anchored initialization's role in reducing transport distance and potential bias. These changes will be supported by new figures and tables in the Experiments section. revision: yes
-
Referee: The specific performance numbers (31.40 ms inference, 175x/18x speedups, 4x faster convergence, 5-7x tracking error reduction) are presented without reference to tables, figures, run counts, variance, or statistical tests. This gap prevents verification that the empirical results robustly support the central claims of superiority over diffusion, flow-matching, and ACT baselines.
Authors: We concur that the abstract would benefit from explicit cross-references to the supporting empirical details. In the revised manuscript we will update the abstract to include parenthetical citations to the relevant tables and figures (e.g., Table 1 for success rates and inference latency, Figure 5 for training curves and tracking error). We will also state that all metrics are means over 5 random seeds with standard deviations reported, and note that pairwise comparisons include statistical significance via t-tests. The full experimental protocol, run counts, and variance are already detailed in Section 4; the abstract revision will make these connections immediate for readers. revision: yes
Circularity Check
No significant circularity; empirical results grounded in external task benchmarks
full rationale
The paper presents FLASH as an empirical engineering method that fits Legendre polynomials to sparsely sampled expert trajectories, initializes single-step flow matching from those coefficients, and obtains velocities via analytic differentiation. All reported metrics (≥92% success rates, 31.40 ms inference, 4× faster convergence, 5–7× tracking-error reduction) are obtained from experiments on five simulated and two real-world manipulation tasks. These are external benchmarks independent of the fitted parameters. No equation reduces a claimed performance quantity to a fitted input by construction, no load-bearing uniqueness theorem is imported via self-citation, and no ansatz is smuggled through prior work. The derivation chain is therefore self-contained against external validation and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (2)
- Legendre polynomial degree
- sparse sampling interval
axioms (2)
- domain assumption Legendre polynomials can accurately represent robot action trajectories over extended horizons when fitted to sparse expert samples.
- domain assumption Starting flow matching from history polynomial coefficients shortens transport distance enough for accurate single-step inference.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
FLASH initiates the flow matching process from history polynomial coefficients rather than uninformative Gaussian noise, shortening the transport distance and enabling accurate single-step inference.
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We adopt the shifted Legendre polynomials... orthogonal on [-1,1].
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Advances in Neural Information Processing Systems , volume=
Denoising diffusion probabilistic models , author=. Advances in Neural Information Processing Systems , volume=
-
[2]
Proceedings of International Conference on Learning Representations , year=
Flow Matching for Generative Modeling , author=. Proceedings of International Conference on Learning Representations , year=
-
[3]
Black, Kevin and Brown, Noah and Driess, Danny and Esmail, Adnan and Equi, Michael and Finn, Chelsea and Fusai, Niccolo and Groom, Lachy and Hausman, Karol and Ichter, Brian and others , journal=
-
[4]
The International Journal of Robotics Research , volume=
Diffusion policy: Visuomotor policy learning via action diffusion , author=. The International Journal of Robotics Research , volume=. 2025 , publisher=
work page 2025
-
[5]
Denoising Diffusion Implicit Models
Denoising diffusion implicit models , author=. arXiv preprint arXiv:2010.02502 , year=
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[6]
Proceedings of International Conference on Machine Learning , year=
Consistency models , author=. Proceedings of International Conference on Machine Learning , year=
-
[7]
Progressive Distillation for Fast Sampling of Diffusion Models
Progressive distillation for fast sampling of diffusion models , author=. arXiv preprint arXiv:2202.00512 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[8]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
On distillation of guided diffusion models , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
-
[9]
Proceedings of European Conference on Computer Vision , pages=
Adversarial diffusion distillation , author=. Proceedings of European Conference on Computer Vision , pages=. 2024 , organization=
work page 2024
-
[10]
Mean Flows for One-step Generative Modeling
Mean flows for one-step generative modeling , author=. arXiv preprint arXiv:2505.13447 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[11]
arXiv preprint arXiv:2504.18904 , year=
Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning , author=. arXiv preprint arXiv:2504.18904 , year=
-
[12]
Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations , author=. arXiv preprint arXiv:2107.14483 , year=
-
[13]
IEEE Robotics and Automation Letters , volume=
Rlbench: The robot learning benchmark & learning environment , author=. IEEE Robotics and Automation Letters , volume=. 2020 , publisher=
work page 2020
-
[14]
Advances in Neural Information Processing Systems , volume=
Libero: Benchmarking knowledge transfer for lifelong robot learning , author=. Advances in Neural Information Processing Systems , volume=
-
[15]
Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
Scalable diffusion models with transformers , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
-
[16]
Score-Based Generative Modeling through Stochastic Differential Equations
Score-based generative modeling through stochastic differential equations , author=. arXiv preprint arXiv:2011.13456 , year=
work page internal anchor Pith review Pith/arXiv arXiv 2011
-
[17]
arXiv preprint arXiv:2509.18644 , year=
Do You Need Proprioceptive States in Visuomotor Policies? , author=. arXiv preprint arXiv:2509.18644 , year=
-
[18]
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
Learning fine-grained bimanual manipulation with low-cost hardware , author=. arXiv preprint arXiv:2304.13705 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[19]
Dechen Gao and BOQI ZHAO and Andrew Lee and Ian Chuang and Hanchu Zhou and Hang Wang and Zhe Zhao and Junshan Zhang and Iman Soltani , booktitle=
- [20]
- [21]
-
[22]
Proceedings of IEEE International Conference on Robotics and Automation , pages=
Minimum snap trajectory generation and control for quadrotors , author=. Proceedings of IEEE International Conference on Robotics and Automation , pages=. 2011 , organization=
work page 2011
-
[23]
Robotics Research: The 16th International Symposium ISRR , pages=
Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments , author=. Robotics Research: The 16th International Symposium ISRR , pages=. 2016 , organization=
work page 2016
-
[24]
Handbook of mathematical functions with formulas, graphs, and mathematical tables , author=. 1948 , publisher=
work page 1948
- [25]
-
[26]
Mathematical Handbook of Formulas and Tables , author=. 1968 , publisher=
work page 1968
-
[27]
Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems , pages=
Flowmp: Learning motion fields for robot planning with conditional flow matching , author=. Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems , pages=. 2025 , organization=
work page 2025
-
[28]
arXiv preprint arXiv:2506.06072 , year=
Beast: Efficient tokenization of b-splines encoded action sequences for imitation learning , author=. arXiv preprint arXiv:2506.06072 , year=
-
[29]
IEEE Transactions on Robotics , year=
Motion planning diffusion: Learning and adapting robot motion planning with diffusion models , author=. IEEE Transactions on Robotics , year=
-
[30]
arXiv preprint arXiv:2602.06698 , year=
Crowd-FM: Learned Optimal Selection of Conditional Flow Matching-generated Trajectories for Crowd Navigation , author=. arXiv preprint arXiv:2602.06698 , year=
-
[31]
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=
Deep residual learning for image recognition , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=
-
[32]
Statistical power analysis for the behavioral sciences , author=. 2013 , publisher=
work page 2013
-
[33]
Action-to-Action Flow Matching
Action-to-action flow matching , author=. arXiv preprint arXiv:2602.07322 , year=
work page internal anchor Pith review Pith/arXiv arXiv
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.