arxiv: 2605.16736 · v2 · pith:E57JTNLKnew · submitted 2026-05-16 · 💻 cs.CV

CAB: Accelerating Flow and Diffusion Sampling via Rectification and Corrected Adams-Bashforth

Pith reviewed 2026-05-20 15:56 UTC · model grok-4.3

classification 💻 cs.CV
keywords flow models · diffusion models · sampling acceleration · Adams-Bashforth · image synthesis · training-free sampler · rectified coordinates · low-step sampling

The pith

CAB accelerates sampling in flow and diffusion models by rectifying dynamics and applying a corrected Adams-Bashforth procedure without extra training or evaluations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CAB as a training-free sampler that speeds up image generation from pretrained flow and diffusion models. It first maps the sampling process into a shared rectified coordinate system and then uses a multistep Adams-Bashforth predictor with a correction term based on prior velocity values. This setup targets better quality at low numbers of function evaluations, particularly in the 6 to 20 step range, across class-conditional and large-scale text-to-image tasks. A sympathetic reader would care because fewer steps make these generative models more usable in practical settings where computation time or resources are limited.

Core claim

CAB transforms the sampling dynamics of both flow and diffusion models into a common rectified coordinate system and then applies a multistep Adams-Bashforth predictor augmented with a correction term derived from past velocity evaluations. This procedure incurs no extra function evaluations, maintains the same algorithmic form across model types, and achieves at least third-order local truncation error along with second-order global error. Experiments on pretrained models show improved quality versus NFE trade-offs in the low-step regime while staying competitive at higher step counts.

What carries the argument

The rectified coordinate system paired with the corrected Adams-Bashforth procedure, which unifies acceleration across flow and diffusion models by enabling a single multistep solver to be applied uniformly.
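
As a point of reference for the multistep idea, here is a minimal sketch of the classical two-step Adams-Bashforth rule (AB2). It is not the CAB update: the form of the paper's correction term and of its rectified coordinates is not given on this page, so neither is reproduced. The function name `ab2_solve` and the test problem are illustrative assumptions. The point it shows is that each step reuses the previous velocity, so the cost stays at one function evaluation per step.

```python
import math
from typing import Callable, Tuple


def ab2_solve(
    f: Callable[[float, float], float],
    t0: float,
    t1: float,
    x0: float,
    n_steps: int,
) -> Tuple[float, int]:
    """Integrate dx/dt = f(t, x) from t0 to t1 with classical AB2.

    Returns the final state and the number of function evaluations (NFE).
    Hypothetical helper for illustration; not the paper's CAB sampler.
    """
    h = (t1 - t0) / n_steps
    t, x = t0, x0
    nfe = 0
    f_prev = None  # velocity from the previous step, reused at no extra cost
    for _ in range(n_steps):
        f_curr = f(t, x)
        nfe += 1
        if f_prev is None:
            # No history yet: bootstrap the first step with forward Euler.
            x = x + h * f_curr
        else:
            # AB2: extrapolate the velocity using the stored past evaluation.
            x = x + h * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
        t += h
    return x, nfe


if __name__ == "__main__":
    x_end, nfe = ab2_solve(lambda t, x: -x, 0.0, 1.0, 1.0, 10)
    print(f"AB2 result {x_end:.6f}, exact {math.exp(-1.0):.6f}, NFE {nfe}")
```

A sampler built this way only needs to keep a short buffer of past velocities, which is why multistep predictors are attractive when every network call is expensive.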

If this is right

  • Improved sample quality at 6-20 NFEs on both class-conditional and large-scale text-to-image benchmarks.
  • Competitive performance against other training-free samplers when using higher step counts across most tested models.
  • Uniform algorithmic form that applies identically to flow and diffusion models without model-specific changes.
  • At least third-order local truncation error and second-order global error in the numerical integration.
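
The order claims in the last bullet can be read against the classical two-step Adams-Bashforth rule, which has third-order local truncation error and second-order global error. The sketch below estimates the global order empirically by halving the step size on a linear test problem. It uses plain AB2, not CAB, and the helper names and test ODE are assumptions made for illustration.

```python
import math


def ab2_final_error(n_steps: int) -> float:
    """Global error at t = 1 of classical AB2 on dx/dt = -x, x(0) = 1."""
    h = 1.0 / n_steps
    x_prev = 1.0
    # Take the first step from the exact solution so that only the
    # multistep error is measured, not the error of a bootstrap scheme.
    x_curr = math.exp(-h)
    for _ in range(n_steps - 1):
        x_next = x_curr + h * (1.5 * (-x_curr) - 0.5 * (-x_prev))
        x_prev, x_curr = x_curr, x_next
    return abs(x_curr - math.exp(-1.0))


def observed_order(n_coarse: int) -> float:
    """Estimate the convergence order from errors at step sizes h and h/2."""
    err_coarse = ab2_final_error(n_coarse)
    err_fine = ab2_final_error(2 * n_coarse)
    return math.log2(err_coarse / err_fine)


if __name__ == "__main__":
    for n in (20, 40, 80):
        print(n, round(observed_order(n), 3))
```

An observed order close to 2 is what a second-order global error predicts. The same halving experiment is a natural way to check any corrected variant, since a correction that breaks consistency would show up as a lower slope.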

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • A single sampler implementation could serve multiple generative model families and thereby simplify deployment codebases.
  • Reduced step counts at maintained quality could support interactive or on-device image generation where latency matters.
  • The velocity-based correction approach might extend to accelerating other ODE-based processes outside image synthesis.

Load-bearing premise

Transforming the sampling dynamics to a common rectified coordinate system allows the same corrected Adams-Bashforth procedure to be applied uniformly to both flow and diffusion models without introducing model-specific degradation or requiring additional tuning.

What would settle it

Run CAB on a large-scale text-to-image model at exactly 10 NFEs. If the generated images score an FID equal to or worse (higher) than that of a standard second-order solver such as Heun at the same budget, the claimed quality improvement in the low-step regime is falsified.
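
A fair version of this test has to match function evaluations rather than steps. The sketch below is a textbook Heun integrator with an explicit NFE counter. It assumes two evaluations on every step, so a 10-NFE budget buys 5 Heun steps, whereas a multistep rule that needs one evaluation per step gets 10. The function name `heun_solve` and the toy ODE are illustrative assumptions, not part of the paper.

```python
import math
from typing import Callable, Tuple


def heun_solve(
    f: Callable[[float, float], float],
    t0: float,
    t1: float,
    x0: float,
    nfe_budget: int,
) -> Tuple[float, int]:
    """Integrate dx/dt = f(t, x) with Heun's method under an NFE budget.

    Each Heun step spends two evaluations, so the number of steps is
    nfe_budget // 2. Returns the final state and the NFE actually used.
    """
    if nfe_budget < 2:
        raise ValueError("Heun's method needs at least 2 evaluations")
    n_steps = nfe_budget // 2
    h = (t1 - t0) / n_steps
    t, x, nfe = t0, x0, 0
    for _ in range(n_steps):
        k1 = f(t, x)               # slope at the start of the step
        k2 = f(t + h, x + h * k1)  # slope at the Euler-predicted end point
        nfe += 2
        x = x + 0.5 * h * (k1 + k2)
        t += h
    return x, nfe


if __name__ == "__main__":
    x_end, nfe = heun_solve(lambda t, x: -x, 0.0, 1.0, 1.0, 10)
    print(f"Heun result {x_end:.6f}, exact {math.exp(-1.0):.6f}, NFE {nfe}")
```

Reporting the counter alongside FID makes it harder for a step-matched comparison to be mistaken for an NFE-matched one.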

Figures

Figures reproduced from arXiv: 2605.16736 by Anuska Roy, Pravin Nair.

Figure 1: Qualitative comparison of training-free samplers on QWEN-Image
Figure 2: FID versus NFE for CIFAR-10 with VP/VE schedules and ImageNet with the VP schedule.
Figure 3: Comparison of training-free samplers on 256 × 256 class-conditional ImageNet generation. … the roles of rectification and correction, the effect of the corrector weight γ on distributional and perceptual quality, runtime, and memory cost. Additional results, such as empirical verification of Theorem 3.2, additional comparisons, and limitations, are deferred to the Appendix. Unconditional image generation on …
Figure 4: Comparison of training-free samplers on QWEN-Image
Figure 6: Ablation of the CAB correction weight γ on DiT/ImageNet 256 × 256. Stronger correction improves low-NFE FID, while moderate correction yields better NIQE
Figure 7: Empirical verification of the accuracy results in Theorem 3.2 on two representative nonlinear …
Figure 8: Comparison of AB2/AB3 and the proposed CAB-2/CAB-3 on two representative nonlinear …
Figure 9: Qualitative comparison of samplers in the 8-NFE regime for DiT. Increasing the solver order …
Figure 10: Unconditional generation on CIFAR-10 using EDM model.
Figure 11: Comparison of samplers on class-conditional ImageNet ( …
Figure 12: Comparison of training-free samplers on QWEN-Image
Figure 13: Prompt: “light wind, feathers moving, she moves her gaze, 4k.” Temporal comparison of training-free samplers on HunyuanVideo-1.5 over four frames from the same generated video. CAB-2 better preserves appearance consistency and smoother motion progression, while DPM++ and STORK show stronger temporal drift and frame-to-frame variation. Panel labels: Frame 1, Frame 2, Frame 3, Frame 4; DPM++, STORK, CAB-2.
Figure 14: Prompt: “A fluffy grey and white cat is lazily stretched out on a sunny window sill, enjoying a nap after a long day of lounging.” Temporal comparison of training-free samplers on HunyuanVideo-1.5 over four frames from the same generated video. CAB-2 preserves sharper fur texture, clearer facial structure, and more consistent window-side lighting, while DPM++ and STORK appear blurrier and show weaker appe…
Figure 15: Prompt: “a giraffe eating an apple.” Temporal comparison of training-free samplers on HunyuanVideo-1.5 over four frames from the same generated video. CAB-2 preserves cleaner appearance and more coherent scene context across frames, while DPM++ and STORK show weaker texture consistency and less stable background structure. Panel labels: (a) N = 6, (b) N = 20, (c) N = 50.
Figure 16: Trajectories produced by AB2, AB3, CAB-2, and CAB-3, compared with the reference …
Original abstract

Flow and diffusion models achieve high-fidelity, high-resolution image synthesis, but often require many function evaluations (NFEs) at sampling time. Existing acceleration methods either require additional training through distillation or rely on training-free high-order solvers, and both can degrade sample quality at low NFE budgets. We propose CAB (Corrected Adams-Bashforth), a training-free sampler that accelerates both flow and diffusion models. CAB first transforms the sampling dynamics to a common rectified coordinate system, and then applies a multistep Adams-Bashforth predictor augmented with a simple correction term based on past velocity evaluations and therefore incurs no additional NFEs. The resulting method is simple, has the same algorithmic form across model classes, and has at least third-order local truncation error and second-order global error. Experiments on pretrained flow and diffusion models, including class-conditional and large-scale text-to-image benchmarks, show that CAB improves quality-NFE trade-offs in the low-step regime of 6-20 NFEs. It also remains competitive with strong training-free samplers at higher step counts across most tested models. The official implementation is available at https://github.com/Anuska-Roy/CAB.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CAB, a training-free sampler for accelerating both flow and diffusion models. It first rectifies sampling trajectories into a common coordinate system, then applies a multistep Corrected Adams-Bashforth predictor that uses past velocity evaluations (no extra NFEs) and is claimed to achieve at least third-order local truncation error and second-order global error. Experiments on pretrained class-conditional and large-scale text-to-image models show improved quality-NFE trade-offs for 6-20 NFEs while remaining competitive at higher step counts.

Significance. If the rectification unifies the dynamics sufficiently for the claimed error order to hold uniformly and the empirical gains are robust to fair baseline matching, CAB would provide a simple, reproducible acceleration technique applicable to both model families without distillation or per-model tuning. The public implementation at https://github.com/Anuska-Roy/CAB is a positive factor for reproducibility.

major comments (2)
  1. [Methods] Methods section (derivation of the correction term and local truncation error): the third-order claim assumes that rectification eliminates residual nonlinear drift terms arising from diffusion variance schedules. For linear or cosine schedules, an O(Δt²) remainder may persist after any affine rectification, which would invalidate the fixed correction coefficient and reduce the actual order; no explicit Taylor expansion or verification for standard schedules is supplied to confirm the assumption.
  2. [Experiments] Experimental section (low-NFE regime, 6-20 NFEs): the reported quality gains rely on the rectification step preserving the smoothness and bounded-derivative conditions needed for the Adams-Bashforth analysis. Without controls that isolate the rectification effect (e.g., comparing rectified vs. non-rectified CAB on the same diffusion model), it is unclear whether the gains are attributable to the claimed order or to incidental trajectory straightening.
minor comments (2)
  1. [Abstract] The abstract states 'at least third-order local truncation error'; the precise order and the exact form of the correction coefficient should be stated explicitly with the full expansion in the main text rather than left to the appendix.
  2. [Methods] Notation for the rectified coordinate system and the velocity field after rectification should be introduced with a clear equation early in the Methods section to avoid ambiguity when the same CAB procedure is applied to both flow and diffusion models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The comments help clarify the presentation of the theoretical claims and the attribution of empirical gains. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Methods] Methods section (derivation of the correction term and local truncation error): the third-order claim assumes that rectification eliminates residual nonlinear drift terms arising from diffusion variance schedules. For linear or cosine schedules, an O(Δt²) remainder may persist after any affine rectification, which would invalidate the fixed correction coefficient and reduce the actual order; no explicit Taylor expansion or verification for standard schedules is supplied to confirm the assumption.

    Authors: We appreciate the referee drawing attention to the need for a more explicit error analysis. The rectification is constructed as an affine transformation chosen to align the integrated velocity fields of flow and diffusion models into a common coordinate system in which the leading nonlinear contributions from the variance schedule are removed or pushed to higher order. Nevertheless, we agree that the manuscript would benefit from a self-contained Taylor expansion of the rectified dynamics for the linear and cosine schedules used in our experiments. In the revised version we will insert this derivation, explicitly showing the order of the residual term after rectification and confirming that the fixed correction coefficient preserves the third-order local truncation error. We will also add a short numerical check of the observed convergence rate on a simple ODE with the same schedules.

    revision: yes

  2. Referee: [Experiments] Experimental section (low-NFE regime, 6-20 NFEs): the reported quality gains rely on the rectification step preserving the smoothness and bounded-derivative conditions needed for the Adams-Bashforth analysis. Without controls that isolate the rectification effect (e.g., comparing rectified vs. non-rectified CAB on the same diffusion model), it is unclear whether the gains are attributable to the claimed order or to incidental trajectory straightening.

    Authors: We concur that an explicit ablation isolating the rectification step would make the source of the low-NFE improvements clearer. Although CAB is presented as an integrated procedure in which rectification is a prerequisite for applying the corrected multistep rule, we will add a controlled comparison in the revised experimental section: on the same pretrained diffusion models we will report results for (i) the full CAB pipeline, (ii) the corrected Adams-Bashforth predictor applied directly in the original coordinates (i.e., without rectification), and (iii) a standard Adams-Bashforth baseline. These additional curves will allow readers to separate the contribution of rectification from the multistep correction itself.

    revision: yes
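
To make the "trajectory straightening" concern concrete, the toy below integrates the same linear ODE with forward Euler in two coordinate systems. Everything in it is an assumption for illustration: the schedules `s(t) = exp(-t)` and `sigma(t) = t`, the constant `E` standing in for the network output, and the use of `lam = sigma / s` as the time variable after rescaling by `s`. The paper's rectified coordinates may be defined differently. Under these assumptions the rescaled dynamics are a straight line, so even a first-order solver is exact there, while the same solver in the original coordinates is not.

```python
import math

E = 0.7  # toy stand-in for the network's noise prediction (assumed constant)


def s(t: float) -> float:
    """Signal scale schedule (illustrative choice)."""
    return math.exp(-t)


def sigma(t: float) -> float:
    """Noise scale schedule (illustrative choice)."""
    return t


def exact_x(t: float, x0: float) -> float:
    """Closed-form solution x_t = s_t * x0 + sigma_t * E for x(0) = x0."""
    return s(t) * x0 + sigma(t) * E


def euler_original(x0: float, n: int) -> float:
    """Forward Euler on dx/dt = (s'/s) x + (sigma' - (s'/s) sigma) E."""
    h = 1.0 / n
    x, t = x0, 0.0
    for _ in range(n):
        # For these schedules s'/s = -1 and sigma' = 1.
        x += h * (-x + (1.0 + t) * E)
        t += h
    return x


def euler_rescaled(x0: float, n: int) -> float:
    """Forward Euler on y = x / s, stepping in lam = sigma / s."""
    y = x0 / s(0.0)
    lam = [sigma(k / n) / s(k / n) for k in range(n + 1)]
    for k in range(n):
        y += (lam[k + 1] - lam[k]) * E  # dy/dlam = E, a straight line
    return s(1.0) * y


if __name__ == "__main__":
    target = exact_x(1.0, 1.0)
    print("original coords error:", abs(euler_original(1.0, 6) - target))
    print("rescaled coords error:", abs(euler_rescaled(1.0, 6) - target))
```

With a real network the prediction is not constant, so the rescaled path is only approximately straight. That gap is exactly what the promised ablation would need to measure to separate the effect of the coordinates from the effect of the correction term.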

Circularity Check

0 steps flagged

No significant circularity; derivation applies standard numerical methods after coordinate transformation

full rationale

The paper presents CAB as the composition of a rectification step that maps flow and diffusion dynamics into a shared coordinate system followed by a corrected Adams-Bashforth multistep integrator whose local truncation error order is taken from the classical analysis of Adams-Bashforth schemes. No central quantity is defined in terms of itself, no parameter is fitted inside the paper and then relabeled as a prediction, and no load-bearing premise rests on a self-citation whose validity is presupposed. The algorithmic form and error claims are therefore independent of the paper's own experimental outputs or prior author-specific results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that rectification produces a common dynamics amenable to the same multistep solver across model families, plus standard numerical analysis results for Adams-Bashforth methods.

axioms (1)
  • domain assumption: Sampling dynamics of flow and diffusion models can be transformed to a common rectified coordinate system without loss of the target distribution or introduction of model-specific artifacts.
    This premise is required to justify applying the identical CAB procedure to both model classes.

pith-pipeline@v0.9.0 · 5737 in / 1312 out tokens · 59041 ms · 2026-05-20T15:56:43.013518+00:00 · methodology


A Theoretical results and additional experiments

A.1 Proof of Lemma A.1

Starting from the reverse-time ODE in (3),

$$\frac{dx_t}{dt} = \frac{\dot{s}_t}{s_t}\, x_t + \left( \dot{\sigma}_t - \frac{\dot{s}_t}{s_t}\, \sigma_t \right) \epsilon_\theta(x_t, t). \tag{10}$$

Consider the rescaled state $y_t := x_t / s_t$, so that $x_t = s_t y_t$. Differentiating $x_t = s$ ...
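
The appendix excerpt stops partway through the differentiation step. For readers who want to see where the rescaling $y_t := x_t / s_t$ leads, the standard computation is sketched below. It assumes $s_t > 0$ and that $s_t$ and $\sigma_t$ are differentiable. It is a textbook consequence of equation (10) and the product rule, not a transcription of the rest of the paper's proof.

```latex
\begin{align*}
% Product rule on x_t = s_t y_t, then substitute equation (10).
\dot{s}_t\, y_t + s_t\, \dot{y}_t
  &= \frac{\dot{s}_t}{s_t}\, s_t y_t
   + \left( \dot{\sigma}_t - \frac{\dot{s}_t}{s_t}\, \sigma_t \right)
     \epsilon_\theta(s_t y_t, t) \\
% The linear term \dot{s}_t y_t cancels on both sides.
s_t\, \dot{y}_t
  &= \left( \dot{\sigma}_t - \frac{\dot{s}_t}{s_t}\, \sigma_t \right)
     \epsilon_\theta(s_t y_t, t) \\
% Divide by s_t and recognise the quotient rule.
\dot{y}_t
  &= \frac{\dot{\sigma}_t s_t - \dot{s}_t \sigma_t}{s_t^{2}}\,
     \epsilon_\theta(s_t y_t, t)
   = \frac{d}{dt}\!\left( \frac{\sigma_t}{s_t} \right)
     \epsilon_\theta(s_t y_t, t)
\end{align*}
```

The linear drift term drops out after rescaling, and the remaining right-hand side depends on the schedule only through the ratio $\sigma_t / s_t$.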