Restoring Initial Noise Sensitivity in Text-to-Image Distillation via Geometric Alignment

Daiguo Zhou; Huayang Huang; Jian Luan; Jinhui Zhao; Ruoyu Wang; Wei Deng; Ye Zhu; Yu Wu

arxiv: 2606.01651 · v1 · pith:TYZI5EKPnew · submitted 2026-06-01 · 💻 cs.CV

Huayang Huang, Ruoyu Wang, Jinhui Zhao, Wei Deng, Daiguo Zhou, Jian Luan, Yu Wu, Ye Zhu

Pith reviewed 2026-06-28 15:18 UTC · model grok-4.3

classification 💻 cs.CV

keywords text-to-image distillation · noise sensitivity · geometric alignment · Jacobian-vector products · diffusion models · generative distillation · image generation

The pith

Matching Jacobian-vector products restores initial noise sensitivity lost in standard text-to-image distillation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard distillation aligns teacher and student outputs at individual points, which flattens the mapping from initial noise to final image and reduces how much small noise changes affect the result. This loss hurts any downstream method that tunes or manipulates the starting noise. The paper shows that the flattening comes from the pointwise objective itself and introduces Geometry-Aware Distillation to match the local slope instead. By aligning Jacobian-vector products with respect to the input noise, the student reproduces the teacher's differential response to perturbations. Experiments across diffusion and other T2I setups confirm restored sensitivity, higher output diversity, and unchanged visual quality.
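
The contrast between the two objectives can be written out as follows. This is a hedged formalization, not the paper's own notation: Φ_T is the teacher's noise-to-image map as written in the figure captions, while Φ_S (the student), v (a perturbation direction), and the two loss names are symbols assumed here for illustration.

```latex
\mathcal{L}_{\text{point}} = \mathbb{E}_{z}\,\bigl\| \Phi_S(z) - \Phi_T(z) \bigr\|^2,
\qquad
\mathcal{L}_{\text{geo}} = \mathbb{E}_{z,v}\,\bigl\| J_{\Phi_S}(z)\,v - J_{\Phi_T}(z)\,v \bigr\|^2,
\qquad
J_{\Phi}(z)\,v = \lim_{h \to 0} \frac{\Phi(z + h v) - \Phi(z)}{h}.
```

The first loss constrains output values at sampled z only, while the second constrains the directional derivative along v. The figure captions mention a perturbation scale h, which suggests a finite-difference form of the limit, though the exact estimator is not stated in this summary.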

Core claim

Standard distillation objectives enforce pointwise output alignment and thereby suppress the teacher's local geometric structure around the input noise; Geometry-Aware Distillation restores the missing sensitivity by explicitly matching Jacobian-vector products with respect to input noise so the student reproduces the teacher's differential response to perturbations while preserving perceptual fidelity.

What carries the argument

Geometry-Aware Distillation (GAD), which aligns local functional behavior of teacher and student models by matching their Jacobian-vector products with respect to input noise.
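
A minimal sketch of what a JVP-matching term could look like, using only the Python standard library. It assumes a finite-difference approximation of the Jacobian-vector product; the names `h` and `lam` echo the perturbation scale h and weight λ mentioned in the figure captions, but the function names, the toy models, and the exact loss form are hypothetical and are not taken from the paper or its code release.

```python
import math

def response(model, z, v, h):
    """Finite-difference stand-in for the JVP J(z) v: (model(z + h*v) - model(z)) / h."""
    shifted = [zi + h * vi for zi, vi in zip(z, v)]
    base, moved = model(z), model(shifted)
    return [(m - b) / h for b, m in zip(base, moved)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gad_style_loss(student, teacher, z, v, h=1e-2, lam=1.0):
    """Pointwise term plus lam times a response-matching term (assumed form)."""
    pointwise = sq_dist(student(z), teacher(z))
    geometric = sq_dist(response(student, z, v, h), response(teacher, z, v, h))
    return pointwise + lam * geometric, pointwise, geometric

# Hypothetical toy maps: the teacher reacts to noise, the flat student ignores it.
def toy_teacher(z):
    return [math.sin(3.0 * x) for x in z]

def flat_student(z):
    return [0.0 for _ in z]

total, pointwise, geometric = gad_style_loss(
    flat_student, toy_teacher, z=[0.0], v=[1.0]
)
print(pointwise, geometric)
```

At z = 0 the flat student matches the teacher's output exactly, so the pointwise term is zero, while the response term is close to 9 (the squared teacher slope). This is the kind of gap a pointwise objective cannot see.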

If this is right

Distilled student models regain support for noise-based optimization and manipulation techniques used in control tasks.
Generated outputs exhibit greater diversity than those from pointwise distillation.
High visual fidelity is maintained across multiple text-to-image generation paradigms.
Downstream noise-driven control tasks show measurable performance gains without retraining the teacher.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same geometric-alignment idea may apply to distilling other generative models where input perturbations carry semantic meaning.
Practitioners could combine GAD with existing noise-optimization pipelines to obtain both speed and controllable variation in one model.
Future distillation objectives might need to preserve additional local properties beyond first-order Jacobians to stay faithful to the teacher trajectory.

Load-bearing premise

The loss of noise sensitivity stems primarily from pointwise output alignment in standard distillation, and matching Jacobian-vector products will restore it without compromising fidelity or creating new problems.

What would settle it

Measure output variation under controlled small perturbations to the initial noise in a standard distilled model versus a GAD model; if the GAD model shows variation closer to the teacher while FID or perceptual scores remain comparable, the claim is supported.
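
The sensitivity half of this test can be sketched as below. The snippet estimates the average output change per unit of noise perturbation for two models under identical noise draws. Everything in it is an illustrative assumption: `mean_response_norm`, the toy `teacher` and `flattened` maps, and the default `h` are hypothetical, and the fidelity half of the test (FID or perceptual scores) is not covered.

```python
import math
import random

def mean_response_norm(model, dim, h=1e-2, trials=200, seed=0):
    """Average of ||model(z + h*v) - model(z)|| / h over random z and unit directions v."""
    rng = random.Random(seed)  # fixed seed so every model sees the same z and v
    total = 0.0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        base = model(z)
        moved = model([zi + h * vi for zi, vi in zip(z, v)])
        total += math.sqrt(sum((m - b) ** 2 for b, m in zip(base, moved))) / h
    return total / trials

# Hypothetical stand-ins: a responsive teacher and a student with a damped response.
def teacher(z):
    return [math.tanh(2.0 * x) for x in z]

def flattened(z):
    return [0.2 * math.tanh(2.0 * x) for x in z]

teacher_sens = mean_response_norm(teacher, dim=4)
student_sens = mean_response_norm(flattened, dim=4)
ratio = student_sens / teacher_sens
print(ratio)
```

A ratio well below 1 indicates attenuated sensitivity relative to the teacher. In a real evaluation the two callables would wrap the actual teacher and distilled samplers, and the ratio would be reported alongside fidelity metrics.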

Figures

Figures reproduced from arXiv: 2606.01651 by Daiguo Zhou, Huayang Huang, Jian Luan, Jinhui Zhao, Ruoyu Wang, Wei Deng, Ye Zhu, Yu Wu.

**Figure 1.** Illustration of sensitivity degradation in diffusion distillation. Top: While the Teacher (a) maps noise to distinct modes (green/blue clusters) with clear directional gradients (arrows), Standard Distillation (b) tends to average these modes, resulting in misaligned gradients. Our Geometry-Aware Distillation (c) successfully recovers the teacher’s geometric structure. (d) Trajectory view: standard point…

**Figure 2.** Geometric gap in distillation. Comparison between baseline TDM (Blue) and our method (Orange). While the baseline achieves comparable pointwise MSE to our method (a), it suffers more from high geometric error (b) and attenuated variations to input perturbations (c). teacher Φ_T(z) for individual inputs. However, this objective treats inputs z independently and does not explicitly constrain the functional …

**Figure 3.** Overview of Geometry-Aware Distillation (GAD). (a) Existing distillation paradigms typically focus on individual pointwise alignment, which often leads the student to learn an “averaged” direction between Φ_T(z) and Φ_T(z′), thus resulting in a flattened response and loss of diversity. (b) Our GAD complements the standard loss (dashed) by aligning paired inputs (z, z′) to align the Response Vectors. By…

**Figure 4.** Qualitative comparison of layout control. The left column shows the target bounding boxes. The text prompts are “A horse and a boat.” (first row) and “A cow and a suitcase.” (second row).

**Figure 5.** Diversity vs. fidelity trade-off. Vendi Score (Diversity) vs. CLIP Score across three architectures. Baseline methods (grey) exhibit a clear trade-off, whereas our method (red) consistently lies in the upper-right region close to the Teacher (blue). 2024b), LCM (Luo et al., 2023), YOSO (Luo et al., 2025a), FLASH (Chadebec et al., 2025), and TDM (Luo et al., 2025b). For SANA, we utilize the SiD (Zhou et al.…

**Figure 6.** Visualization of diversity and low-level control. (a) Generated images of baseline distilled models (SiD) (Zhou et al., 2025b) and ours under the same set of initial noises. (b) Zero-shot control via NoiseQuery (Wang et al., 2025): retrieving noise for “Blue Hue” and “High Brightness” from the teacher

**Figure 7.** Ablation study on the perturbation scale h. Training curves on PixArt-α for Pickscore (left), CLIP Score (middle), and Intra-prompt LPIPS (right). The results indicate that very small h values fail to restore diversity (LPIPS), while h = 10⁻² (grey) achieves an optimal equilibrium between structural sensitivity and generation quality. Sensitivity to Weighted Parameter λ. We further analyze the balance betw…

**Figure 8.** Ablation study on the weighting parameter λ. Training curves on PixArt-α for Pickscore (left), CLIP Score (middle), and Intra-prompt LPIPS (right). Metrics are computed online during training on a subset of 50 COCO prompts. The results highlight a trade-off: high λ values slightly compromise fidelity scores, while low λ values lead to diminished LPIPS (diversity), with λ = 1.0 yielding the most balanced pe…

**Figure 9.** A Swiss Roll toy example visualizing the restoration of geometry. Left to right: Ground truth training data, Teacher model (40-step DDIM), standard Student (4-step), and our GAD Student (4-step). Standard distillation leads to structural “shortcuts” (red boxes) across the complex curves, causing severe distribution shifts. In contrast, GAD accurately preserves the teacher’s original geometry and manifold c…

**Figure 10.** More visualization of diversity improvement. The experiments are conducted on the SANA model using the Score Identity Distillation (SiD) as the foundational distillation framework.

**Figure 11.** More visualization of noise-based layout control. The experiments are conducted on the Stable Diffusion v2 (SD2) model using LADD as the foundational distillation framework.

Original abstract

Generative distillation significantly accelerates text-to-image (T2I) generation by compressing multi-step trajectories into few-step student models while preserving perceptual quality. However, existing methods primarily optimize efficiency and output fidelity, often neglecting critical properties of the original trajectory. In this work, we identify a key missing property: sensitivity to initial noise, whose degradation impairs downstream control methods relying on noise-based optimization and manipulation. We trace this issue to standard distillation objectives that enforce pointwise output alignment, inadvertently flattening the input-output landscape and suppressing the teacher's local geometric structure. To address this, we propose Geometry-Aware Distillation (GAD), a sensitivity-preserving framework that aligns the local functional behavior of teacher and student models. Specifically, GAD matches Jacobian-vector products with respect to input noise, enabling the student to reproduce the teacher's differential response to perturbations. Extensive experiments across multiple T2I paradigms and noise-driven control tasks demonstrate that GAD significantly restores sensitivity and improves diversity while maintaining high visual fidelity. Code is available at https://github.com/Hannah1102/GAD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

JVP matching targets a real gap in distillation, but the stress-test concern about few-step transfer still needs direct evidence.

The one thing to know is that the paper frames the loss of initial-noise sensitivity as a side effect of pointwise output matching in distillation and proposes Geometry-Aware Distillation that instead aligns Jacobian-vector products with respect to the input noise. That is the concrete new piece.

The work does a clean job of naming the downstream problem: many control methods rely on the teacher’s differential response to small noise changes, and standard distillation appears to flatten that response. Matching the first-order behavior via JVPs is a direct way to push the student to reproduce the local geometry rather than just the output value. The abstract states that experiments across several T2I backbones and noise-driven tasks show restored sensitivity plus better diversity at comparable fidelity, and the code link is provided.

The stress-test note is on target. Once the student is trained for few-step sampling it realizes a different composition of the diffusion process, so agreement of JVPs at the training points does not automatically guarantee that the effective sensitivity survives the reduced trajectory. The abstract supplies no derivation or ablation that shows why first-order matching at sampled points is sufficient after the step reduction. Without the actual numbers, measurement protocol for sensitivity, or controls for that transfer, it is hard to judge how much of the claimed restoration is robust.
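
The concern can be made concrete with the chain rule. As an illustrative assumption (the notation is not from the paper), write a K-step student as a composition of per-step maps f_1, …, f_K:

```latex
\Phi_S = f_K \circ \cdots \circ f_1,
\qquad
J_{\Phi_S}(z)\,v = J_{f_K}(x_{K-1}) \cdots J_{f_2}(x_1)\, J_{f_1}(x_0)\, v,
\qquad
x_0 = z,\quad x_k = f_k(x_{k-1}).
```

The end-to-end JVP is a product of per-step Jacobians evaluated along the student's own trajectory. Whether the loss matches the end-to-end map or individual steps, and at which points, therefore determines what is actually constrained at inference time. The abstract does not say which is the case.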

This is for people who build or use distilled T2I models for downstream noise-based editing or optimization. A reader already working in that area would find the method worth trying if the experiments hold. It is coherent on its own terms and engages the literature honestly, so it deserves a serious referee who can check the quantitative claims and the few-step transfer argument.

Referee Report

2 major / 0 minor

Summary. The manuscript argues that standard pointwise output alignment in text-to-image distillation flattens the input-output landscape and suppresses sensitivity to initial noise, impairing downstream noise-driven control tasks. It proposes Geometry-Aware Distillation (GAD), which aligns teacher and student models by matching Jacobian-vector products (JVPs) with respect to input noise so that the student reproduces the teacher's local differential response to perturbations. The abstract claims that this geometric matching restores sensitivity and improves diversity while preserving visual fidelity, supported by experiments across multiple T2I paradigms and control tasks.

Significance. If the central claim is substantiated, the work would be significant for the distillation literature because it isolates and targets a functional property (noise sensitivity) that is orthogonal to perceptual fidelity yet critical for control applications. The JVP-matching formulation offers a concrete geometric mechanism that could generalize beyond the specific setting, and the public code release aids reproducibility. However, the absence of any quantitative results, implementation details, or error analysis in the abstract limits immediate assessment of whether the experiments actually support the claim.

major comments (2)

[Abstract] The central assumption that matching JVPs w.r.t. initial noise during training will restore the teacher's differential response under few-step inference is not justified. Because the student realizes a different composition of the diffusion ODE, first-order agreement at sampled training points need not imply agreement of the effective sensitivity after the reduced trajectory; a derivation or targeted experiment showing why local linearization transfers is required to support the claim.
[Abstract] The statement that 'extensive experiments ... demonstrate that GAD significantly restores sensitivity' supplies no numbers, tables, or figures, so it is impossible to evaluate effect sizes, baselines, or whether the reported gains are attributable to JVP matching rather than other factors.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate planned revisions to strengthen the manuscript.

Referee: [Abstract] The central assumption that matching JVPs w.r.t. initial noise during training will restore the teacher's differential response under few-step inference is not justified. Because the student realizes a different composition of the diffusion ODE, first-order agreement at sampled training points need not imply agreement of the effective sensitivity after the reduced trajectory; a derivation or targeted experiment showing why local linearization transfers is required to support the claim.

Authors: We agree that the transfer of first-order sensitivity from training-time JVP matching to few-step inference requires explicit justification, as the student follows a different ODE composition. The current manuscript provides empirical evidence across control tasks (Sections 4–5) that sensitivity is restored, but lacks a formal derivation. In the revision we will add a short derivation in the appendix showing that, under the Lipschitz continuity assumptions used in the distillation, pointwise JVP agreement at sampled noise levels implies bounded deviation in the integrated sensitivity along the reduced trajectory. We will also include a targeted ablation measuring JVP alignment before and after the reduced steps.
revision: yes

Referee: [Abstract] The statement that 'extensive experiments ... demonstrate that GAD significantly restores sensitivity' supplies no numbers, tables, or figures, so it is impossible to evaluate effect sizes, baselines, or whether the reported gains are attributable to JVP matching rather than other factors.

Authors: The abstract is intentionally concise; all quantitative results, tables, and figures appear in Sections 4 and 5. To improve evaluability we will revise the abstract to report concrete effect sizes (e.g., diversity and sensitivity metrics relative to baselines) while remaining within length limits. This change will also clarify that gains are measured against pointwise distillation ablations.
revision: partial

Circularity Check

0 steps flagged

No circularity: derivation introduces independent JVP alignment objective

The paper traces sensitivity loss to pointwise alignment (abstract) and proposes GAD as a new framework that matches Jacobian-vector products w.r.t. input noise. No quoted equations or steps reduce the claimed prediction or result to fitted inputs, self-definitions, or self-citation chains by construction. The central geometric matching technique is presented as an external addition to standard distillation losses without load-bearing self-references or renaming of known results. The framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review relies on abstract only; the method assumes standard neural network differentiability for Jacobian computation but introduces no explicit free parameters or new entities.

axioms (1)

domain assumption: Teacher and student models are differentiable with respect to input noise.
Required to compute and match Jacobian-vector products.

pith-pipeline@v0.9.1-grok · 5732 in / 1025 out tokens · 28114 ms · 2026-06-28T15:18:16.703491+00:00 · methodology

Reference graph

Works this paper leans on

80 extracted references · 8 canonical work pages · 4 internal anchors

[1]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

2000
[2]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

1980
[3]

M. J. Kearns , title =
[4]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

1983
[5]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

2000
[6]

Suppressed for Anonymity , author=
[7]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

1981
[8]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

1959
[9]

International Conference on Machine Learning , volume =

Glide: Towards photorealistic image generation and editing with text-guided diffusion models , author=. International Conference on Machine Learning , volume =
[10]

European Conference on Computer Vision , pages=

Microsoft coco: Common objects in context , author=. European Conference on Computer Vision , pages=. 2014 , organization=

2014
[11]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Effective real image editing with accelerated iterative diffusion inversion , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[12]

International Conference on Learning Representations , publisher =

Denoising diffusion implicit models , author=. International Conference on Learning Representations , publisher =
[13]

International Conference on Machine Learning , pages=

Improved denoising diffusion probabilistic models , author=. International Conference on Machine Learning , pages=. 2021 , organization=

2021
[14]

Advances in Neural Information Processing Systems , volume=

Denoising diffusion probabilistic models , author=. Advances in Neural Information Processing Systems , volume=
[15]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

The silent assistant: Noisequery as implicit guidance for goal-driven image generation , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[16]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Golden noise for diffusion models: A learning framework , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[17]

Advances in Neural Information Processing Systems , volume=

Reno: Enhancing one-step text-to-image models through reward-based noise optimization , author=. Advances in Neural Information Processing Systems , volume=
[18]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

On distillation of guided diffusion models , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[19]

European Conference on Computer Vision , pages=

Adversarial diffusion distillation , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024
[20]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Boxdiff: Text-to-image synthesis with training-free box-constrained diffusion , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[21]

International Conference on Learning Representations , year=

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise , author=. International Conference on Learning Representations , year=
[22]

International Conference on Learning Representations , year=

Diversity-Rewarded CFG Distillation , author=. International Conference on Learning Representations , year=
[23]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Taming mode collapse in score distillation for text-to-3d generation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[24]

International Conference on Machine Learning , pages=

Consistency Models , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023
[25]

arXiv preprint arXiv:2503.10637 , year=

Distilling diversity and control in diffusion models , author=. arXiv preprint arXiv:2503.10637 , year=

work page arXiv
[26]

International Conference on Learning Representations , publisher =

Score-based generative modeling through stochastic differential equations , author=. International Conference on Learning Representations , publisher =
[27]

International Conference on Learning Representations , year=

Flow Matching for Generative Modeling , author=. International Conference on Learning Representations , year=
[28]

SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Sdxl-lightning: Progressive adversarial diffusion distillation , author=. arXiv preprint arXiv:2402.13929 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[29]

SIGGRAPH Asia , pages=

Fast high-resolution image synthesis with latent adversarial diffusion distillation , author=. SIGGRAPH Asia , pages=
[30]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

One-step diffusion with distribution matching distillation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[31]

Advances in Neural Information Processing Systems , volume=

Improved distribution matching distillation for fast image synthesis , author=. Advances in Neural Information Processing Systems , volume=
[32]

International Conference on Learning Representations , year=

Improved Techniques for Training Consistency Models , author=. International Conference on Learning Representations , year=
[33]

International Conference on Machine Learning , pages=

Knowledge transfer with jacobian matching , author=. International Conference on Machine Learning , pages=. 2018 , organization=

2018
[34]

International Conference on Learning Representations , year=

What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models , author=. International Conference on Learning Representations , year=
[35]

International Conference on Machine Learning , year=

Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation , author=. International Conference on Machine Learning , year=
[36]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Minority-Focused Text-to-Image Generation via Prompt Optimization , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[37]

International Conference on Learning Representations , year=

CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling , author=. International Conference on Learning Representations , year=
[38]

International Conference on Learning Representations , year=

Enhancing compositional text-to-image generation with reliable random seeds , author=. International Conference on Learning Representations , year=
[39]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages =

Learning Few-Step Diffusion Models by Trajectory Distribution Matching , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages =
[40]

Neural computation , volume=

Fast exact multiplication by the Hessian , author=. Neural computation , volume=. 1994 , publisher=

1994
[41]

JMLR , volume=

Automatic differentiation in machine learning: a survey , author=. JMLR , volume=
[42]

International Conference on Machine Learning , pages=

How to train your neural ode: the world of jacobian and kinetic regularization , author=. International Conference on Machine Learning , pages=. 2020 , organization=

2020
[43]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Coco-stuff: Thing and stuff classes in context , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[44]

YOLOv4: Optimal Speed and Accuracy of Object Detection

Yolov4: Optimal speed and accuracy of object detection , author=. arXiv preprint arXiv:2004.10934 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2004
[45]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Image synthesis from layout with locality-aware mask adaption , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[46]

International Conference on Machine Learning , pages=

Learning transferable visual models from natural language supervision , author=. International Conference on Machine Learning , pages=. 2021 , organization=

2021
[47]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

High-resolution image synthesis with latent diffusion models , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[48]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Swiftbrush: One-step text-to-image diffusion model with variational score distillation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[49]

European Conference on Computer Vision , pages=

Swiftbrush v2: Make your one-step diffusion model better than its teacher , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024
[50]

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Latent consistency models: Synthesizing high-resolution images with few-step inference , author=. arXiv preprint arXiv:2310.04378 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[51]

Tian et al

Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping , author=. arXiv preprint arXiv:2402.19159 , year=

work page arXiv
[52]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[53]

Transactions on Machine Learning Research , issn=

The Vendi Score: A Diversity Evaluation Metric for Machine Learning , author=. Transactions on Machine Learning Research , issn=
[54]

International Conference on Learning Representations , year=

PixArt- : Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis , author=. International Conference on Learning Representations , year=
[55]

International Conference on Learning Representations , year=

SANA: Efficient high-resolution text-to-image synthesis with linear diffusion transformers , author=. International Conference on Learning Representations , year=
[56]

arXiv preprint arXiv:2509.25127 , year=

Score Distillation of Flow Matching Models , author=. arXiv preprint arXiv:2509.25127 , year=

work page arXiv
[57]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Flash diffusion: Accelerating any conditional diffusion model for few steps image generation , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[58]

International Conference on Learning Representations , year=

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs , author=. International Conference on Learning Representations , year=
[59]

Advances in Neural Information Processing Systems , volume=

Photorealistic text-to-image diffusion models with deep language understanding , author=. Advances in Neural Information Processing Systems , volume=
[60]

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis , author=. arXiv preprint arXiv:2306.09341 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[61]

Advances in Neural Information Processing Systems , volume=

Pick-a-pic: An open dataset of user preferences for text-to-image generation , author=. Advances in Neural Information Processing Systems , volume=
[62]

Journal of Machine Learning Research , volume=

Visualizing data using t-SNE , author=. Journal of Machine Learning Research , volume=
[63]

Advances in Neural Information Processing Systems , volume=

Efficientformer: Vision transformers at mobilenet speed , author=. Advances in Neural Information Processing Systems , volume=
[64]

International Conference on Learning Representations , year=

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow , author=. International Conference on Learning Representations , year=
[65]

Advances in Neural Information Processing Systems , volume=

Elucidating the design space of diffusion-based generative models , author=. Advances in Neural Information Processing Systems , volume=
[66]

Communications in Statistics-Simulation and Computation , volume=

A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines , author=. Communications in Statistics-Simulation and Computation , volume=. 1989 , publisher=

1989
[67]

Advances in Neural Information Processing Systems , volume=

Sobolev training for neural networks , author=. Advances in Neural Information Processing Systems , volume=
[68]

International Conference on Learning Representations , year=

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion , author=. International Conference on Learning Representations , year=
[69]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Adversarial distribution matching for diffusion distillation towards efficient image and video synthesis , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[70]

NitroFusion: High-fidelity single-step diffusion through dynamic adversarial training. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[71]

Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[72]

Invertible consistency distillation for text-guided image editing in around 7 steps. Advances in Neural Information Processing Systems.
[73]

Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency. International Conference on Machine Learning.
[74]

DreamFusion: Text-to-3D using 2D Diffusion. International Conference on Learning Representations.
[75]

Few-step diffusion via score identity distillation. arXiv preprint arXiv:2505.12674.
[76]

Guided Score Identity Distillation for Data-Free One-Step Text-to-Image Generation. International Conference on Learning Representations.
[77]

Relational knowledge distillation. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[78]

Similarity-preserving knowledge distillation. IEEE/CVF International Conference on Computer Vision.
[79]

FedSeg: Class-heterogeneous federated learning for semantic segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[80]

Towards One-step Causal Video Generation via Adversarial Self-Distillation. International Conference on Learning Representations.
