Recognition: 2 Lean theorem links
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling
Pith reviewed 2026-05-11 01:40 UTC · model grok-4.3
The pith
Restricting noise modulation to low-frequency components lets distilled diffusion models match prior image quality with hundreds of times less computation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LENS is an efficient noise modulation framework that works in a low-dimensional low-frequency subspace. The authors observe that low-frequency noise components largely control global image structure and visual fidelity, supply a theoretical reason to limit modulation to that subspace, and derive a corresponding training objective. They then train a lightweight standalone network to perform the modulation, yielding competitive image quality together with 400-700× fewer FLOPs, 25-75× fewer parameters, and 10-20× lower inference overhead than earlier hypernetwork or test-time optimization baselines.
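The mechanics of modulating only the low-frequency part of the noise can be sketched with a simple frequency split. This is an illustrative stand-in, not the paper's implementation: it uses an FFT radial cutoff where LENS uses an eigen-subspace, and the scalar gain stands in for the learned lightweight modulator.

```python
import numpy as np

def split_frequencies(noise, cutoff):
    """Split a 2-D noise map into low- and high-frequency parts via FFT.

    `cutoff` is the radius (in frequency bins) below which a component
    counts as low-frequency; it is a hypothetical knob standing in for
    the paper's eigen-subspace selection.
    """
    f = np.fft.fftshift(np.fft.fft2(noise))
    h, w = noise.shape
    yy, xx = np.mgrid[:h, :w]
    mask = np.hypot(yy - h // 2, xx - w // 2) <= cutoff
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * ~mask)).real
    return low, high

rng = np.random.default_rng(0)
eps = rng.standard_normal((64, 64))
low, high = split_frequencies(eps, cutoff=8)

# Modulate only the low-frequency part; high frequencies pass through
# unchanged, mirroring the claim that global structure lives in `low`.
modulated = 1.1 * low + high
```

The decomposition is exact (`low + high` recovers the original noise), so the only degrees of freedom the modulator touches are the low-frequency ones.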
What carries the argument
A lightweight standalone network that selectively modulates low-frequency components of the noise inside a reduced eigen subspace.
If this is right
- Distilled diffusion models become practical for real-time or on-device image generation.
- The computational cost of amortizing test-time optimization drops enough to support wider deployment.
- Inference latency falls by an order of magnitude while quality stays competitive.
- Model storage and memory requirements shrink substantially compared with full-dimensional hypernetworks.
- The same efficiency pattern can be applied to other distilled generative pipelines that currently rely on high-dimensional noise shaping.
Where Pith is reading between the lines
- The same low-frequency restriction might transfer to video or audio generation where global structure is also carried by lower frequencies.
- Combining LENS with quantization or pruning could produce even larger efficiency gains on edge hardware.
- Varying the cutoff frequency or subspace dimension offers a tunable knob for trading quality against speed that future work could optimize.
- If the low-frequency principle holds across architectures, it could reduce reliance on ever-larger hypernetworks in the broader generative-modeling literature.
Load-bearing premise
Low-frequency components of the noise largely determine the global structure and visual fidelity of generated images.
What would settle it
Measure FID and human preference scores on standard benchmarks when high-frequency noise modulation is added on top of LENS; if quality improves by more than a small margin, the claim that low-frequency modulation alone suffices is weakened.
Original abstract
Distilled diffusion models accelerate image generation by reducing the number of denoising steps, but often suffer from degraded image quality. To mitigate this trade-off, test-time optimization methods improve quality, yet their iterative nature incurs substantial computational overhead and leads to slow inference, limiting practical usability. Recent hypernetwork-based approaches amortize this process during training, but still require costly noise modulation in high-dimensional latent spaces. In this work, we propose LENS (Low-frequency Eigen Noise Shaping), an efficient noise modulation framework that operates in a low-dimensional subspace. Our approach is motivated by the observation that low-frequency components of the noise largely determine the global structure and visual fidelity of generated images. Based on this observation, we provide a theoretical justification for restricting modulation to the low-frequency subspace and derive a principled training objective. Building on this, LENS employs a lightweight, standalone network to selectively modulate these components, enabling efficient and targeted noise modulation. Extensive experiments demonstrate that LENS achieves competitive image quality while reducing FLOPs by 400-700×, model parameters by 25-75×, and inference-time overhead by 10-20× compared to prior methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LENS (Low-Frequency Eigen Noise Shaping), a framework for efficient noise modulation in distilled diffusion models. It restricts modulation to a low-dimensional low-frequency eigen-subspace based on the observation that low-frequency noise components largely determine global structure and visual fidelity. The authors provide a theoretical justification for this restriction, derive a principled training objective, and employ a lightweight standalone network for selective modulation. Experiments claim that LENS achieves competitive image quality while reducing FLOPs by 400-700×, model parameters by 25-75×, and inference-time overhead by 10-20× relative to prior hypernetwork and test-time optimization baselines.
Significance. If the low-frequency subspace restriction and derived objective are rigorously justified without degrading perceptual quality, LENS would represent a substantial advance in amortizing test-time optimization costs for diffusion sampling. The reported efficiency gains could enable practical high-quality generation on edge devices, addressing a key bottleneck in current distilled models.
major comments (3)
- [Abstract] The theoretical justification for restricting modulation to the low-frequency eigen-subspace is asserted rather than derived in detail. The derivation must explicitly address whether the non-linearity of the U-Net denoiser allows high-frequency noise to be omitted without affecting the score function or fine textures, since this directly underpins the claimed 400-700× FLOPs reduction.
- [Experiments] The claimed 400-700× FLOPs, 25-75× parameter, and 10-20× overhead reductions require explicit tables comparing against full-space hypernetwork baselines with error bars, dataset-specific frequency analysis, and an ablation on subspace dimensionality, to confirm that global metrics do not mask local fidelity losses.
- [Method / Training Objective] The principled training objective derived from the low-frequency observation needs to be shown to be independent of the fitted parameters in the subspace choice; otherwise the efficiency claims risk circularity with the external frequency-content observation.
minor comments (2)
- [Method] Clarify notation for the eigen-subspace construction and how the lightweight modulator is architecturally separated from the main denoiser.
- [Introduction] Add missing references to prior frequency-domain analyses in diffusion models to better situate the low-frequency observation.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important areas for clarification and expansion, particularly regarding the depth of the theoretical derivation, the rigor of experimental reporting, and the independence of the training objective. We will revise the manuscript to address these points directly while preserving the core contributions of LENS.
Point-by-point responses
Referee: [Abstract] The theoretical justification for restricting modulation to the low-frequency eigen-subspace is asserted rather than derived in detail. The derivation must explicitly address whether the non-linearity of the U-Net denoiser allows high-frequency noise to be omitted without affecting the score function or fine textures, since this directly underpins the claimed 400-700× FLOPs reduction.
Authors: We agree that the current presentation of the theoretical justification is high-level and would benefit from greater detail. The manuscript motivates the restriction via the observation that low-frequency noise components dominate global structure, but does not fully derive the implications for the nonlinear U-Net. In the revision we will expand the relevant section with an explicit analysis of the denoiser's frequency response, showing that high-frequency perturbations have limited impact on the score function in the distilled setting and do not materially affect fine textures. This expanded derivation will directly support the efficiency claims. revision: yes
Referee: [Experiments] The claimed 400-700× FLOPs, 25-75× parameter, and 10-20× overhead reductions require explicit tables comparing against full-space hypernetwork baselines with error bars, dataset-specific frequency analysis, and an ablation on subspace dimensionality, to confirm that global metrics do not mask local fidelity losses.
Authors: We acknowledge that the efficiency numbers are currently summarized without the requested supporting tables and analyses. The revised manuscript will include (i) explicit comparison tables against full-space hypernetwork baselines with standard-error bars from repeated runs, (ii) dataset-specific frequency-content breakdowns (CIFAR-10 and ImageNet), and (iii) ablations over subspace dimensionality. These additions will demonstrate that the reported gains hold while local fidelity, measured by both FID and perceptual patch metrics, remains competitive. revision: yes
Referee: [Method / Training Objective] The principled training objective derived from the low-frequency observation needs to be shown to be independent of the fitted parameters in the subspace choice; otherwise the efficiency claims risk circularity with the external frequency-content observation.
Authors: The subspace is fixed a priori via eigen-decomposition of the noise covariance and is not altered by the parameters of the modulation network. The training objective is derived solely from this fixed low-frequency restriction and does not depend on the fitted weights. To eliminate any appearance of circularity we will add a dedicated paragraph and short proof sketch in the Method section clarifying this independence. revision: yes
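The independence argument in this response can be sketched as follows. All names and dimensions are hypothetical, but the point stands: the subspace comes from an eigendecomposition of the noise covariance computed before any network training, so it cannot depend on the modulator's fitted weights.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 16, 4096, 4          # ambient dim, noise samples, subspace dim

# Empirical noise covariance from pre-collected samples (anisotropic
# scaling is illustrative); no trainable parameters are involved here.
samples = rng.standard_normal((n, d)) @ np.diag(np.linspace(2.0, 0.1, d))
cov = samples.T @ samples / n

# eigh returns eigenvalues in ascending order; reverse to take the
# top-k eigenvectors, which span the fixed low-frequency subspace V.
eigvals, eigvecs = np.linalg.eigh(cov)
V = eigvecs[:, ::-1][:, :k]

# Any later-trained modulator only ever sees the k coordinates w_low;
# V itself is frozen from this point on.
w = rng.standard_normal(d)
w_low = V.T @ w                # reduced coordinates
w_recon = V @ w_low            # projection back into ambient space
```

Since `V` is fixed by this decomposition alone, an objective stated over `w_low` is well defined before training starts, which is the non-circularity property the response asserts.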
Circularity Check
No significant circularity; derivation grounded in external observation and independent theoretical steps.
full rationale
The paper's chain begins with an external observation on low-frequency noise components (not derived from its own fitted parameters or equations), followed by a claimed theoretical justification for subspace restriction and derivation of a training objective. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The lightweight modulator and efficiency claims rest on this independent foundation rather than reducing to the inputs by construction. This is the most common honest outcome for papers with external motivation.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Low-frequency components of the noise largely determine the global structure and visual fidelity of generated images.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: L(φ) := E_{w∼q}[ ½ ‖h_φ(w_L)‖² − r(g(w_L + h_φ(w_L), w_H)) ] (Eq. 8); Assumption 1: |r(g(w_L, w_H)) − r̄(w_L)| ≤ ε; Prop. 3.1: D_KL(q⋆ ‖ q̃⋆) ≤ 2ε.
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: Patch-wise PCA projection onto V, low-frequency coefficients w_L ∈ R^{N×k} with k ≪ d; reward-gradient energy concentrated in the top-k PCA components.
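A minimal sketch of the patch-wise PCA projection described in this passage, with hypothetical patch and subspace sizes (the paper's actual V and k may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
noise = rng.standard_normal((64, 64))
p = 8                                   # patch side; d = p * p = 64 dims
k = 6                                   # keep k << d coefficients

# Cut the noise map into N = 64 non-overlapping p x p patches and
# flatten each patch into a d-dimensional row vector.
patches = (noise.reshape(8, p, 8, p)
                .transpose(0, 2, 1, 3)
                .reshape(-1, p * p))

# PCA across patches via SVD of the centered patch matrix; the top-k
# right singular vectors span the projection basis V.
mean = patches.mean(axis=0)
U, S, Vt = np.linalg.svd(patches - mean, full_matrices=False)
V = Vt[:k].T                            # shape (d, k)

w_L = (patches - mean) @ V              # shape (N, k): reduced coords
approx = w_L @ V.T + mean               # rank-k patch reconstruction
```

Here `w_L` has exactly the shape R^{N×k} quoted above, and `approx` shows what survives when everything outside the top-k components is discarded.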
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak Paul, SeonHwa Kim, Eunju Cha, Kyong Hwan Jin, and Seungryong Kim. A noise is worth diffusion guidance. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=xEWooSOgaz
- [2] Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Boqing Gong, Cho-Jui Hsieh, and Minhao Cheng. The crystal ball hypothesis in diffusion models: Anticipating object positions from initial noise. arXiv preprint arXiv:2406.01970, 2024.
- [3] Heli Ben-Hamu, Omri Puny, Itai Gat, Brian Karrer, Uriel Singer, and Yaron Lipman. D-Flow: Differentiating through flows for controlled generation. arXiv preprint arXiv:2402.14017, 2024.
- [4] Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, and Enze Xie. Sana-Sprint: One-step diffusion with continuous-time consistency distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16185–16195, 2025.
- [5] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024.
- [6] Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, and Zeynep Akata. ReNO: Enhancing one-step text-to-image models through reward-based noise optimization. Advances in Neural Information Processing Systems, 37:125487–125519, 2024.
- [7] Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, and Zeynep Akata. Noise hypernetworks: Amortizing test-time compute in diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=DbzREoPwmM
- [8] Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, and William Yang Wang. Training-free structured diffusion guidance for compositional text-to-image synthesis. arXiv preprint arXiv:2212.05032, 2022.
- [9] Xiefan Guo, Jinlin Liu, Miaomiao Cui, Jiankai Li, Hongyu Yang, and Di Huang. InitNO: Boosting text-to-image diffusion models via initial noise optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9380–9389, 2024.
- [10] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7514–7528, 2021.
- [11] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [12] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
- [13] Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, and Gang Yu. ELLA: Equip diffusion models with LLM for enhanced semantic alignment. arXiv preprint arXiv:2403.05135, 2024.
- [14] Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, and Xihui Liu. T2I-CompBench: A comprehensive benchmark for open-world compositional text-to-image generation. Advances in Neural Information Processing Systems, 36:78723–78747, 2023.
- [15] Amita Kamath, Kai-Wei Chang, Ranjay Krishna, Luke Zettlemoyer, Yushi Hu, and Marjan Ghazvininejad. GenEval 2: Addressing benchmark drift in text-to-image evaluation. arXiv preprint arXiv:2512.16853, 2025.
- [16] Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, and Siyu Tang. Optimizing diffusion noise can serve as universal motion priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1334–1345, 2024.
- [17] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- [18] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-Pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:36652–36663, 2023.
- [19] Shuangqi Li, Hieu Le, Jingyi Xu, and Mathieu Salzmann. Enhancing compositional text-to-image generation with reliable random seeds. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=5BSlakturs
- [20] Zeming Li, Xiangyue Liu, Xiangyu Zhang, Ping Tan, and Heung-Yeung Shum. NoiseAR: Autoregressing initial noise prior for diffusion models. arXiv preprint arXiv:2506.01337, 2025.
- [21] Yongzhe Lyu, Yu Wu, Yutian Lin, and Bo Du. IS-Diff: Improving diffusion-based inpainting with better initial seed. arXiv preprint arXiv:2509.11638, 2025.
- [22] Jiafeng Mao, Xueting Wang, and Kiyoharu Aizawa. The lottery ticket hypothesis in denoising: Towards semantic-driven initialization. In European Conference on Computer Vision, pages 93–109. Springer, 2024.
- [23] Boming Miao, Chunxiao Li, Xiaoxiao Wang, Andi Zhang, Rui Sun, Zizhe Wang, and Yao Zhu. Noise diffusion for enhancing semantic faithfulness in text-to-image synthesis. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23575–23584, 2025.
- [24] Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, and Nicholas J Bryan. DITTO: Diffusion inference-time t-optimization for music generation. arXiv preprint arXiv:2401.12179, 2024.
- [25] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
- [26] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning, pages 5389–5400. PMLR, 2019.
- [27] Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, and Xuefeng Xiao. Hyper-SD: Trajectory segmented consistency model for efficient image synthesis. Advances in Neural Information Processing Systems, 37:117340–117362, 2024.
- [28] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- [29] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
- [30] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.
- [31] Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. In European Conference on Computer Vision, pages 87–103. Springer, 2024.
- [32] Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, and Lingjuan Lyu. Stretching each dollar: Diffusion training from scratch on a micro-budget. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28596–28608, 2025.
- [33] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
- [34] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. 2023.
- [35] Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, and Tsung-Hui Chang. Tuning-free alignment of diffusion models with direct noise optimization. In ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, 2024. URL https://openreview.net/forum?id=Dqpa8rbL39
- [36] Masatoshi Uehara, Xingyu Su, Yulai Zhao, Xiner Li, Aviv Regev, Shuiwang Ji, Sergey Levine, and Tommaso Biancalani. Reward-guided iterative refinement in diffusion models at test-time with applications to protein and DNA design. arXiv preprint arXiv:2502.14944, 2025.
- [37] Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, and Tommaso Biancalani. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review. arXiv preprint arXiv:2501.09685, 2025.
- [38] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [39] Bram Wallace, Akash Gokul, Stefano Ermon, and Nikhil Naik. End-to-end diffusion latent optimization improves classifier guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7280–7290, 2023.
- [40] Qingsong Wang, Zhengchao Wan, Mikhail Belkin, and Yusu Wang. Seeds of structure: Patch PCA reveals universal compositional cues in diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=EgH5WYB6my
- [41] Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis. arXiv preprint arXiv:2306.09341, 2023.
- [42] Qingsong Xie, Zhenyi Liao, Zhijie Deng, Haonan Lu, et al. TLCM: Training-efficient latent consistency model for image generation with 2-8 steps. arXiv preprint arXiv:2406.05768, 2024.
- [43] Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying N Wu, Kevin Murphy, Tim Salimans, Ben Poole, and Ruiqi Gao. EM distillation for one-step diffusion models. Advances in Neural Information Processing Systems, 37:45073–45104, 2024.
- [44] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023.
- [45] Katherine Xu, Lingzhi Zhang, and Jianbo Shi. Good seed makes a good crop: Discovering secret seeds in text-to-image diffusion models. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3024–3034. IEEE, 2025.
- [46] Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6613–6623, 2024.
- [47] Chenshuang Zhang, Chaoning Zhang, Mengchun Zhang, In So Kweon, and Junmo Kim. Text-to-image diffusion models in generative AI: A survey. arXiv preprint arXiv:2303.07909, 2023.
- [48] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
- [49] Zikai Zhou, Shitong Shao, Lichen Bai, Shufei Zhang, Zhiqiang Xu, Bo Han, and Zeke Xie. Golden noise for diffusion models: A learning framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17688–17697, 2025.
discussion (0)