Recognition: no theorem link
The two clocks and the innovation window: When and how generative models learn rules
Pith reviewed 2026-05-12 03:05 UTC · model grok-4.3
The pith
Generative models learn rules during an innovation window between first valid outputs and memorization of training data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We define the innovation window as the interval [τ_rule, τ_mem]. This window widens with increasing N and narrows with rule complexity, and may vanish entirely when τ_rule ≥ τ_mem. The same two-clock structure arises in both diffusion (DiT) and autoregressive (GPT) models, with architecture-dependent offsets. Dissecting the learned score of DiT models reveals a corresponding evolution of the optimization landscapes, where rule-valid samples' basins expand substantially around τ_rule, while training samples' basins begin to dominate around τ_mem.
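The basin claim suggests a concrete probe: run noiseless ascent on the learned score from random starting points and record whether the endpoint coincides with a training sample or with a rule-valid configuration outside the training set. The Python sketch below is illustrative only; `score_fn`, the step size, the tolerance, and the rounding step are assumptions, not the paper's procedure.

```python
import numpy as np

def basin_census(score_fn, train_samples, is_rule_valid, d,
                 n_probes=256, steps=500, lr=0.05, tol=1e-3, seed=0):
    """Classify where noiseless ascent on a learned score lands.

    score_fn(x) is assumed to return the model's score (gradient of the
    log-density) at a point x in R^d; the name and signature are hypothetical.
    Returns the fraction of probes ending (a) on a training sample and
    (b) on a rule-valid point that is not a training sample.
    """
    rng = np.random.default_rng(seed)
    train = np.asarray(train_samples, dtype=float)
    on_train = on_novel_valid = 0
    for _ in range(n_probes):
        x = rng.normal(size=d)
        for _ in range(steps):
            x = x + lr * score_fn(x)          # climb toward a local mode
        if np.min(np.linalg.norm(train - x, axis=1)) < tol:
            on_train += 1
        elif is_rule_valid(np.round(x)):      # binarize before checking the rule
            on_novel_valid += 1
    return on_train / n_probes, on_novel_valid / n_probes
```

Tracking these two fractions across checkpoints would make the claimed expansion of rule-valid basins around τ_rule, and the later takeover by training-sample basins around τ_mem, directly visible.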
What carries the argument
The innovation window [τ_rule, τ_mem], where τ_rule is the training step at which generations first become rule-valid and τ_mem is the step at which the model begins reproducing training samples.
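To make the two clocks operational, here is a minimal Python sketch of one way they could be detected from samples drawn at periodic checkpoints; the parity rule, the thresholds, and the checkpoint format are illustrative assumptions rather than the paper's protocol.

```python
import numpy as np

def is_rule_valid(sample: np.ndarray) -> bool:
    # Example rule: even parity, i.e. the bit-sum of a binary string is even.
    return int(sample.sum()) % 2 == 0

def detect_clocks(checkpoints, train_samples, rule_frac=0.5, mem_frac=0.1):
    """Return (tau_rule, tau_mem) from per-checkpoint generations.

    checkpoints:   iterable of (step, samples), where samples is an (n, d)
                   binary array drawn from the model at that training step.
    train_samples: (N, d) binary array of training data for exact-match checks.
    rule_frac:     fraction of rule-valid generations that declares tau_rule.
    mem_frac:      fraction of exact training copies that declares tau_mem.
    Both thresholds are placeholders; the paper's criteria are not given here.
    """
    train_keys = {row.astype(np.uint8).tobytes() for row in np.asarray(train_samples)}
    tau_rule = tau_mem = None
    for step, samples in checkpoints:
        samples = np.asarray(samples)
        valid = np.mean([is_rule_valid(s) for s in samples])
        copied = np.mean([s.astype(np.uint8).tobytes() in train_keys for s in samples])
        if tau_rule is None and valid >= rule_frac:
            tau_rule = step
        if tau_mem is None and copied >= mem_frac:
            tau_mem = step
    return tau_rule, tau_mem
```

The innovation window is then the interval [τ_rule, τ_mem], and it is empty whenever τ_rule goes undetected or τ_rule ≥ τ_mem.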
If this is right
- The innovation window widens with larger dataset size N.
- The window narrows or vanishes entirely as rule complexity increases.
- The two-clock structure and window appear in both diffusion and autoregressive architectures, with architecture-specific timing offsets.
- Rule-valid sample basins in the score function expand around τ_rule while training-sample basins dominate around τ_mem.
Where Pith is reading between the lines
- Training schedules or early stopping around the innovation window could promote rule learning over memorization.
- The framework may extend to predict generalization behavior on tasks beyond the binary rules and puzzles studied.
- Varying model capacity could shift τ_rule earlier relative to τ_mem, enlarging the window on fixed data.
Load-bearing premise
The chosen definitions of τ_rule and τ_mem separate genuine rule learning from partial memorization or other artifacts on the synthetic tasks.
What would settle it
Observe whether τ_rule exceeds τ_mem on a high-complexity rule with small N, resulting in no rule-valid generations before memorization begins.
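Given clocks measured as in the sketch above, this test reduces to a small sweep over rule complexity and dataset size. `run_training` below is a hypothetical stand-in for whatever trains the model and returns per-checkpoint samples plus the training set; it is not an interface defined by the paper.

```python
# Illustrative sweep: does the window vanish for a hard rule at small N?
for rule_order in (2, 8, 32):      # parity order as a stand-in for rule complexity
    for n_train in (256, 4096):
        checkpoints, train_samples = run_training(rule_order=rule_order, n_train=n_train)
        tau_rule, tau_mem = detect_clocks(checkpoints, train_samples)
        vanished = tau_rule is None or (tau_mem is not None and tau_rule >= tau_mem)
        print(rule_order, n_train, tau_rule, tau_mem,
              "window vanished" if vanished else "window open")
```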
Original abstract
Generative models trained on finite data face a fundamental tension: their score-matching or next-token objective converges to the empirical training distribution rather than the population distribution we seek to learn. Using rule-valid synthetic tasks, we trace this tension across two training timescales: $\tau_{\mathrm{rule}}$, the step at which generations first become rule-valid, and $\tau_{\mathrm{mem}}$, the step at which models begin reproducing training samples. Focusing on parity and extending to other binary rules and combinatorial puzzles, we characterize how these two clocks, $\tau_{\mathrm{rule}}$ and $\tau_{\mathrm{mem}}$, depend on key aspects of the learning setup. Specifically, we show that $\tau_{\mathrm{rule}}$ increases with rule complexity and decreases with model capacity, while $\tau_{\mathrm{mem}}$ is approximately invariant to the rule and scales nearly linearly with dataset size $N$. We define the \emph{innovation window} as the interval $[\tau_{\mathrm{rule}}, \tau_{\mathrm{mem}}]$. This window widens with increasing $N$ and narrows with rule complexity, and may vanish entirely when $\tau_{\mathrm{rule}} \geq \tau_{\mathrm{mem}}$. The same two-clock structure arises in both diffusion (DiT) and autoregressive (GPT) models, with architecture-dependent offsets. Dissecting the learned score of DiT models reveals a corresponding evolution of the optimization landscapes, where rule-valid samples' basins expand substantially around $\tau_{\mathrm{rule}}$, while training samples' basins begin to dominate around $\tau_{\mathrm{mem}}$. Together, these results yield a unified and predictive account of when and how generative models exhibit genuine innovation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that generative models on synthetic rule-based tasks exhibit two distinct training timescales: τ_rule (the step at which generated samples first satisfy the underlying rule, e.g., parity) and τ_mem (the step at which models begin reproducing exact training samples). It defines the 'innovation window' as the interval [τ_rule, τ_mem], shows that this window widens with dataset size N and narrows with rule complexity (potentially vanishing when τ_rule ≥ τ_mem), demonstrates the same structure in both DiT diffusion and GPT autoregressive models with architecture-dependent offsets, and supports the claim via analysis of how rule-valid sample basins expand in the DiT score landscape around τ_rule while training-sample basins dominate around τ_mem.
Significance. If the separation between the clocks holds under rigorous controls, the work offers a unified, predictive account of the transition from rule acquisition to memorization in generative models, with potential implications for training regimes that extend the innovation window. Strengths include the use of controlled synthetic tasks, cross-architecture consistency, and the landscape dissection in DiT; however, the absence of reported statistical details (runs, error bars, scoring protocols) currently limits the strength of the empirical claims.
Major comments (2)
- [Abstract and experimental sections] The central claim that τ_rule marks genuine rule learning (distinct from partial memorization or sampling artifacts) rests on the operational definitions of τ_rule and τ_mem, but the abstract and experimental description provide no details on statistical controls, number of runs, error bars, or explicit ablations (e.g., diversity of rule-consistent but non-training samples or controls for local n-gram statistics on parity tasks). This makes it impossible to verify that the reported dependencies on N and complexity are not confounded by chance or partial pattern matching, directly undermining the interpretation of the innovation window.
- [Abstract] The claim that τ_mem is 'approximately invariant to the rule' and scales 'nearly linearly with N' while τ_rule depends on complexity is load-bearing for the window's predictive power, yet no equations or fitting procedures are shown that reduce these to parameter-free quantities; without such reduction or explicit controls for how rule-validity is scored, the reported architecture-dependent offsets remain descriptive rather than explanatory.
Minor comments (2)
- Clarify the precise criterion used to declare a generation 'rule-valid' (e.g., exact satisfaction of parity or combinatorial constraints) and how τ_rule is detected in practice (first occurrence, majority vote over samples, etc.).
- The manuscript would benefit from a table or figure summarizing the measured τ values across rules, N, and capacities, including variability across runs.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the work's potential and for the detailed comments. We address each major comment below and will enhance the statistical reporting and quantitative analysis in the revised manuscript.
Point-by-point responses
Referee: [Abstract and experimental sections] The central claim that τ_rule marks genuine rule learning (distinct from partial memorization or sampling artifacts) rests on the operational definitions of τ_rule and τ_mem, but the abstract and experimental description provide no details on statistical controls, number of runs, error bars, or explicit ablations (e.g., diversity of rule-consistent but non-training samples or controls for local n-gram statistics on parity tasks). This makes it impossible to verify that the reported dependencies on N and complexity are not confounded by chance or partial pattern matching, directly undermining the interpretation of the innovation window.
Authors: We acknowledge the referee's concern regarding the lack of reported statistical details. In the revised version, we will expand the experimental sections to include the number of independent runs performed, error bars, and explicit descriptions of the scoring protocols for rule validity. We will also incorporate ablations on the diversity of generated rule-consistent samples not in the training data and controls for local statistics on parity tasks to rule out partial pattern matching. These changes will strengthen the evidence for genuine rule learning at τ_rule. revision: yes
Referee: [Abstract] The claim that τ_mem is 'approximately invariant to the rule' and scales 'nearly linearly with N' while τ_rule depends on complexity is load-bearing for the window's predictive power, yet no equations or fitting procedures are shown that reduce these to parameter-free quantities; without such reduction or explicit controls for how rule-validity is scored, the reported architecture-dependent offsets remain descriptive rather than explanatory.
Authors: We recognize that the scaling claims would be more robust with explicit quantitative fits. The observations of τ_mem's near-invariance to the rule and linear scaling with N, as well as τ_rule's dependence on complexity, are drawn from systematic experiments across various configurations. In the revision, we will add equations and fitting procedures (e.g., linear regression models for τ_mem as a function of N) with goodness-of-fit metrics, and provide the precise definition of the rule-validity scoring function used. This will better explain the architecture-dependent offsets by grounding them in the optimization dynamics. revision: yes
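The fit mentioned in this response is straightforward to sketch. The numbers below are synthetic placeholders chosen only to mimic "approximately linear in N"; they are not measurements from the paper.

```python
import numpy as np

# Synthetic placeholder data: dataset sizes N and fake tau_mem onsets.
N = np.array([1_000, 2_000, 4_000, 8_000, 16_000], dtype=float)
rng = np.random.default_rng(0)
tau_mem = 0.9 * N + 500 + rng.normal(0, 200, N.size)

# Least-squares fit tau_mem ~ slope * N + intercept, with R^2 as the
# goodness-of-fit metric the rebuttal promises to report.
slope, intercept = np.polyfit(N, tau_mem, deg=1)
pred = slope * N + intercept
r2 = 1 - np.sum((tau_mem - pred) ** 2) / np.sum((tau_mem - tau_mem.mean()) ** 2)
print(f"tau_mem ~ {slope:.3g}*N + {intercept:.3g}  (R^2 = {r2:.3f})")
```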
Circularity Check
No significant circularity: operational definitions of the clocks are independent measurements yielding empirical dependencies
Full rationale
The paper defines τ_rule as the training step when generations first satisfy the rule (measured by direct validity checks on samples) and τ_mem as the step when training samples begin to be reproduced (measured by exact matching). The innovation window is then defined simply as the interval between these two observed quantities. All reported scalings—τ_rule increasing with rule complexity and decreasing with capacity, τ_mem linear in N and rule-invariant—are presented as results of controlled experiments that vary N, complexity, and architecture while recording the measured clocks. No equations reduce these quantities to fitted parameters or presuppose the window structure; the landscape analysis in DiT models is likewise an observational dissection of score evolution around the measured times. No self-citations or imported uniqueness theorems are invoked as load-bearing premises. The derivation chain is therefore self-contained empirical observation rather than tautological reduction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: synthetic rule-valid tasks (parity and combinatorial puzzles) are representative of the rule-learning dynamics that occur when generative models are trained on real data.