When Does Model Collapse Occur in Structured Interactive Learning?

Kangjie Zhou; Weijie Su; Yuchen Wu

arXiv: 2605.20151 · v1 · pith:KDXNZ6RWnew · submitted 2026-05-19 · cs.LG · math.ST · stat.TH

Pith reviewed 2026-05-20 06:27 UTC · model grok-4.3

classification cs.LG · math.ST · stat.TH

keywords model collapse, interactive learning, directed graphs, synthetic data, generative models, statistical inference, M-estimators

The pith

Model collapse occurs exactly when the interaction graph satisfies a specific topological condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generative models increasingly train on synthetic outputs from other models, creating an interactive environment where data no longer come only from natural sources. This paper represents those interactions as a directed graph and shows that collapse depends on the graph's topology. It supplies an explicit necessary and sufficient condition that tells when collapse will happen. A reader would care because the condition offers a way to predict and possibly prevent progressive degradation in multi-model systems. The authors also supply finite-sample guarantees for linear regression and asymptotic results for general estimators.

Core claim

We formalize model interactions using directed graphs and derive an explicit necessary and sufficient condition characterizing when model collapse occurs. We further establish finite-sample results for linear regression and asymptotic guarantees for general M-estimators.

What carries the argument

Directed graph representation of model interactions, which encodes how synthetic outputs flow between models and determines collapse through its topology.
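As a rough illustration of what such a representation looks like in practice, the sketch below encodes a small interaction graph as an adjacency list in Python. The node names, the `real` source node standing for natural data, and the edge convention (an edge u -> v means that v trains on data produced by u) are all assumptions made for this example. The reachability query is only one topological question the encoding makes easy to ask; it is not the paper's collapse condition, which this summary does not state.

```python
from collections import deque

# Hypothetical interaction graph (not taken from the paper).
# An edge u -> v means that model v trains on data produced by u.
# "real" is an assumed source node that stands for natural data.
edges = {
    "real": ["A"],
    "A": ["B"],
    "B": ["C"],
    "C": ["B"],  # B and C feed each other
    "D": ["D"],  # D trains only on its own output
}


def reachable_from(graph, source):
    """Return the set of nodes reachable from `source` (breadth-first search)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


# Models that receive real data directly or through other models.
fed_by_real = reachable_from(edges, "real") - {"real"}
# Models with no path from the real-data source.
isolated = set(edges) - {"real"} - fed_by_real

print(sorted(fed_by_real))  # ['A', 'B', 'C']
print(sorted(isolated))     # ['D']
```

An adjacency list keeps the example dependency-free; a graph library would work equally well for larger graphs.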

Load-bearing premise

Model interactions in the interactive learning environment can be accurately captured by a fixed directed graph topology without additional dynamics or feedback beyond the graph edges.

What would settle it

An experiment that runs models on one interaction graph satisfying the derived condition and on one violating it, and checks that collapse occurs in the first case and fails to occur in the second.
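A toy version of such a check can be sketched for linear regression. The simulation below is not the authors' experiment and does not test their condition. It only contrasts two extreme regimes under assumed settings (Gaussian design, `n=50`, `d=5`, unit noise, 30 rounds, all chosen for illustration): a single model that refits on labels generated by its own previous fit, and a model that sees fresh real labels every round. The "risk ratio" here is defined, as an assumption, as the squared parameter error divided by the round-0 error.

```python
import numpy as np


def simulate(num_rounds=30, n=50, d=5, sigma=1.0, trials=200, seed=0):
    """Average squared parameter error per round for two training regimes.

    Returns (self_risk, fresh_risk), each of length num_rounds + 1.
    All settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    theta_star = np.ones(d)  # assumed ground-truth parameter
    self_risk = np.zeros(num_rounds + 1)
    fresh_risk = np.zeros(num_rounds + 1)

    for _ in range(trials):
        theta_self = None
        for t in range(num_rounds + 1):
            X = rng.standard_normal((n, d))

            # Regime 1: fresh real labels in every round.
            y_real = X @ theta_star + sigma * rng.standard_normal(n)
            theta_fresh = np.linalg.lstsq(X, y_real, rcond=None)[0]
            fresh_risk[t] += np.sum((theta_fresh - theta_star) ** 2)

            # Regime 2: self-loop. Round 0 uses real labels; later rounds
            # use labels generated by the previous round's fit.
            if t == 0:
                theta_self = theta_fresh
            else:
                y_syn = X @ theta_self + sigma * rng.standard_normal(n)
                theta_self = np.linalg.lstsq(X, y_syn, rcond=None)[0]
            self_risk[t] += np.sum((theta_self - theta_star) ** 2)

    return self_risk / trials, fresh_risk / trials


self_risk, fresh_risk = simulate()
# Risk ratio relative to the round-0 risk (an assumed definition).
print("self-loop ratio, final round:", self_risk[-1] / self_risk[0])
print("fresh-data ratio, final round:", fresh_risk[-1] / fresh_risk[0])
```

In this toy setup each self-loop round adds an independent least-squares error on top of the previous fit, so the self-loop ratio should grow with the number of rounds while the fresh-data ratio stays near one. Replacing the two regimes with models wired according to a specific interaction graph would be the natural next step toward the experiment described above.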

Figures

Figures reproduced from arXiv: 2605.20151 by Kangjie Zhou, Weijie Su, Yuchen Wu.

**Figure 2.** An example of an interaction graph. In this example,

**Figure 3.** Interaction graph that represents the accumulating training regime.

**Figure 4.** The 5-node interaction graph that appears in Example

**Figure 5.** The interaction graph that appears in Example

**Figure 6.** Interaction graphs considered in synthetic data experiment I.

**Figure 7.** Plots of the risk ratios under the linear regression setting over the first 50 training cycles. The left

**Figure 8.** Plots of the risk ratios under the logistic regression setting over the first 50 training cycles. The

**Figure 9.** Plots of the risk ratios under the non-convex single index model setting over the first 50 training

**Figure 10.** Comparison of risk ratios for models in the two interaction graphs of Example

**Figure 11.** FID ratios achieved by GANs trained on MNIST over 50 rounds in the interactive learning setting.

**Figure 12.** FID ratios achieved by GANs trained on CIFAR-10 over 50 rounds in the interactive learning

Abstract

The proliferation of generative artificial intelligence has given rise to an interactive learning environment, where model parameters are continuously updated using not only data generated by natural processes, but also synthetic outputs produced by other models. This paradigm introduces two major challenges: (1) training data are no longer drawn exclusively from the target population, undermining a core assumption of classical statistical learning, and (2) model training processes become inherently correlated, as models interact with one another through repeated exposure to each other's synthetic outputs in a potentially complex manner. Establishing reliable statistical inference in such structured interactive learning environments therefore remains an important open problem. In particular, there is growing concern about model collapse, a phenomenon in which the performance of generative models progressively degrades as they are trained on synthetic data produced by earlier model generations. Prior work on model collapse primarily focuses on a single model trained on its own output, failing to capture model performance in multi-model interactive settings. In this work, we fill this gap by investigating the performance of generative models in an interactive learning environment with general interaction patterns. In particular, we formalize model interactions using directed graphs and show that the occurrence of model collapse depends critically on the topology of the interaction graph. We further derive an explicit necessary and sufficient condition characterizing when model collapse occurs, and establish finite-sample results for linear regression and asymptotic guarantees for general M-estimators. We support our theoretical findings through extensive numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's graph-theoretic necessary and sufficient condition for model collapse in multi-model loops is the real addition, but it rests on a fixed directed graph that may not capture adaptive or time-varying interactions.

The punchline is that this work moves beyond single-model collapse by modeling interactions as a directed graph and deriving an explicit necessary and sufficient condition on the topology for when collapse occurs. That condition, plus the finite-sample linear regression bounds and asymptotic M-estimator results, is what stands out from the abstract and prior literature they cite. The numerical experiments appear to back the theory in controlled settings, which is useful for checking the claims in practice. Credit to the authors for formalizing the multi-model case this way instead of staying with the simpler self-training loop.

The main soft spot is the fixed-graph assumption. If real training lets models adaptively pick partners or introduces feedback that changes effective dependencies over time, the recurrence or contraction argument behind the condition could break, making necessity or sufficiency fail. The stress-test note flags exactly this, and without the full derivations visible here it's unclear how much slack the proofs have for that. Minor issues like data exclusion details or exact error bounds would also need checking in review, but they are not load-bearing.

This paper is for researchers working on synthetic data pipelines and iterative model training, especially those who want a topological handle on collapse rather than just empirical warnings. A reader focused on theoretical guarantees in interactive learning would find the graph condition and regression results worth their time.

I would send it to peer review. The core formalization is grounded enough to deserve referee scrutiny even if the static-topology limitation requires clarification or extension.

Referee Report

2 major / 3 minor

Summary. The paper investigates model collapse in interactive learning environments involving multiple generative models that train on each other's synthetic outputs. It formalizes these interactions using directed graphs and derives an explicit necessary and sufficient condition on the graph topology for when model collapse occurs. The work also establishes finite-sample results for linear regression, asymptotic guarantees for general M-estimators, and supports the findings with numerical experiments.

Significance. If the central results hold, this would represent a meaningful extension of model collapse analysis from isolated models to structured multi-model interactions. The graph-topology condition offers a concrete, topology-dependent characterization that could inform system design, while the finite-sample linear regression bounds and asymptotic M-estimator guarantees provide useful theoretical anchors. The numerical experiments add empirical support for the claims.

major comments (2)

[§3] §3 (derivation of the necessary and sufficient condition): The condition is derived under a fixed directed-graph topology that is assumed to be static and exhaustive of all dependencies. This assumption is load-bearing for the necessity and sufficiency claim; if real interactive learning permits adaptive partner selection or time-varying feedback that alters effective edges mid-training, the underlying recurrence or contraction mapping no longer matches the process and the condition loses its claimed status.
[§5.1] §5.1 (finite-sample linear regression results): The error bounds are stated to depend on graph topology, yet the explicit dependence (e.g., how the contraction factor or variance term scales with in-degree or strongly connected components) is not fully unpacked; without this, it is difficult to verify that the bounds remain informative for graphs that are only weakly connected.

minor comments (3)

[Abstract] The abstract would benefit from a one-sentence statement of the precise form of the necessary-and-sufficient condition (e.g., a spectral or connectivity criterion).
[Notation] Notation for the interaction graph G and the associated adjacency or Laplacian matrix should be introduced with an early illustrative figure to aid readability.
[Experiments] The numerical experiments section would be strengthened by reporting the precise data-generation process, number of runs, and any exclusion criteria for synthetic samples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work. We address each major comment below and indicate the revisions we intend to make.

Referee: [§3] §3 (derivation of the necessary and sufficient condition): The condition is derived under a fixed directed-graph topology that is assumed to be static and exhaustive of all dependencies. This assumption is load-bearing for the necessity and sufficiency claim; if real interactive learning permits adaptive partner selection or time-varying feedback that alters effective edges mid-training, the underlying recurrence or contraction mapping no longer matches the process and the condition loses its claimed status.

Authors: We agree that the necessary and sufficient condition is derived under the modeling assumption of a fixed, static directed interaction graph. This assumption enables the precise recurrence formulation and contraction-mapping argument that yield the sharp topological characterization. The framework is intended to capture structured interactive settings with predetermined interaction patterns, which arise in many multi-model systems. We acknowledge that adaptive partner selection or time-varying edges would require a distinct analysis. In the revision we will add a clarifying remark in Section 3 on the scope of the assumption and identify dynamic-graph extensions as a natural direction for future work.
Revision: partial
Referee: [§5.1] §5.1 (finite-sample linear regression results): The error bounds are stated to depend on graph topology, yet the explicit dependence (e.g., how the contraction factor or variance term scales with in-degree or strongly connected components) is not fully unpacked; without this, it is difficult to verify that the bounds remain informative for graphs that are only weakly connected.

Authors: The finite-sample bounds are expressed via the contraction factor of the interaction matrix, which encodes the topological dependence. To improve transparency we will revise Section 5.1 to explicitly relate the contraction factor and variance terms to in-degree and the decomposition into strongly connected components. We will also add a short discussion of the bounds under weak connectivity, showing that they remain informative (though with potentially slower rates) when the topological condition for collapse is not satisfied. These changes will make the scaling and applicability clearer.
Revision: yes
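The two graph quantities named in this response, in-degree and the decomposition into strongly connected components, are standard and easy to compute. The sketch below does so for a small hypothetical interaction graph using Kosaraju's algorithm. The graph, the node names, and the edge convention (u -> v means v trains on u's output) are assumptions for illustration; how the paper's bounds actually depend on these quantities is not stated in this review.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. `graph` maps each node to a list of successors."""
    nodes = set(graph) | {v for succs in graph.values() for v in succs}

    # Pass 1: record nodes in order of DFS completion on the original graph.
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v in graph.get(u, []):
            if v not in seen:
                visit(v)
        order.append(u)

    for u in sorted(nodes):
        if u not in seen:
            visit(u)

    # Build the reversed graph.
    reverse = {u: [] for u in nodes}
    for u, succs in graph.items():
        for v in succs:
            reverse[v].append(u)

    # Pass 2: explore the reversed graph in decreasing completion order.
    components, assigned = [], set()
    for u in reversed(order):
        if u in assigned:
            continue
        stack, comp = [u], set()
        assigned.add(u)
        while stack:
            x = stack.pop()
            comp.add(x)
            for w in reverse[x]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        components.append(comp)
    return components


def in_degrees(graph):
    """Count incoming edges for every node that appears in `graph`."""
    degree = {u: 0 for u in graph}
    for succs in graph.values():
        for v in succs:
            degree[v] = degree.get(v, 0) + 1
    return degree


# Hypothetical graph: real data feeds A, A feeds B, and B and C feed each other.
graph = {"real": ["A"], "A": ["B"], "B": ["C"], "C": ["B"]}
components = strongly_connected_components(graph)
degree = in_degrees(graph)
print(components)
print(degree)
```

The recursive first pass is fine for graphs of this size; a very large interaction graph would call for an iterative traversal to avoid Python's recursion limit.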

Circularity Check

0 steps flagged

Derivation of necessary and sufficient condition on interaction graph topology is mathematically independent

The paper defines model interactions via a fixed directed graph, then performs analysis to obtain an explicit necessary and sufficient condition for model collapse along with finite-sample and asymptotic guarantees. No step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The central claim is a derived property of the recurrence or contraction under the stated graph topology rather than a tautology or re-labeling of inputs. The derivation is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central modeling step is the representation of interactions as a directed graph; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: Model interactions in the interactive learning environment can be formalized as a directed graph where edges represent use of synthetic outputs for training. This is the key formalization step that enables the topological condition.

pith-pipeline@v0.9.0 · 5784 in / 1156 out tokens · 62073 ms · 2026-05-20T06:27:11.123579+00:00 · methodology

[1] [1]

2000 , publisher=

Asymptotic Statistics , author=. 2000 , publisher=

work page 2000

[2] [2]

High dimensional probability II , pages=

Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli classes , author=. High dimensional probability II , pages=. 2000 , publisher=

work page 2000

[3] [3]

Weak convergence and empirical processes: with applications to statistics , pages=

Weak convergence , author=. Weak convergence and empirical processes: with applications to statistics , pages=. 1996 , publisher=

work page 1996

[4] [4]

The 2023 Conference on Empirical Methods in Natural Language Processing , year=

Large Language Models Can Self-Improve , author=. The 2023 Conference on Empirical Methods in Natural Language Processing , year=

work page 2023

[5] [5]

2009 , publisher=

The Elements of Statistical Learning: Data Mining, Inference, and Prediction , author=. 2009 , publisher=

work page 2009

[6] [6]

On Discriminative vs

Ng, Andrew and Jordan, Michael , booktitle =. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , volume =

work page

[7] [7]

Advances in neural information processing systems , volume=

Generative adversarial nets , author=. Advances in neural information processing systems , volume=

work page

[8] [8]

Auto-Encoding Variational Bayes

Auto-encoding variational bayes , author=. arXiv preprint arXiv:1312.6114 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Proceedings of the 32nd International Conference on Machine Learning , pages =

Variational Inference with Normalizing Flows , author =. Proceedings of the 32nd International Conference on Machine Learning , pages =. 2015 , editor =

work page 2015

[10] [10]

Attention is All you Need , volume =

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, ukasz and Polosukhin, Illia , booktitle =. Attention is All you Need , volume =

work page

[11] [11]

Advances in neural information processing systems , volume=

Denoising diffusion probabilistic models , author=. Advances in neural information processing systems , volume=

work page

[12] [12]

2023 , url =

OpenAI , title =. 2023 , url =

work page 2023

[13] [13]

2023 , url =

Google DeepMind , title =. 2023 , url =

work page 2023

[14] [14]

Computer Science

Improving image generation with better captions , author=. Computer Science. https://cdn. openai. com/papers/dall-e-3. pdf , volume=

work page

[15] [15]

The Twelfth International Conference on Learning Representations , year=

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis , author=. The Twelfth International Conference on Learning Representations , year=

work page

[16] [16]

Self-Consuming Generative Models Go

Sina Alemohammad and Josue Casco-Rodriguez and Lorenzo Luzi and Ahmed Imtiaz Humayun and Hossein Babaei and Daniel LeJeune and Ali Siahkoohi and Richard Baraniuk , booktitle=. Self-Consuming Generative Models Go

work page

[17] [17]

arXiv preprint arXiv:2306.07899 , year=

Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks , author=. arXiv preprint arXiv:2306.07899 , year=

work page arXiv

[18] [18]

Proceedings of the 2nd Machine Learning for Healthcare Conference , pages =

Generating Multi-label Discrete Patient Records using Generative Adversarial Networks , author =. Proceedings of the 2nd Machine Learning for Healthcare Conference , pages =. 2017 , editor =

work page 2017

[19] [19]

JMIR medical informatics , volume=

Reliability of supervised machine learning using synthetic data in health care: model to preserve privacy for data sharing , author=. JMIR medical informatics , volume=. 2020 , publisher=

work page 2020

[20] [20]

Annals of internal medicine , volume=

Implications of the use of artificial intelligence predictive models in health care settings: a simulation study , author=. Annals of internal medicine , volume=. 2023 , publisher=

work page 2023

[21] [21]

Machine learning for synthetic data generation: a review.arXiv preprint arXiv:2302.04062,

Machine learning for synthetic data generation: a review , author=. arXiv preprint arXiv:2302.04062 , year=

work page arXiv

[22] [22]

Harrison Lee and Samrat Phatale and Hassan Mansoor and Kellie Ren Lu and Thomas Mesnard and Johan Ferret and Colton Bishop and Ethan Hall and Victor Carbune and Abhinav Rastogi , year=

work page

[23] [23]

Distilling the Knowledge in a Neural Network

Distilling the knowledge in a neural network , author=. arXiv preprint arXiv:1503.02531 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[24] A Survey on Knowledge Distillation of Large Language Models. arXiv preprint arXiv:2402.13116.

[25] AI models collapse when trained on recursively generated data. Nature.

[26] Model Collapse Demystified: The Case of Regression. The Thirty-eighth Annual Conference on Neural Information Processing Systems.

[27] On the Stability of Iterative Retraining of Generative Models on their own Data. The Twelfth International Conference on Learning Representations.

[28] Will Large-scale Generative Models Corrupt Future Datasets? 2023 IEEE/CVF International Conference on Computer Vision (ICCV).

[29] Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 2023.

[30] Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 2024.

[31] Learning multiple layers of features from tiny images. 2009.

[32] Golden ratio weighting prevents model collapse. arXiv preprint arXiv:2502.18049.

[33] The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

[34] The Curse of Recursion: Training on Generated Data Makes Models Forget. arXiv preprint arXiv:2305.17493.

[35] A Tale of Tails: Model Collapse as a Change of Scaling Laws. Forty-first International Conference on Machine Learning.

[36] Preventing model collapse under overparametrization: Optimal mixing ratios for interpolation learning and ridge regression. arXiv preprint arXiv:2509.22341.

[37] Combining Generative Artificial Intelligence (AI) and the Internet: Heading towards Evolution or Degradation? arXiv preprint arXiv:2303.01255.

[38] Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet. Epi UAI.

[39] Nepotistically Trained Generative-AI Models Collapse. arXiv preprint arXiv:2311.12202.

[40] Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop. arXiv preprint arXiv:2311.16822.

[41] The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text. Findings of the Association for Computational Linguistics: NAACL 2024.

[42] Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World. Proceedings of the 42nd International Conference on Machine Learning (ICML).

[43] Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, and Sanmi Koyejo. Is Model Collapse Inevitable?

[44] Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification. The Thirteenth International Conference on Learning Representations.

[45] Strong Model Collapse. The Thirteenth International Conference on Learning Representations.

[46] Scaling laws for learning with real and surrogate data. The Thirty-eighth Annual Conference on Neural Information Processing Systems.

[47] Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530.

[48] The Llama 3 herd of models. Neural Information Processing Systems.

[49] GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.

[50] Language models are few-shot learners. Advances in Neural Information Processing Systems.

[51] Language models are unsupervised multitask learners. OpenAI blog.

[52] Improving language understanding by generative pre-training. 2018.

[53] Position: Model collapse does not mean what you think. arXiv preprint arXiv:2503.03150.

[54] Delving into: The quantification of AI-Generated content on the internet (Synthetic Data). arXiv preprint arXiv:2504.08755.

[55] Are we in the AI-generated text world already? Quantifying and monitoring AIGT on social media. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

[56] Apratim Dey and David Donoho. Universality of the ^

[57] Daniel Barzilai and Ohad Shamir. When Models Don't Collapse: On the Consistency of Iterative. 2025.

[58] How bad is training on synthetic data? A statistical analysis of language model collapse. First Conference on Language Modeling.

[59] What happens when generative AI models train recursively on each others' generated outputs? arXiv preprint arXiv:2505.21677.

[60] Covariance estimation for distributions with 2 + moments. The Annals of Probability.