The Emergence of Reproducibility and Generalizability in Diffusion Models

Huijie Zhang; Jinfan Zhou; Liyue Shen; Minzhe Guo; Peng Wang; Qing Qu; Yifu Lu

arxiv: 2310.05264 · v5 · pith:ETEZ4SW3new · submitted 2023-10-08 · 💻 cs.LG · cs.CV

The Emergence of Reproducibility and Generalizability in Diffusion Models

Huijie Zhang , Jinfan Zhou , Yifu Lu , Minzhe Guo , Peng Wang , Liyue Shen , Qing Qu This is my paper

classification 💻 cs.LG cs.CV

keywords diffusionmodelmodelstrainingdatadistributionreproducibilitydifferent

0 comments

read the original abstract

In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We confirm this phenomenon through comprehensive experiments, implying that different diffusion models consistently reach the same data distribution and scoring function regardless of diffusion model frameworks, model architectures, or training procedures. More strikingly, our further investigation implies that diffusion models are learning distinct distributions affected by the training data size. This is supported by the fact that the model reproducibility manifests in two distinct training regimes: (i) "memorization regime", where the diffusion model overfits to the training data distribution, and (ii) "generalization regime", where the model learns the underlying data distribution. Our study also finds that this valuable property generalizes to many variants of diffusion models, including those for conditional use, solving inverse problems, and model fine-tuning. Finally, our work raises numerous intriguing theoretical questions for future investigation and highlights practical implications regarding training efficiency, model privacy, and the controlled generation of diffusion models.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity
cs.LG 2025-12 conditional novelty 7.0

Flow matching models follow a two-stage process of navigation across data modes then refinement to nearest samples, revealed by exact computation of the oracle marginal velocity field.
The two clocks and the innovation window: When and how generative models learn rules
cs.LG 2026-05 unverdicted novelty 6.0

Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.
Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving
cs.RO 2026-02 unverdicted novelty 6.0

The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based E2E AD framework that identifies insights on loss space, trajectory representation and data scaling, adds RL post-training, and reports 10x perform...