Diffusion Models Memorize in Training -- and Generalize in Inference

Markus Kollmann; Tim Kaiser

arxiv: 2603.13419 · v2 · pith:XTYU6EUZnew · submitted 2026-03-12 · 💻 cs.LG

Pith reviewed 2026-05-21 10:48 UTC · model grok-4.3

classification 💻 cs.LG

keywords: diffusion models, memorization, generalization, denoising objective, overfitting, flow field, sampling trajectories

The pith

Diffusion models overfit the denoising objective but generalize in inference because model error shifts sampling trajectories away from training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that diffusion models progressively overfit their denoising training objective, creating a clear performance gap between training and validation samples that peaks at intermediate noise levels. Despite this memorization during training, the models still produce diverse outputs at inference time. Through analysis of a fully analytic error-prone toy model, the authors trace how the optimal flow field would sharply localize around training points, yet model error smooths this into a generalizing field. The training gap fails to produce overfitting at inference because sampling trajectories remain distant from the distribution of noisy training samples.

Core claim

The flow field generalizes through model error, which moves sampling trajectories outside the domain of noisy training samples and thereby naturally prevents overfitting, even as the model fully memorizes the training data in the denoising objective.

What carries the argument

The denoising flow field, which localizes sharply around training points in its optimal form but is smoothed by model error into a generalizing version.
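The localization described above can be illustrated with the standard closed-form denoiser for a finite training set under Gaussian noise, where the prediction is a softmax-weighted mean of the training points. This is a minimal sketch that assumes the paper's optimal predictor y*(x, σ) takes this textbook form; the function name `optimal_denoiser`, the three training points, and the two noise levels are hypothetical choices for illustration.

```python
import numpy as np

def optimal_denoiser(x, train, sigma):
    """Posterior-mean denoiser for an empirical training set under x = y + sigma * eta."""
    # Squared distances between queries and training points: shape (n_query, n_train).
    d2 = ((x[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    # Numerically stable softmax over training points.
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted average of the training points.
    return w @ train

# Hypothetical 2D training points and a query close to the second one.
train = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
query = np.array([[0.9, 0.1]])

high = optimal_denoiser(query, train, sigma=28.0)  # global regime
low = optimal_denoiser(query, train, sigma=0.05)   # localized regime
print("high noise:", high, "low noise:", low)
```

At high noise the weights are nearly uniform, so the prediction sits close to the mean of all training points. At low noise nearly all weight falls on the nearest training point, which is the sharp localization that the learned model would have to reproduce in order to memorize.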

If this is right

The generalization gap between training and validation performance is largest at intermediate noise levels.
Model error suppresses exact recall of individual training points and produces a smooth flow field instead.
The training generalization gap does not carry over to inference time.
Generated samples show no strong similarity to training samples despite the objective-level overfitting.
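The relative generalization gap (E_val − E_train)/E_train behind these statements can be estimated in a small 2D toy setting. The sketch below is not the paper's toy model: it assumes, purely for illustration, that model error widens the denoiser's kernel from σ to sqrt(σ² + δ²). That choice recovers the exact posterior mean when σ ≫ δ, but the names `smoothed_denoiser`, `denoising_error`, and `relative_gap`, the random data, and the value of δ are all hypothetical.

```python
import numpy as np

def smoothed_denoiser(x, train, sigma, delta):
    """Kernel-weighted mean of the training points.

    ASSUMPTION (not taken from the paper): model error delta widens the
    kernel to sqrt(sigma^2 + delta^2).
    """
    d2 = ((x[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / (2.0 * (sigma ** 2 + delta ** 2))
    logits -= logits.max(axis=1, keepdims=True)  # stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ train

def denoising_error(clean, train, sigma, delta, rng, n_noise=256):
    """Monte Carlo estimate of the mean squared denoising error on `clean`."""
    errs = []
    for _ in range(n_noise):
        noisy = clean + sigma * rng.standard_normal(clean.shape)
        pred = smoothed_denoiser(noisy, train, sigma, delta)
        errs.append(((pred - clean) ** 2).sum(axis=-1).mean())
    return float(np.mean(errs))

def relative_gap(e_val, e_train):
    """Relative generalization gap (E_val - E_train) / E_train."""
    return (e_val - e_train) / e_train

rng = np.random.default_rng(0)
train = rng.standard_normal((16, 2))  # hypothetical training points
val = rng.standard_normal((16, 2))    # hypothetical validation points
sigmas = np.logspace(-1.5, 1.5, 13)   # hypothetical noise levels

gaps = []
for sigma in sigmas:
    e_tr = denoising_error(train, train, sigma, 0.3, rng)
    e_va = denoising_error(val, train, sigma, 0.3, rng)
    gaps.append(relative_gap(e_va, e_tr))

for sigma, gap in zip(sigmas, gaps):
    print(f"sigma={sigma:7.3f}  relative gap={gap:8.3f}")
```

Sweeping δ in this sketch gives a way to explore how the gap depends on noise level and model error; whether it reproduces the curves reported in the paper depends on how closely the assumed kernel widening matches the authors' error model.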

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same error-driven smoothing might stabilize other generative models whose training objectives differ from their inference paths.
Controlled amounts of model error could be deliberately introduced during training to improve generalization without changing the architecture.
Quantifying trajectory distances in large-scale diffusion models would test how far this separation holds in practice.

Load-bearing premise

The intermediate states of sampling trajectories are sufficiently far from the distribution of noisy training samples the model is trained on.

What would settle it

Measuring the distance of intermediate sampling states to the nearest noisy training samples and checking whether this distance correlates with increased similarity between generated and training samples.
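One way such a measurement could be set up in a toy setting is sketched below. It rests on several assumptions that are not from the paper: sampling uses plain Euler steps on the probability-flow ODE dx/dσ = (x − D(x, σ))/σ, model error is mimicked by widening the denoiser kernel to sqrt(σ² + δ²), and distance is reported in units of σ·sqrt(dim). A noisy training sample y + ση lies at a distance of order σ·sqrt(dim) from its clean point, so values of order one indicate states that resemble noisy training data. All function names, the training set, and the schedule are hypothetical.

```python
import numpy as np

def denoiser(x, train, sigma, delta):
    """Kernel-weighted mean of training points.

    ASSUMPTION: delta widens the kernel to sqrt(sigma^2 + delta^2) as a
    stand-in for model error; delta = 0 gives the exact posterior mean.
    """
    d2 = ((x[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / (2.0 * (sigma ** 2 + delta ** 2))
    logits -= logits.max(axis=1, keepdims=True)  # stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ train

def normalized_nn_distance(x, train, sigma):
    """Distance to the nearest training point in units of sigma * sqrt(dim)."""
    d2 = ((x[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1)) / (sigma * np.sqrt(x.shape[1]))

def sample_with_trace(train, sigmas, delta, n_samples, rng):
    """Euler steps on dx/dsigma = (x - D(x, sigma)) / sigma, logging distances."""
    x = sigmas[0] * rng.standard_normal((n_samples, train.shape[1]))
    trace = []
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        # Mean normalized distance of the current states at noise level s_cur.
        trace.append(normalized_nn_distance(x, train, s_cur).mean())
        slope = (x - denoiser(x, train, s_cur, delta)) / s_cur
        x = x + (s_next - s_cur) * slope
    return x, np.array(trace)

rng = np.random.default_rng(0)
train = rng.standard_normal((8, 2))    # hypothetical toy training set
sigmas = np.geomspace(10.0, 0.01, 40)  # hypothetical noise schedule

_, trace_exact = sample_with_trace(train, sigmas, 0.0, 64, rng)
_, trace_error = sample_with_trace(train, sigmas, 0.5, 64, rng)
print("exact denoiser:", np.round(trace_exact[-5:], 3))
print("with model error:", np.round(trace_error[-5:], 3))
```

Comparing the two traces at intermediate noise levels shows whether the assumed error term moves trajectories away from the training points in this toy setting. Applying the same statistic to a trained network would require replacing `denoiser` with the network and choosing a distance that is meaningful in image space, for example in a feature space.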

Figures

Figures reproduced from arXiv: 2603.13419 by Markus Kollmann, Tim Kaiser.

**Figure 1.** Generalization gap and overfitting for E_D(σ) at σ ≈ 1.67. We define overfitting as a decrease in validation performance while training performance improves. We find that state-of-the-art denoiser models exhibit a significant relative generalization gap between training and validation data (…

**Figure 2.** The relative generalization gap (E_val − E_train)/E_train increases with images seen during training (colorbar in millions) and with model size in relation to dataset size. The black line indicates the beginning of overfitting (validation error starts increasing, …

**Figure 3.** Flow field lines and predictions of the optimal target predictor y*(x, σ) at different noise levels σ = 28, 2.8, 0.63. Color indicates the magnitude of the prediction error y*(x, σ) − x. Field geometry is (a) global, predictions tend towards the superposition of all training points, (b) moderately localized around training points, predictions approximate the data manifold, and (c) highly localized aroun…

**Figure 4.** Relative generalization gap (E_val − E_train)/E_train in our 2D toy model for different settings of the model error parameter δ (colorbar). Analyzing the behavior of y_δ at different regimes of the ratio σ/δ explains why the peak emerges: At high noise (σ ≫ δ), y_δ behaves like the posterior mean predictor y*, but noisy samples x are far away from the data manifold. Predictions tend towards the superpositi…

**Figure 5.** Filled contours showing the relative generalization gap (E − E_train)/E_train at intermediate noise (σ ≈ 1.1) between the training points and each point in space, respectively. The black lines enclose the area where this gap is 0.5 or less. As the model error δ decreases, this region shrinks until it no longer contains the validation points, indicating that the generalization gap to the validation set is lar…

**Figure 6.** Relative generalization gaps (M_val − M_train)/M_train vs training time and model capacity for various metrics on ImageNet-64. Approximately linearly scaling for one-step reconstruction metrics, using (a) L2-norm to compare paired samples, (b) Fréchet Distance to compare distributions of paired samples. (c) Relative generalization gaps using tFD (FDD) as a metric, comparing the output of denoising trajectories …

**Figure 7.** Metric results for M_train and M_val for various metrics on ImageNet-64. (a),(b) For reconstruction-based metrics, the training performance improves with training time and model size, whereas the validation performance eventually degrades, resulting in classical overfit behavior. (c) With FDD, overfitting is absent, because training performance degrades as well.

**Figure 8.** (a) The relative generalization gap (M_val − M_train)/M_train in pixel space increases and shifts to the right when the number of classes (colorbar) for a class-conditioned model increases. (b) In feature space, this trend is mostly mitigated, and (c) stays absent with Fréchet Distance-based metrics. σ fixed near the peak of the gap.

**Figure 9.** Relative generalization gaps (M_val − M_train)/M_train for Autoguidance and CFG with EDM2-S on ImageNet-64 with varying guidance weights. w = 0 corresponds to no guidance. Reconstruction-based metrics fix σ near the peak in the generalization gap. … a moderate yet steady increase in the generalization gap as the guidance weight increases, given that the primary and auxiliary models differ only in their model er…

**Figure 10.** (a)-(b) Relative generalization gap (M_val − M_train)/M_train and (c)-(d) training and validation results M_train/val, with EDM2-M on ImageNet-64 with tFD and truncated inference. The indices in the legend show at which step inference was stopped, with 16, 17, 18, 19, 20 corresponding to intermediate noise levels σ ≈ 2.2, 1.6, 1.2, 0.8, 0.6, comparable to the peak in the relative generalization gap for the re…

**Figure 11.** (a) Decreasing the receptive field mitigates the relative generalization gap (E_val − E_train)/E_train. Colorbar shows the sliding window size. 64 corresponds to regular inference. (b) During inference, the model shifts from long-range correlations to short-range correlations. The correlation score is averaged over pixels and images. (c) Edge pixels tend to have longer-range dependencies. σ ≈ 1.2. RGB shows …

**Figure 12.** Edge pixels tend to have longer-range dependencies. Shown is the pixel-wise correlation score, averaged over 128 images, computed with EDM2-S on ImageNet-64 at σ = 1. $CS_{ij} = \frac{\frac{1}{N}\sum_{k,l} d(i,j,k,l)\,\frac{\partial D(\mathbf{x})_{ij}}{\partial \mathbf{y}_{kl}}}{\frac{1}{N}\sum_{k,l} \frac{\partial D(\mathbf{x})_{ij}}{\partial \mathbf{y}_{kl}}}, \quad \mathbf{x} = \mathbf{y} + \sigma\boldsymbol{\eta}$, with d(i, j, k, l) = …

**Figure 13.** Relative generalization gap (E_val − E_train)/E_train with EDM2 on ImageNet-64 for various model sizes. Black lines indicate σ values used in …

**Figure 14.** Relative generalization gap (M_val − M_train)/M_train with EDM2-M on ImageNet-64 for various metrics. Black lines indicate σ values used in …

**Figure 15.** Relative generalization gap (M_val − M_train)/M_train with EDM2 on ImageNet-64 for various metrics and model sizes. For reconstruction-based metrics, σ is fixed at the peak of the generalization gap, see …

**Figure 16.** Training and validation results M_train/val with EDM2 on ImageNet-64 for various metrics and model sizes. For reconstruction-based metrics, σ is fixed at the peak of the generalization gap, see …

**Figure 17.** Relative generalization gap (E_val − E_train)/E_train with EDM2 on ImageNet-512 for various model sizes. Black lines indicate σ values used in …

**Figure 18.** Relative generalization gap (M_val − M_train)/M_train with EDM2-M on ImageNet-512 for various metrics. Black lines indicate σ values used in …

**Figure 19.** (a)-(c) Relative generalization gap (M_val − M_train)/M_train and (d)-(f) training and validation results M_train/val, with EDM2 on ImageNet-512 for various metrics and model sizes. For rL2pix, σ is fixed at the peak of the generalization gap, see …

**Figure 20.** Relative generalization gap (M_val − M_train)/M_train with EDM on CIFAR-10 for various metrics. Black lines indicate σ values used in …

**Figure 21.** Relative generalization gap (M_val − M_train)/M_train with EDM on CIFAR-100 for various metrics. Black lines indicate σ values used in …

**Figure 22.** Relative generalization gap (M_val − M_train)/M_train with EDM on CIFAR-10/100 for various metrics and model sizes. For reconstruction-based metrics, σ is fixed at the peak of the generalization gap, see …

**Figure 23.** Training and validation results M_train/val with EDM on CIFAR-10/100 for various metrics and model sizes. For reconstruction-based metrics, σ is fixed at the peak of the generalization gap, see …

**Figure 24.** Flow field lines and predictions of the optimal target predictor y*(x, σ) at different noise levels σ = 28, 2.8, 0.63. Color indicates the magnitude of the prediction error y*(x, σ) − x. Field geometry is (a) global, predictions tend towards the superposition of all training points, (b) moderately localized around training points, predictions approximate the data manifold, (c) highly localized around ea…

**Figure 25.** Relative generalization gap (E_val − E_train)/E_train in our 2D toy model with random data and (a) different settings of the model error parameter δ, (b) different number of training points, and (c) different settings for the guidance weight w. w = 0 corresponds to no guidance. (a) δ = 2.0 (b) δ = 1.6 (c) δ = 1.2

**Figure 26.** Filled contours showing the relative generalization gap (E − E_train)/E_train at intermediate noise (σ ≈ 1.1) between the training points and each point in space, respectively. The black lines enclose the area where this gap is 0.5 or less. As the model error δ decreases, this region shrinks until it no longer contains the validation points, indicating that the generalization gap to the validation set is la…

**Figure 27.** (a)-(b) Relative generalization gap (M_val − M_train)/M_train for various metrics and (c)-(d) training and validation results M_train/val with FDD/FID, with EDM on CIFAR-10. We retrained for 100 Mimg (half of the default) with different numbers of classes. Classes were computed algorithmically using k-means clustering in DINOv2 feature space. For reconstruction-based metrics, σ is fixed at the peak of the gen…

**Figure 28.** Relative generalization gaps (M_val − M_train)/M_train with Autoguidance with EDM2-S on ImageNet-64. Reconstruction-based metrics fix σ at the peak of the generalization gap (see …

**Figure 29.** Relative generalization gaps (M_val − M_train)/M_train with Classifier-Free Guidance with EDM2-S on ImageNet-64. Reconstruction-based metrics fix σ at the peak of the generalization gap (see …

**Figure 30.** Relative generalization gaps (M_val − M_train)/M_train with Autoguidance with EDM2-S on ImageNet-64. Black lines indicate σ values used in …

**Figure 31.** Relative generalization gaps (M_val − M_train)/M_train with Classifier-Free Guidance with EDM2-S on ImageNet-64. Black lines indicate σ values used in …

Original abstract

Diffusion models generalize well in practice. However, an optimal diffusion model fully memorizes the training data and therefore fails to generalize, raising the question of what induces generalization in a real diffusion model. We show that, despite generalizing at the sample level, diffusion models progressively overfit the denoising training objective and thereby create a generalization gap between the performance on validation and training samples. This gap is most pronounced at intermediate noise levels. Using a fully analytic error-prone toy model, we trace the factors affecting the generalization gap. We find that the optimal denoising flow field localizes sharply around training points, but the model error suppresses the exact recall of training points, yielding a smooth, generalizing flow field. Finally, we find that the generalization gap observed in training does not translate to inference, which would result in a strong similarity between generated samples and training samples. This is because the intermediate states of sampling trajectories are sufficiently far from the distribution of noisy training samples the model is trained on. Together, these findings reveal a novel picture of how diffusion models generalize: the flow field generalizes through model error, which moves sampling trajectories outside the domain of noisy training samples and thereby naturally prevents overfitting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper uses a toy model to show how model error smooths the learned flow field and keeps sampling trajectories away from noisy training points, which could explain generalization despite training overfitting, but the distance assumption needs checking against real neural nets.

The main point is that diffusion models overfit the denoising objective at intermediate noise levels yet still produce new samples at inference because their errors prevent exact recall and push trajectories outside the memorized regions. The authors build a fully analytic toy model with an error term to derive how the optimal flow localizes sharply around training points but gets smoothed by that error, creating a measurable gap between train and validation denoising performance. This part is clean and lets you see the mechanism without the usual fitting complications of real networks.

They then argue that the observed training gap does not cause overfitting during sampling because the intermediate states along reverse trajectories stay sufficiently far from the distribution of noisy training samples. That separation is shown in the toy setting and gives a concrete picture of generalization arising from imperfect learning rather than from the objective itself.

What the work does well is separate training behavior from inference behavior in a reproducible way and tie it back to the flow-field geometry. The math is straightforward for the toy case and avoids circularity on the localization and smoothing steps.

The soft spot is the leap from the toy error model to high-capacity neural denoisers. The paper does not appear to provide quantitative distance measurements or trajectory statistics on actual diffusion networks, so it is not yet clear whether real sampling paths remain outside the support of the noisy training data when the error structure is more complex. If trajectories can enter regions where the learned score closely matches memorized points, the proposed mechanism would not hold.

This is aimed at people working on theoretical accounts of generalization in score-based models. A reader who wants a mechanistic story for why diffusion models avoid obvious memorization would get value from the toy analysis. I would send it to peer review because the core idea is worth testing even if the real-model validation needs more work.

Referee Report

2 major / 2 minor

Summary. The paper claims that diffusion models progressively overfit the denoising training objective despite generalizing at the sample level, creating a generalization gap most pronounced at intermediate noise levels. Using a fully analytic error-prone toy model, it traces how the optimal denoising flow field localizes sharply around training points but model error suppresses exact recall to yield a smooth generalizing flow field. The authors conclude that the training generalization gap does not translate to inference because intermediate states of sampling trajectories remain sufficiently far from the distribution of noisy training samples, revealing that the flow field generalizes through model error which naturally prevents overfitting.

Significance. If the result holds, the work offers a mechanistic account of generalization in diffusion models that credits model error for smoothing the flow field and separating inference trajectories from memorized noisy data. The fully analytic toy model is a clear strength, providing reproducible derivations and a falsifiable picture that could inform training practices and architectural choices in the field.

major comments (2)

[Toy Model Analysis] The construction of the analytic error-prone toy model, its specific error model, and the mapping from toy error structure to high-capacity neural denoisers are not detailed sufficiently. This leaves derivation gaps in verifying the localization of the optimal flow field and the independent smoothing effect of error, which are load-bearing for the central claim that model error induces generalization.
[Inference and Generalization] The inference claim that sampling trajectories remain outside the support of noisy training samples rests on an untested distance assumption extrapolated from the toy model. Without quantitative checks in neural-network regimes, this premise is insufficient to establish that the observed training/validation gap at intermediate noise levels does not produce overfitting at inference time.

minor comments (2)

The abstract would benefit from one or two quantitative statements (e.g., measured gap sizes or distance statistics from the toy model) to make the claims more concrete.
[Toy Model Analysis] Notation for the error term and flow-field localization in the toy-model derivations could be introduced more explicitly to aid readers.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their positive assessment of the work's significance and for highlighting the value of the fully analytic toy model. We address each major comment below with point-by-point responses, indicating where revisions will be made to strengthen clarity and support for the claims.

Referee: [Toy Model Analysis] The construction of the analytic error-prone toy model, its specific error model, and the mapping from toy error structure to high-capacity neural denoisers are not detailed sufficiently. This leaves derivation gaps in verifying the localization of the optimal flow field and the independent smoothing effect of error, which are load-bearing for the central claim that model error induces generalization.

Authors: We agree that the toy model section would benefit from greater explicitness to facilitate independent verification. In the revised manuscript we will expand the relevant section and add a dedicated appendix containing: (i) the complete step-by-step derivation of the optimal denoising flow field and its localization around training points, (ii) the precise mathematical definition of the error model (including how additive perturbations to the score are introduced and scaled with noise level), and (iii) an explicit discussion mapping the toy error structure onto high-capacity neural denoisers by linking finite optimization and capacity constraints to analogous smoothing behavior. These additions will close the noted derivation gaps while preserving the analytic character of the model. revision: yes
Referee: [Inference and Generalization] The inference claim that sampling trajectories remain outside the support of noisy training samples rests on an untested distance assumption extrapolated from the toy model. Without quantitative checks in neural-network regimes, this premise is insufficient to establish that the observed training/validation gap at intermediate noise levels does not produce overfitting at inference time.

Authors: The distance claim follows directly from the analytic smoothing of the flow field derived in the toy model: once model error is present, the resulting vector field steers trajectories away from the localized support of noisy training points at intermediate noise levels. We acknowledge that the current manuscript does not supply quantitative distance measurements or trajectory analyses performed with actual neural-network denoisers. In revision we will add a discussion paragraph outlining how the assumption could be tested empirically (e.g., via latent-space distance statistics or controlled low-dimensional network experiments) and will note this as an important direction for follow-up work. We maintain, however, that the toy-model derivation already supplies a mechanistic, falsifiable account consistent with the observed training/validation gap; the absence of large-scale numerical checks does not invalidate the analytic insight but does limit the strength of the extrapolation. revision: partial

standing simulated objections not resolved

Quantitative validation of the sampling-trajectory distance assumption in high-capacity neural-network regimes

Circularity Check

0 steps flagged

No significant circularity; toy-model derivations are independent of inference claim

full rationale

The paper's core chain relies on a fully analytic error-prone toy model to derive localization of the optimal denoising flow field around training points and the smoothing effect of model error, both obtained independently via explicit equations rather than by redefining inputs or fitting. The subsequent claim that inference trajectories remain outside the support of noisy training samples follows directly from this error-induced smoothing in the toy setting, without reducing to a self-citation, a fitted parameter renamed as prediction, or an ansatz smuggled through prior work. No load-bearing step equates to its own inputs by construction, and the analysis remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the representativeness of the analytic toy model for real diffusion networks and on the unverified premise that sampling trajectories remain distant from noisy training samples; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption: An analytic error-prone toy model captures the essential localization and smoothing behavior of the denoising flow field in real diffusion models.
Invoked to trace factors affecting the generalization gap between training and validation samples.
domain assumption: Intermediate states along sampling trajectories lie sufficiently far from the distribution of noisy training samples.
Used to conclude that the training-time generalization gap does not produce overfitting at inference.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 10 internal anchors

[1]

Achilli, B., Ventura, E., Silvestri, G., Pham, B., Raya, G., Krotov, D., Lucibello, C., Ambrogioni,L.:Losingdimensions:Geometricmemorizationingenerativediffusion (2024),https://arxiv.org/abs/2410.08727

work page arXiv 2024
[2]

Adaloglou, N., Kaiser, T., Iagudin, D., Kollmann, M.: Guiding a diffusion model using sliding windows (2025),https://arxiv.org/abs/2411.10257

work page arXiv 2025
[3]

org/abs/2403.00570 Diffusion Models Generalize but Not in the Way You Might Think 13

Adaloglou, N., Kaiser, T., Michels, F., Kollmann, M.: Rethinking cluster- conditioned diffusion models for label-free image synthesis (2024),https://arxiv. org/abs/2403.00570 Diffusion Models Generalize but Not in the Way You Might Think 13

work page arXiv 2024
[4]

Ahn, D., Cho, H., Min, J., Jang, W., Kim, J., Kim, S., Park, H.H., Jin, K.H., Kim, S.: Self-rectifying diffusion sampling with perturbed-attention guidance (2025), https://arxiv.org/abs/2403.17377

work page arXiv 2025
[5]

Nature Communications15(1) (Nov 2024).https://doi.org/10.1038/ s41467-024-54281-3,http://dx.doi.org/10.1038/s41467-024-54281-3

Biroli, G., Bonnaire, T., de Bortoli, V., Mézard, M.: Dynamical regimes of diffusion models. Nature Communications15(1) (Nov 2024).https://doi.org/10.1038/ s41467-024-54281-3,http://dx.doi.org/10.1038/s41467-024-54281-3

work page doi:10.1038/s41467-024-54281-3 2024
[6]

Blattmann, A., Dockhorn, T., Kulal, S., Mendelevitch, D., Kilian, M., Lorenz, D., Levi, Y., English, Z., Voleti, V., Letts, A., Jampani, V., Rombach, R.: Stable video diffusion: Scaling latent video diffusion models to large datasets (2023),https: //arxiv.org/abs/2311.15127

work page internal anchor Pith review Pith/arXiv arXiv 2023
[7]

Bonnaire, T., Urfin, R., Biroli, G., Mézard, M.: Why diffusion models don’t mem- orize: The role of implicit dynamical regularization in training (2025),https: //arxiv.org/abs/2505.17638

work page arXiv 2025
[8]

Buchanan, S., Pai, D., Ma, Y., Bortoli, V.D.: On the edge of memorization in diffusion models (2025),https://arxiv.org/abs/2508.17689

work page arXiv 2025
[9]

van den Burg, G.J.J., Williams, C.K.I.: On memorization in probabilistic deep generative models (2021),https://arxiv.org/abs/2106.03216

work page arXiv 2021
[10]

Carlini, N., Hayes, J., Nasr, M., Jagielski, M., Sehwag, V., Tramèr, F., Balle, B., Ippolito, D., Wallace, E.: Extracting training data from diffusion models (2023), https://arxiv.org/abs/2301.13188

work page arXiv 2023
[11]

Chen, C., Liu, D., Shah, M., Xu, C.: Exploring local memorization in diffusion models via bright ending attention (2025),https://arxiv.org/abs/2410.21665

work page arXiv 2025
[12]

Chong, M.J., Forsyth, D.: Effectively unbiased fid and inception score and where to find them (2020),https://arxiv.org/abs/1911.07023

work page arXiv 2020
[13]

Nature Biomedical Engineering (2025) https://doi.org/ 10.1038/s41551-025-01468-8

Dar, S.U.H., Seyfarth, M., Ayx, I., Papavassiliu, T., Schoenberg, S.O., Siepmann, R.M., Laqua, F.C., Kahmann, J., Frey, N., Baeßler, B., Foersch, S., Truhn, D., Kather, J.N., Engelhardt, S.: Unconditional latent diffusion models memorize pa- tient imaging data. Nature Biomedical Engineering (2025).https://doi.org/10. 1038/s41551-025-01468-8,https://doi.or...

work page doi:10.1038/s41551-025-01468-8 2025
[14]

Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis (2021), https://arxiv.org/abs/2105.05233

work page internal anchor Pith review Pith/arXiv arXiv 2021
[15]

org/abs/2508.12148

Di, J.Z., Lu, Y., Yu, Y., Kamath, G., Dziedzic, A., Boenisch, F.: Demystifying foreground-background memorization in diffusion models (2025),https://arxiv. org/abs/2508.12148

work page arXiv 2025
[16]

arXiv preprint (2025),https: //arxiv.org/abs/2502.07516

Dutt, R.: The devil is in the prompts: De-identification traces enhance memo- rization risks in synthetic chest x-ray generation. arXiv preprint (2025),https: //arxiv.org/abs/2502.07516

work page arXiv 2025
[17]

arXiv preprint (2024),https://arxiv.org/abs/2405.19458

Dutt, R., Bohdal, O., Sanchez, P., Tsaftaris, S.A., Hospedales, T.: Memcontrol: Mitigating memorization in diffusion models via automated parameter selection. arXiv preprint (2024),https://arxiv.org/abs/2405.19458

work page arXiv 2024
[18]

Farghly, T., Potaptchik, P., Howard, S., Deligiannidis, G., Pidstrigach, J.: Diffusion models and the manifold hypothesis: Log-domain smoothing is geometry adaptive (2025),https://arxiv.org/abs/2510.02305

work page arXiv 2025
[19]

Farghly, T., Rebeschini, P., Deligiannidis, G., Doucet, A.: Implicit regularisation in diffusion models: An algorithm-dependent generalisation analysis (2025),https: //arxiv.org/abs/2507.03756

work page arXiv 2025
[20]

Fefferman, C., Mitter, S., Narayanan, H.: Testing the manifold hypothesis (2013), https://arxiv.org/abs/1310.0425

work page internal anchor Pith review Pith/arXiv arXiv 2013
[21]

Kaiser et al

Gao, W., Li, M.: How do flow matching models memorize and generalize in sample data subspaces? (2024),https://arxiv.org/abs/2410.23594 14 T. Kaiser et al

work page arXiv 2024
[22]

George, A.J., Veiga, R., Macris, N.: Denoising score matching with random fea- tures: Insights on diffusion models from precise learning curves (2025),https: //arxiv.org/abs/2502.00336

work page arXiv 2025
[23]

Gu, X., Du, C., Pang, T., Li, C., Lin, M., Wang, Y.: On memorization in diffusion models (2025),https://arxiv.org/abs/2310.02664

work page arXiv 2025
[24]

In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G

Gupta, A., Yu, L., Sohn, K., Gu, X., Hahn, M., Li, F.F., Essa, I., Jiang, L., Lezama, J.: Photorealistic video generation with diffusion models. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision – ECCV 2024. pp. 393–411. Springer Nature Switzerland, Cham (2025)

work page 2024
[25]

Halder, I.: A solvable generative model with a linear, one-step denoiser (2025), https://arxiv.org/abs/2411.17807

work page arXiv 2025
[26]

Heusel,M.,Ramsauer,H.,Unterthiner,T.,Nessler,B.,Hochreiter,S.:Ganstrained by a two time-scale update rule converge to a local nash equilibrium (2018),https: //arxiv.org/abs/1706.08500

[27]

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models (2020)

[28]

Ho, J., Salimans, T.: Classifier-free diffusion guidance (2022), https://arxiv.org/abs/2207.12598

[29]

Ho, J., Salimans, T., Gritsenko, A., Chan, W., Norouzi, M., Fleet, D.J.: Video diffusion models (2022)

[30]

Hong, S.: Smoothed energy guidance: Guiding diffusion models with reduced energy curvature of attention (2024), https://arxiv.org/abs/2408.00760

[31]

Huang, K.P., wen Yang, S., Phan, H., Lu, B.R., Kim, B., Macha, S., Tang, Q., Ghosh, S., yi Lee, H., Kao, C.C., Wang, C.: Impact: Iterative mask-based parallel decoding for text-to-audio generation with diffusion modeling (2025), https://arxiv.org/abs/2506.00736

[32]

Huang, R., Huang, J., Yang, D., Ren, Y., Liu, L., Li, M., Ye, Z., Liu, J., Yin, X., Zhao, Z.: Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models (2023), https://arxiv.org/abs/2301.12661

[33]

Jiralerspong, M., Bose, A.J., Gemp, I., Qin, C., Bachrach, Y., Gidel, G.: Feature likelihood divergence: Evaluating the generalization of generative models using samples (2024), https://arxiv.org/abs/2302.04440

[34]

Kadkhodaie, Z., Guth, F., Simoncelli, E.P., Mallat, S.: Generalization in diffusion models arises from geometry-adaptive harmonic representations (2024), https://arxiv.org/abs/2310.02557

[35]

Kamb, M., Ganguli, S.: An analytic theory of creativity in convolutional diffusion models (2025), https://arxiv.org/abs/2412.20292

[36]

Karras, T., Aittala, M., Aila, T., Laine, S.: Elucidating the design space of diffusion-based generative models (2022), https://arxiv.org/abs/2206.00364

[37]

Karras, T., Aittala, M., Kynkäänniemi, T., Lehtinen, J., Aila, T., Laine, S.: Guiding a diffusion model with a bad version of itself (2024), https://arxiv.org/abs/2406.02507

[38]

Karras, T., Aittala, M., Lehtinen, J., Hellsten, J., Aila, T., Laine, S.: Analyzing and improving the training dynamics of diffusion models (2024), https://arxiv.org/abs/2312.02696

[39]

Kingma, D.P., Salimans, T., Poole, B., Ho, J.: Variational diffusion models (2023)

[40]

Kong, Z., Ping, W., Huang, J., Zhao, K., Catanzaro, B.: Diffwave: A versatile diffusion model for audio synthesis (2021)

[41]

Kowalczuk, A., Hintersdorf, D., Struppek, L., Kersting, K., Dziedzic, A., Boenisch, F.: Finding dori: Memorization in text-to-image diffusion models is not local (2025), https://arxiv.org/abs/2507.16880

[42]

Liu, H., Chen, Z., Yuan, Y., Mei, X., Liu, X., Mandic, D., Wang, W., Plumbley, M.D.: Audioldm: Text-to-audio generation with latent diffusion models (2023), https://arxiv.org/abs/2301.12503

[43]

Lukoianov, A., Yuan, C., Solomon, J., Sitzmann, V.: Locality in image diffusion models emerges from data statistics (2025), https://arxiv.org/abs/2509.09672

[44]

Melnik, A., Ljubljanac, M., Lu, C., Yan, Q., Ren, W., Ritter, H.: Video diffusion models: A survey (2024), https://arxiv.org/abs/2405.03150

[45]

Ochs, S., Habernal, I.: Private synthetic text generation with diffusion models (2024), https://arxiv.org/abs/2410.22971

[46]

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: Dinov2: Learning robust visual features without su...

[47]

Popov, V., Vovk, I., Gogoryan, V., Sadekova, T., Kudinov, M.: Grad-tts: A diffusion probabilistic model for text-to-speech (2021)

[48]

Ren, J., Li, Y., Zeng, S., Xu, H., Lyu, L., Xing, Y., Tang, J.: Unveiling and mitigating memorization in text-to-image diffusion models through cross attention (2025), https://arxiv.org/abs/2403.11052

[49]

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models (2022), https://arxiv.org/abs/2112.10752

[50]

Scarvelis, C., de Ocáriz Borde, H.S., Solomon, J.: Closed-form diffusion models (2025), https://arxiv.org/abs/2310.12395

[51]

Shah, K., Kalavasis, A., Klivans, A.R., Daras, G.: Does generation require memorization? creative diffusion models using ambient diffusion (2025), https://arxiv.org/abs/2502.21278

[52]

Sohl-Dickstein, J., Weiss, E.A., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics (2015)

[53]

Somepalli, G., Singla, V., Goldblum, M., Geiping, J., Goldstein, T.: Diffusion art or digital forgery? investigating data replication in diffusion models (2022), https://arxiv.org/abs/2212.03860

[54]

Somepalli, G., Singla, V., Goldblum, M., Geiping, J., Goldstein, T.: Understanding and mitigating copying in diffusion models (2023), https://arxiv.org/abs/2305.20086

[55]

Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution (2020), https://arxiv.org/abs/1907.05600

[56]

Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations (2021)

[57]

Stein, G., Cresswell, J.C., Hosseinzadeh, R., Sui, Y., Ross, B.L., Villecroze, V., Liu, Z., Caterini, A.L., Taylor, J.E.T., Loaiza-Ganem, G.: Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models (2023), https://arxiv.org/abs/2306.04675

[58]

Theis, L., van den Oord, A., Bethge, M.: A note on the evaluation of generative models (2016), https://arxiv.org/abs/1511.01844

[59]

Ventura, E., Achilli, B., Silvestri, G., Lucibello, C., Ambrogioni, L.: Manifolds, random matrices and spectral gaps: The geometric phases of generative diffusion (2025), https://arxiv.org/abs/2410.05898

[60]

Wang, Y., Chen, X., Ma, X., Zhou, S., Huang, Z., Wang, Y., Yang, C., He, Y., Yu, J., Yang, P., Guo, Y., Wu, T., Si, C., Jiang, Y., Chen, C., Loy, C.C., Dai, B., Lin, D., Qiao, Y., Liu, Z.: Lavie: High-quality video generation with cascaded latent diffusion models (2023), https://arxiv.org/abs/2309.15103

[61]

Xu, M., Geffner, T., Kreis, K., Nie, W., Xu, Y., Leskovec, J., Ermon, S., Vahdat, A.: Energy-based diffusion language models for text generation (2025), https://arxiv.org/abs/2410.21357

[62]

Ye, Z., Zhu, Q., Tao, M., Chen, M.: Provable separations between memorization and generalization in diffusion models (2025), https://arxiv.org/abs/2511.03202

[63]

Yi, Q., Chen, X., Zhang, C., Zhou, Z., Zhu, L., Kong, X.: Diffusion models in text generation: a survey. PeerJ Computer Science 10, e1905 (2024). https://doi.org/10.7717/peerj-cs.1905

[64]

Yoon, T., Choi, J.Y., Kwon, S., Ryu, E.K.: Diffusion probabilistic models generalize when they fail to memorize. In: ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling (2023), https://openreview.net/forum?id=shciCbSk9h

Supplementary Material

A Fréchet Distance (FD) with partial inference (early stopping)

In our experiments, we found a sig...

We sampled 15 random subsets of 50k samples each from the training split, with the same class prior as in the 50k validation split.
We used that class prior to generate 50k images per run.
"Train" shows the average results between 20 different subsets of the training data.

We computed the FD of the generated set against all available subsets of the training and validation split and averaged the results across subsets. On CIFAR-10/100, we follow the same protocol, but since the validation set is limited to 10k samples, we also limit the training subsets to 10k samples. We still use 50k generated images each time. The results...
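The averaging protocol described above can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: it assumes that feature vectors have already been extracted by some embedding network, it uses the standard closed-form Fréchet distance between Gaussians fitted to those features, and the function names (`gaussian_stats`, `frechet_distance`, `mean_fd_over_subsets`) are hypothetical.

```python
import numpy as np


def gaussian_stats(features):
    """Mean and covariance of a (n_samples, dim) feature matrix, dim >= 2."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    FD = ||mu_a - mu_b||^2 + Tr(S_a) + Tr(S_b) - 2 Tr((S_a S_b)^(1/2))
    """
    mu_a, sigma_a = gaussian_stats(feats_a)
    mu_b, sigma_b = gaussian_stats(feats_b)
    diff = mu_a - mu_b
    # The eigenvalues of S_a S_b are real and non-negative for PSD inputs,
    # so the trace of the matrix square root is the sum of their square roots.
    # Clipping guards against tiny negative values from round-off.
    eigvals = np.linalg.eigvals(sigma_a @ sigma_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_a) + np.trace(sigma_b) - 2.0 * tr_sqrt)


def mean_fd_over_subsets(generated, reference_subsets):
    """Average the FD of one generated set against several reference subsets."""
    return float(np.mean([frechet_distance(generated, ref) for ref in reference_subsets]))


# Toy usage with random stand-in features (assumption: real features would
# come from an embedding network applied to generated and reference images).
rng = np.random.default_rng(0)
generated = rng.normal(size=(500, 4))
train_subsets = [rng.normal(size=(500, 4)) for _ in range(3)]
avg_fd_train = mean_fd_over_subsets(generated, train_subsets)
```

Calling the same helper once with training subsets and once with validation subsets yields the two averaged numbers that such a comparison would report.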

