{"paper":{"title":"Generalizing Score-based generative models for Heavy-tailed Distributions","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Early stopping plus normalizing flow initialization extends score-based models to any heavy-tailed target with KL convergence.","cross_cats":["cs.LG"],"primary_cat":"stat.ML","authors_text":"Gabriel Cardoso, Sylvan Le Corff, Thomas Romary, Tiziano Fassina","submitted_at":"2026-02-28T18:37:10Z","abstract_excerpt":"Score-based generative models (SGMs) have achieved remarkable empirical success, motivating their application to a broad range of data distributions. However, extending them to heavy-tailed targets remains a largely open problem. Although dedicated models for heavy-tailed distributions have been proposed, their generative fidelity remains unclear and they lack solid theoretical foundations, leaving important questions open in this regime. In this paper, we address this gap through two theoretical contributions. First, we show that combining early stopping with a suitable initialization is suff"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Combining early stopping with a suitable initialization is sufficient to extend the diffusion framework to any target distribution; we establish the well-posedness of the backward process and prove convergence of the approximated diffusion in KL divergence. Novel theoretical guarantees for generation with normalizing flows hold under mild conditions on the flow family and without any assumption on the tail behavior of the target distribution.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The normalizing flow must be expressive enough to capture the tail behavior of the target distribution so that it provides a useful initialization prior for the subsequent SGM refinement step.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Early stopping plus normalizing flow initialization extends diffusion models to any target distribution with proven KL convergence, and a hybrid flow-SGM pipeline captures heavy tails without tail-specific assumptions.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Early stopping plus normalizing flow initialization extends score-based models to any heavy-tailed target with KL convergence.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"312a19225869b1ebe586b36614bc25012ef2acd28afa3df322d17043fb8c5fbf"},"source":{"id":"2603.00772","kind":"arxiv","version":2},"verdict":{"id":"cc20a63f-5c56-415e-81ce-904d06e46b16","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T18:06:17.700943Z","strongest_claim":"Combining early stopping with a suitable initialization is sufficient to extend the diffusion framework to any target distribution; we establish the well-posedness of the backward process and prove convergence of the approximated diffusion in KL divergence. Novel theoretical guarantees for generation with normalizing flows hold under mild conditions on the flow family and without any assumption on the tail behavior of the target distribution.","one_line_summary":"Early stopping plus normalizing flow initialization extends diffusion models to any target distribution with proven KL convergence, and a hybrid flow-SGM pipeline captures heavy tails without tail-specific assumptions.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The normalizing flow must be expressive enough to capture the tail behavior of the target distribution so that it provides a useful initialization prior for the subsequent SGM refinement step.","pith_extraction_headline":"Early stopping plus normalizing flow initialization extends score-based models to any heavy-tailed target with KL convergence."},"references":{"count":15,"sample":[{"doi":"","year":2022,"title":"Imagen 3. arXiv preprint arXiv:2408.07009, 2024","work_id":"a1dd317f-8300-4a79-a1d0-92ddd93fa983","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","ref_index":2,"cited_arxiv_id":"1412.6980","is_internal_anchor":true},{"doi":"","year":2003,"title":"$d\\mu^{X_0}(x_0) = \\int_{C_T \\times \\mathbb{R}^d} \\int_0^T \\frac{\\|A_t(x) - a_t(x)\\|^2}{2 b_t^2} \\, dt \\, d\\mu^{X,X_0}(x, x_0) = \\int_{C_T} \\int_0^T \\frac{\\|A_t(x) - a_t(x)\\|^2}{2 b_t^2} \\, dt \\, d\\mu^{X}(x) = \\mathbb{E}\\left[\\frac{1}{2} \\int_0^T \\frac{1}{b_s^2} \\|A_s(X) - a_s(X)\\|^2 \\, ds\\right], \\quad (22)$ which concludes the proof. A.2. Lemmas r","work_id":"8b6f7eb7-5715-4c36-a298-d5f10214ac6d","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Using the general property $\\frac{\\Delta f(x)}{f(x)} = \\Delta \\log f(x) + \\|\\nabla \\log f(x)\\|^2$, we can write $\\partial_t \\log \\tilde{p}_{T-t}(x) \\rho(x) + \\bar{\\alpha}_t \\big[\\rho(x) \\nabla \\log \\tilde{p}_{T-t}(x) \\cdot x + \\nabla \\rho(x) \\cdot x + d \\rho(x)\\big] + \\frac{\\bar{g}_t^2}{2} \\big[\\rho(x) [\\Delta \\log \\tilde{p}_{T-t}(x) + \\|\\nabla \\log \\tilde{p}_{T-t}(x)\\|^2] + \\Delta \\rho(x)$","work_id":"cedf64fa-d469-4e22-b569-ca13e68563f7","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"To obtain Equation (12), using the definition of $Y_t$, write $\\sum_{k=0}^{N-1} \\int_{t_k}^{t_{k+1}} \\bar{g}_t^2 \\, \\mathbb{E}\\big[\\|\\nabla \\log \\overrightarrow{p}_{T-t_k}(\\overleftarrow{X}_{t_k}) - \\nabla \\log \\overrightarrow{p}_{T-t}(\\overleftarrow{X}_t)\\|^2\\big] \\, dt = \\sum_{k=0}^{N-1} \\int_{t_k}^{t_{k+1}} \\bar{g}_t^2 \\, \\mathbb{E}\\big[\\|Y_{t_k} - Y_t\\|^2\\big] \\, dt$","work_id":"24865335-ab83-4f1e-a8aa-989b7b615559","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":15,"snapshot_sha256":"50d551c03253168d8a56a6561c5881ad08696e5b28346c560d51c2bfa1de8aee","internal_anchors":1},"formal_canon":{"evidence_count":2,"snapshot_sha256":"a2fcb4fadf71e1b1a3cc3513288b9af0f49f9aaf6207274d3a79fd6f17022f3d"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}