{"work":{"id":"dc023f4e-7c79-471c-b713-deeb559ba010","openalex_id":null,"doi":null,"arxiv_id":"2006.11239","raw_key":null,"title":"Denoising Diffusion Probabilistic Models","authors":null,"authors_text":"Jonathan Ho, Ajay Jain, Pieter Abbeel","year":2020,"venue":"cs.LG","abstract":"We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion","external_url":"https://arxiv.org/abs/2006.11239","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T06:00:23.344940+00:00","pith_arxiv_id":"2006.11239","created_at":"2026-05-09T00:09:29.451348+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"Denoising Diffusion Probabilistic Models","render_title":"Denoising Diffusion Probabilistic Models"},"hub":{"state":{"work_id":"dc023f4e-7c79-471c-b713-deeb559ba010","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":135,"external_cited_by_count":null,"distinct_field_count":28,"first_pith_cited_at":"2020-09-21T11:20:38+00:00","last_pith_cited_at":"2026-05-22T12:20:46+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-12T07:29:14.160101+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":18},{"context_role":"method","n":9},{"context_role":"baseline","n":3},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":17},{"context_polarity":"use_method","n":9},{"context_polarity":"baseline","n":3},{"context_polarity":"unclear","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Denoising Diffusion Probabilistic Models","claims":[{"claim_text":"We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"demonstrate thatZ 2-Sampling structurally shatters the performance-efficiency Pareto frontier. We validate its universal applicability across diverse architectures (U-Nets, DiTs) and modalities (image/video), establishing seamless orthogonality with advanced alignment frameworks (AYS, Diffusion-DPO). 1 Introduction Diffusion models [3-5, 13, 29, 52] have redefined the landscape of text-to-image [10, 20, 22, 31, 33, 41, 42] and text-to-video [2, 11, 23, 40] generation. The cornerstone of this suc","claim_type":"background","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Diffusion models are a class of likelihood-based models which have recently been shown to produce high-quality images [56, 59, 25] while offering desirable properties such as distribution coverage, a stationary training objective, and easy scalability. These models generate samples by gradually removing noise from a signal, and their training objective can be expressed as a reweighted variational lower-bound [25]. This class of models already holds the state-of-the-art [60] on CIFAR-10 [31], but","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"We survey continuous-time generative modeling methods based on transporting a sim- ple reference distribution to a data distribution via stochastic or deterministic dynamics. We present a unified framework in which diffusion models, score-based generative models, and flow matching are instances of learning a time-dependent vector field that induces a family of marginals (ρ t)t∈[0,1] governed by a continuity/Fokker-Planck equation. Within this framework, we (i) derive reverse-time sampling for di","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"FVD is fully parallelizable and scales efficiently with inference compute. Empirically, it achieves substantial gains across settings: on DrawBench it outperforms prior methods by 7% in ImageReward, while on class- conditional tasks it improves FID by roughly 14-20% over strong baselines and is up to 66×faster than value-based approaches. 1 Introduction Diffusion models [1, 2] have become a dominant paradigm for generative modelling, achieving state- of-the-art performance across modalities incl","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"new candidates that extend far beyond existing datasets [20]. These problems share a fundamental challenge: the un- derlying design spaces grow combinatorially, while available training data remain sparse [50]. Diffusion-based generative models address this challenge by introducing a forward process that progressively randomizes data and a reverse process that reconstructs structure from noise [22]. This forward-reverse formulation enables controlled sampling of complex distributions and produce","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Specifically, face forgery detection refers to determining the authenticity of facial images at either the image level or the video level, and localization intends to achieve pixel-level predictions for facial images further. Photorealistic generative approaches, including generative adversarial networks (GANs) [7], [8], [9] and denoising diffusion probabilistic models (DDPM) [10], have attained extraordinary advancement in generating extremely lifelike facial images, such that human fails to di","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Denoising Diffusion Probabilistic Models because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (13 contexts).","role_counts":[{"n":13,"context_role":"background"},{"n":7,"context_role":"method"},{"n":3,"context_role":"baseline"}]},"error":null,"updated_at":"2026-05-19T04:31:26.223187+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"c91ab32b-1b57-42d9-ae75-38fd5dc6cd26","orcid":null,"display_name":"Jonathan Ho"},{"id":"95ac7a6c-c4de-4013-9786-7e5f1850c58e","orcid":null,"display_name":"Ajay Jain"},{"id":"0f251250-6471-4035-9a96-4e2bd1ce0707","orcid":null,"display_name":"Pieter Abbeel"}]},"error":null,"updated_at":"2026-05-19T04:31:26.217032+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T10:59:30.474883+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Denoising Diffusion Implicit Models","work_id":"8fa2128b-d18c-405c-ac92-0e669cf89ac0","shared_citers":20},{"title":"Score-Based Generative Modeling through Stochastic Differential Equations","work_id":"d9110e53-a5d4-4794-a4c5-a575e91c31ad","shared_citers":18},{"title":"High-Resolution Image Synthesis with Latent Diffusion Models","work_id":"f0270d36-2952-47fb-84c1-95e3ec341126","shared_citers":14},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":12},{"title":"Deep Unsupervised Learning using Nonequilibrium Thermodynamics","work_id":"986277c3-5997-4593-942c-17cdec737a72","shared_citers":10},{"title":"Diffusion Models Beat GANs on Image Synthesis","work_id":"2eb944bb-93ba-462c-8111-4e8c915dd873","shared_citers":10},{"title":"Classifier-Free Diffusion Guidance","work_id":"acf2c588-c088-4a6c-938e-150ad7c666d7","shared_citers":9},{"title":"Improved denoising diffusion probabilistic models","work_id":"8604698b-f96a-4997-bff6-b6ed40d29744","shared_citers":9},{"title":"Karras, M","work_id":"a80a774d-caed-4e4c-9b69-471be05076e6","shared_citers":9},{"title":"Attention Is All You Need","work_id":"baafb5a2-5272-43bc-932f-09fa9ffe5316","shared_citers":8},{"title":"Learning Transferable Visual Models From Natural Language Supervision","work_id":"6de86bb5-27bd-4d5c-8b89-967ebfc52659","shared_citers":8},{"title":"Auto-Encoding Variational Bayes","work_id":"97d95295-30e1-42b4-bbf6-85f0fa4edb44","shared_citers":7},{"title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","work_id":"8034c587-fba6-4941-87ba-c98f2ac962cb","shared_citers":7},{"title":"CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer","work_id":"f38fc088-12aa-4bf4-9ecd-08d3e797ccb7","shared_citers":6},{"title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","work_id":"a1989e1b-d66d-4533-be3a-fb9c5fd62290","shared_citers":6},{"title":"Consistency Models","work_id":"502bf494-8fcd-434f-828f-0566ab606719","shared_citers":5},{"title":"Generative Adversarial Networks","work_id":"ad1c2a45-7ac7-45e3-9ffa-c83ca5f20ab9","shared_citers":5},{"title":"Generative modeling by estimating gradients of the data distribution","work_id":"8ad170e5-27c5-474b-aae5-bbbdcf73b90a","shared_citers":5},{"title":"GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models","work_id":"34430d19-7919-48ce-88a5-17b3bfe2192e","shared_citers":5},{"title":"Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding","work_id":"af16442b-a46f-469d-8818-c37b53a504c7","shared_citers":5},{"title":"U-Net: Convolutional Networks for Biomedical Image Segmentation","work_id":"5c6b13d6-e704-4bf4-9df7-3a3a4d3b6950","shared_citers":5},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":4},{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":4},{"title":"Heusel, H","work_id":"fe24e68f-7467-4ea6-a130-10e1464b876e","shared_citers":4}],"time_series":[{"n":3,"year":2021},{"n":2,"year":2022},{"n":1,"year":2023},{"n":1,"year":2024},{"n":54,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T10:59:32.440157+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T10:59:38.947869+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Denoising Diffusion Probabilistic Models","claims":[{"claim_text":"We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"demonstrate thatZ 2-Sampling structurally shatters the performance-efficiency Pareto frontier. We validate its universal applicability across diverse architectures (U-Nets, DiTs) and modalities (image/video), establishing seamless orthogonality with advanced alignment frameworks (AYS, Diffusion-DPO). 1 Introduction Diffusion models [3-5, 13, 29, 52] have redefined the landscape of text-to-image [10, 20, 22, 31, 33, 41, 42] and text-to-video [2, 11, 23, 40] generation. The cornerstone of this suc","claim_type":"background","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Diffusion models are a class of likelihood-based models which have recently been shown to produce high-quality images [56, 59, 25] while offering desirable properties such as distribution coverage, a stationary training objective, and easy scalability. These models generate samples by gradually removing noise from a signal, and their training objective can be expressed as a reweighted variational lower-bound [25]. This class of models already holds the state-of-the-art [60] on CIFAR-10 [31], but","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"We survey continuous-time generative modeling methods based on transporting a sim- ple reference distribution to a data distribution via stochastic or deterministic dynamics. We present a unified framework in which diffusion models, score-based generative models, and flow matching are instances of learning a time-dependent vector field that induces a family of marginals (ρ t)t∈[0,1] governed by a continuity/Fokker-Planck equation. Within this framework, we (i) derive reverse-time sampling for di","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"FVD is fully parallelizable and scales efficiently with inference compute. Empirically, it achieves substantial gains across settings: on DrawBench it outperforms prior methods by 7% in ImageReward, while on class- conditional tasks it improves FID by roughly 14-20% over strong baselines and is up to 66×faster than value-based approaches. 1 Introduction Diffusion models [1, 2] have become a dominant paradigm for generative modelling, achieving state- of-the-art performance across modalities incl","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"new candidates that extend far beyond existing datasets [20]. These problems share a fundamental challenge: the un- derlying design spaces grow combinatorially, while available training data remain sparse [50]. Diffusion-based generative models address this challenge by introducing a forward process that progressively randomizes data and a reverse process that reconstructs structure from noise [22]. This forward-reverse formulation enables controlled sampling of complex distributions and produce","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Specifically, face forgery detection refers to determining the authenticity of facial images at either the image level or the video level, and localization intends to achieve pixel-level predictions for facial images further. Photorealistic generative approaches, including generative adversarial networks (GANs) [7], [8], [9] and denoising diffusion probabilistic models (DDPM) [10], have attained extraordinary advancement in generating extremely lifelike facial images, such that human fails to di","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Denoising Diffusion Probabilistic Models because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (13 contexts).","role_counts":[{"n":13,"context_role":"background"},{"n":7,"context_role":"method"},{"n":3,"context_role":"baseline"}]},"error":null,"updated_at":"2026-05-19T04:31:25.794086+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Denoising Diffusion Probabilistic Models","claims":[{"claim_text":"We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Denoising Diffusion Probabilistic Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T10:59:28.528909+00:00"}},"summary":{"title":"Denoising Diffusion Probabilistic Models","claims":[{"claim_text":"We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Denoising Diffusion Probabilistic Models because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Denoising Diffusion Implicit Models","work_id":"8fa2128b-d18c-405c-ac92-0e669cf89ac0","shared_citers":20},{"title":"Score-Based Generative Modeling through Stochastic Differential Equations","work_id":"d9110e53-a5d4-4794-a4c5-a575e91c31ad","shared_citers":18},{"title":"High-Resolution Image Synthesis with Latent Diffusion Models","work_id":"f0270d36-2952-47fb-84c1-95e3ec341126","shared_citers":14},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":12},{"title":"Deep Unsupervised Learning using Nonequilibrium Thermodynamics","work_id":"986277c3-5997-4593-942c-17cdec737a72","shared_citers":10},{"title":"Diffusion Models Beat GANs on Image Synthesis","work_id":"2eb944bb-93ba-462c-8111-4e8c915dd873","shared_citers":10},{"title":"Classifier-Free Diffusion Guidance","work_id":"acf2c588-c088-4a6c-938e-150ad7c666d7","shared_citers":9},{"title":"Improved denoising diffusion probabilistic models","work_id":"8604698b-f96a-4997-bff6-b6ed40d29744","shared_citers":9},{"title":"Karras, M","work_id":"a80a774d-caed-4e4c-9b69-471be05076e6","shared_citers":9},{"title":"Attention Is All You Need","work_id":"baafb5a2-5272-43bc-932f-09fa9ffe5316","shared_citers":8},{"title":"Learning Transferable Visual Models From Natural Language Supervision","work_id":"6de86bb5-27bd-4d5c-8b89-967ebfc52659","shared_citers":8},{"title":"Auto-Encoding Variational Bayes","work_id":"97d95295-30e1-42b4-bbf6-85f0fa4edb44","shared_citers":7},{"title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","work_id":"8034c587-fba6-4941-87ba-c98f2ac962cb","shared_citers":7},{"title":"CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer","work_id":"f38fc088-12aa-4bf4-9ecd-08d3e797ccb7","shared_citers":6},{"title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","work_id":"a1989e1b-d66d-4533-be3a-fb9c5fd62290","shared_citers":6},{"title":"Consistency Models","work_id":"502bf494-8fcd-434f-828f-0566ab606719","shared_citers":5},{"title":"Generative Adversarial Networks","work_id":"ad1c2a45-7ac7-45e3-9ffa-c83ca5f20ab9","shared_citers":5},{"title":"Generative modeling by estimating gradients of the data distribution","work_id":"8ad170e5-27c5-474b-aae5-bbbdcf73b90a","shared_citers":5},{"title":"GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models","work_id":"34430d19-7919-48ce-88a5-17b3bfe2192e","shared_citers":5},{"title":"Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding","work_id":"af16442b-a46f-469d-8818-c37b53a504c7","shared_citers":5},{"title":"U-Net: Convolutional Networks for Biomedical Image Segmentation","work_id":"5c6b13d6-e704-4bf4-9df7-3a3a4d3b6950","shared_citers":5},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":4},{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":4},{"title":"Heusel, H","work_id":"fe24e68f-7467-4ea6-a130-10e1464b876e","shared_citers":4}],"time_series":[{"n":3,"year":2021},{"n":2,"year":2022},{"n":1,"year":2023},{"n":1,"year":2024},{"n":54,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"95ac7a6c-c4de-4013-9786-7e5f1850c58e","orcid":null,"display_name":"Ajay Jain","source":"manual","import_confidence":0.72},{"id":"c91ab32b-1b57-42d9-ae75-38fd5dc6cd26","orcid":null,"display_name":"Jonathan Ho","source":"manual","import_confidence":0.72},{"id":"0f251250-6471-4035-9a96-4e2bd1ce0707","orcid":null,"display_name":"Pieter Abbeel","source":"manual","import_confidence":0.72}]}}