{"paper":{"title":"One Step Diffusion via Shortcut Models","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Shortcut models generate high-quality diffusion samples in one step using a single network.","cross_cats":["cs.CV"],"primary_cat":"cs.LG","authors_text":"Danijar Hafner, Kevin Frans, Pieter Abbeel, Sergey Levine","submitted_at":"2024-10-16T13:34:40Z","abstract_excerpt":"Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a single network can learn effective large-step transitions across a wide range of step sizes during one training phase without quality degradation or the need for fragile scheduling.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Shortcut models generate high-quality diffusion samples in one step using a single network.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"18dd1c5ee98a82f54a2216d58e3b78d3de7d6fd6f1f4e210122fde23b2cc9bde"},"source":{"id":"2410.12557","kind":"arxiv","version":3},"verdict":{"id":"c649fac0-11fc-4937-bf6d-a784c5644915","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T06:36:06.295454Z","strongest_claim":"Shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.","one_line_summary":"Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a single network can learn effective large-step transitions across a wide range of step sizes during one training phase without quality degradation or the need for fragile scheduling.","pith_extraction_headline":"Shortcut models generate high-quality diffusion samples in one step using a single network."},"references":{"count":28,"sample":[{"doi":"","year":null,"title":"Lumiere: A space-time diffusion model for video generation","work_id":"8a0a0735-d82c-4090-a039-697d06ccc3f0","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Tract: Denoising diffusion models with transitive closure time-distillation","work_id":"d884704a-6249-4934-a29c-436da79e969e","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv preprint arXiv:2406.07507 (2024) 5","work_id":"3b39f5ec-7294-49ef-ac0e-d95360c0f177","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Large Scale GAN Training for High Fidelity Natural Image Synthesis","work_id":"244e6f06-bad2-4f34-8186-ff370286427f","ref_index":4,"cited_arxiv_id":"1809.11096","is_internal_anchor":true},{"doi":"","year":null,"title":"Diffusion Policy: Visuomotor Policy Learning via Action Diffusion","work_id":"2dce18e6-f07a-4f57-8a81-e71c3e6a293c","ref_index":5,"cited_arxiv_id":"2303.04137","is_internal_anchor":true}],"resolved_work":28,"snapshot_sha256":"961f45c2a95c384988d788200a803c2f8ff5c59a0704cf1e7fd1bdb9f50fc1e8","internal_anchors":14},"formal_canon":{"evidence_count":3,"snapshot_sha256":"0e4e695f16497c7a72559d8dc9590b32ad87585f9f46b7f6464a3b19e7035bad"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}