{"paper":{"title":"Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Block diffusion language models interpolate between autoregressive and diffusion approaches to support arbitrary-length generation.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Aaron Gokaslan, Jiaqi Han, Justin T. Chiu, Marianne Arriola, Subham Sekhar Sahoo, Volodymyr Kuleshov, Zhihan Yang, Zhixuan Qi","submitted_at":"2025-03-12T17:43:40Z","abstract_excerpt":"Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effecti"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that the proposed training algorithm, estimators of gradient variance, and data-driven noise schedules will produce stable and effective models that generalize beyond the training data without hidden instabilities or overfitting.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Block diffusion language models interpolate between autoregressive and diffusion models to support flexible-length generation and achieve state-of-the-art performance among diffusion models.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Block diffusion language models interpolate between autoregressive and diffusion approaches to support arbitrary-length generation.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"d8d2c38f29f42a19534692b11e3fbc296219ddb7aa85975d74a2d57a4b7032c9"},"source":{"id":"2503.09573","kind":"arxiv","version":3},"verdict":{"id":"de29bb41-2ce2-426c-89e5-052e5d87aada","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T10:53:58.246957Z","strongest_claim":"Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences.","one_line_summary":"Block diffusion language models interpolate between autoregressive and diffusion models to support flexible-length generation and achieve state-of-the-art performance among diffusion models.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that the proposed training algorithm, estimators of gradient variance, and data-driven noise schedules will produce stable and effective models that generalize beyond the training data without hidden instabilities or overfitting.","pith_extraction_headline":"Block diffusion language models interpolate between autoregressive and diffusion approaches to support arbitrary-length generation."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"83b5fdca3750b24655294c819078d4126181615e16a6e578dcf11a9646ec8cb7"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}