{"paper":{"title":"Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Diffusion LLMs discover reliable parallel decoding orders through revokable generation and then learn to use them for faster, higher-quality output.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Bo Han, Fanqin Zeng, Feng Hong, Geng Yu, Huangjie Zheng, Jiangchao Yao, Xiaofeng Cao, Yanfeng Wang, Ya Zhang","submitted_at":"2026-05-16T11:27:40Z","abstract_excerpt":"Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still face a severe quality-speed trade-off: accelerating decoding by revealing multiple tokens often causes substantial quality degradation. We attribute this dilemma to a train-inference mismatch amplified by irreversible decoding. While training reconstructs tokens from randomly corrupted states, efficient inference requires an adaptive denoising order, where easier tokens are revealed earlier and context-dependent ones are deferred. This view motivates two complementary methods: an inference-tim"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"DLLMs can serve as their own efficiency teachers by first discovering reliable denoising orders through revokable decoding and then learning to follow them for faster generation.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The verification step in WINO, which uses enriched global context to decide which drafted tokens are reliable, correctly identifies tokens that will remain correct in the final output rather than merely appearing consistent at an intermediate step.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Diffusion LLMs can act as their own efficiency teachers by using revokable parallel decoding to identify reliable token orders and then distilling those orders into the model parameters for faster inference.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Diffusion LLMs discover reliable parallel decoding orders through revokable generation and then learn to use them for faster, higher-quality output.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3030634ff5438902e9d4d23d6ba2258f94150fb7776ba2c165abd330c662f254"},"source":{"id":"2605.16941","kind":"arxiv","version":1},"verdict":{"id":"ac54eb00-7f64-4277-8655-12e1f470931e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T20:38:02.167303Z","strongest_claim":"DLLMs can serve as their own efficiency teachers by first discovering reliable denoising orders through revokable decoding and then learning to follow them for faster generation.","one_line_summary":"Diffusion LLMs can act as their own efficiency teachers by using revokable parallel decoding to identify reliable token orders and then distilling those orders into the model parameters for faster inference.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The verification step in WINO, which uses enriched global context to decide which drafted tokens are reliable, correctly identifies tokens that will remain correct in the final output rather than merely appearing consistent at an intermediate step.","pith_extraction_headline":"Diffusion LLMs discover reliable parallel decoding orders through revokable generation and then learn to use them for faster, higher-quality output."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.16941/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T21:01:19.126049Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T20:51:04.962222Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"cited_work_retraction","ran_at":"2026-05-19T20:21:57.326330Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T18:41:56.246081Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T18:33:26.328432Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"bc46a61169ab50724fc8c8c13787f1766c822be9117f2ee2a5998172bc90c247"},"references":{"count":45,"sample":[{"doi":"","year":2018,"title":"Improving language understanding by generative pre-training,","work_id":"9bd865d6-7962-4e04-81fc-4739527e0a04","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Language models are unsupervised multitask learners,","work_id":"f2d43298-fa7b-4a79-8665-7fb8b50066aa","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Chatgpt: Optimizing language models for dialogue,","work_id":"2e7e6bc8-073c-4428-b35c-a46a08b33249","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"arXiv preprint arXiv:2310.12397 , year=","work_id":"70d82820-e51a-49e7-abcd-bf7786454843","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"arXiv preprint arXiv:2310.08118 , year=","work_id":"306f5aab-f72e-4343-bb40-2edd0e2de2c3","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":45,"snapshot_sha256":"c1b7a6f59ca581eb6befcbf82d616bde0125bfc1c81823e594e963624044314d","internal_anchors":11},"formal_canon":{"evidence_count":2,"snapshot_sha256":"52364e9baaa2342540444e2db1f22aee48dadf276ef3a4173e5e4c0ef8220bca"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}