{"paper":{"title":"Uncertainty Quantification for Large Language Diffusion Models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Expected trajectory dissimilarity from the denoising process lower-bounds the masked diffusion training objective and serves as a lightweight uncertainty score for large language diffusion models.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Artem Shelmanov, Artem Vazhentsev, David Li, Maxim Panov, Timothy Baldwin, Vladislav Smirnov","submitted_at":"2026-05-14T08:39:56Z","abstract_excerpt":"Large Language Diffusion Models (LLDMs) are emerging as an alternative to autoregressive models, offering faster inference through higher parallelism. Similar to autoregressive LLMs, they remain prone to hallucinations, making reliable uncertainty quantification (UQ) crucial for safe deployment. However, existing UQ methods are fundamentally misaligned with this new paradigm: they assume autoregressive factorization or use expensive repeated sampling, negating the efficiency of LLDMs. In this work, we present the first systematic study of UQ for LLDMs and propose lightweight, zero-shot uncerta"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We prove that expected trajectory dissimilarity lower bounds the masked diffusion training objective, which motivates its usage as an uncertainty score. Comprehensive experiments across three tasks, eight datasets, and two models show that our method achieves a great cost-performance trade-off: it approaches the strongest sampling-based baselines while incurring up to 100x lower computational overhead.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that signals derived from the denoising trajectory (intermediate generations, remasking dynamics, trajectory dissimilarity) correlate with actual hallucination risk holds across the tested tasks and models and generalizes beyond them.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Uncertainty signals from LLDM denoising trajectories, including a proven lower bound on the training objective, achieve near sampling-based hallucination detection at up to 100x lower cost.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Expected trajectory dissimilarity from the denoising process lower-bounds the masked diffusion training objective and serves as a lightweight uncertainty score for large language diffusion models.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ab9dbd7fc0765acb66ddc7ac5a7532a1add23258228b5ac42052d3db796bcf80"},"source":{"id":"2605.14570","kind":"arxiv","version":1},"verdict":{"id":"2e5d43af-4b3c-40d5-b448-2b3871a9db93","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T01:33:30.058730Z","strongest_claim":"We prove that expected trajectory dissimilarity lower bounds the masked diffusion training objective, which motivates its usage as an uncertainty score. Comprehensive experiments across three tasks, eight datasets, and two models show that our method achieves a great cost-performance trade-off: it approaches the strongest sampling-based baselines while incurring up to 100x lower computational overhead.","one_line_summary":"Uncertainty signals from LLDM denoising trajectories, including a proven lower bound on the training objective, achieve near sampling-based hallucination detection at up to 100x lower cost.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that signals derived from the denoising trajectory (intermediate generations, remasking dynamics, trajectory dissimilarity) correlate with actual hallucination risk holds across the tested tasks and models and generalizes beyond them.","pith_extraction_headline":"Expected trajectory dissimilarity from the denoising process lower-bounds the masked diffusion training objective and serves as a lightweight uncertainty score for large language diffusion models."},"references":{"count":17,"sample":[{"doi":"","year":2021,"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","ref_index":1,"cited_arxiv_id":"2110.14168","is_internal_anchor":true},{"doi":"","year":null,"title":"Guidelines: • The answer [N/A] means that the abstract and introduction do not include the claims made in the paper","work_id":"9624435f-ccee-42a6-a8b2-6518d0c6aef8","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Y es] Justiﬁcation: Y es, the limitations of the work are discussed in Appendix E. Guideline","work_id":"9d8079bd-dd33-4287-b63b-7ba009f55810","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Guidelines: • The answer [N/A] means that the paper does not include theoretical results","work_id":"e0e89f66-83d0-4eca-9d06-0fdbd6e035f0","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Guidelines: • The answer [N/A] means that the paper does not include experiments","work_id":"640066ff-62b5-4d10-b45a-343509d4c11b","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":17,"snapshot_sha256":"64cbaf977ba12419f9302a81a1d7989717ebf40ed37421a60f3c91e57311bf01","internal_anchors":1},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b63dedf7d3b263151e2704ac6da96da699aba9c712910811aee78b77c8ace5ff"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}