pith. machine review for the scientific record.

arxiv: 2512.07173 · v4 · submitted 2025-12-08 · 💻 cs.LG

Recognition: unknown

Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration

classification 💻 cs.LG
keywords: CadLLM, throughput, confidence, diffusion-based, dLLMs, method, size, training-free
Abstract

We present CadLLM, a training-free method to accelerate the inference throughput of diffusion-based LLMs (dLLMs). We first investigate the dynamic nature of token unmasking confidence across blocks and steps. Based on this observation, we present a lightweight adaptive approach that controls the generation block size, step size, and threshold based on the average confidence of unmasked tokens. We further reduce softmax overhead by dynamically leveraging a subset of the vocabulary to regulate sampling breadth. CadLLM is a plug-and-play, model-agnostic method compatible with KV-cache-based dLLMs. Extensive experiments on four popular tasks demonstrate that CadLLM yields throughput improvements of up to 1.1x to 2.28x over the state-of-the-art baseline while maintaining competitive accuracy.
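The abstract does not state CadLLM's actual update rules, so the following is only a hypothetical Python sketch of what a confidence-aware decoding step of this general kind could look like. It assumes that "step size" means the maximum number of tokens unmasked per denoising step, that confidence is the top-token probability, and that the "subset of the vocabulary" is the top-k logits. The names `DecodeConfig`, `top_k_softmax`, `unmask_step`, and `adapt`, the `high`/`low` confidence cut-offs, and every bound and increment are illustrative assumptions, not values from the paper. The outer loop over blocks, the model call, and the KV cache are omitted.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DecodeConfig:
    block_size: int        # tokens per generation block (used by an outer loop, not shown)
    tokens_per_step: int   # assumed meaning of "step size": max tokens unmasked per step
    threshold: float       # minimum confidence required to unmask a token
    top_k: int             # size of the vocabulary subset used for the softmax


def top_k_softmax(logits, k):
    """Softmax over only the k largest logits; returns (argmax token id, its probability)."""
    k = max(1, min(k, len(logits)))
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    m = logits[top[0]]  # subtract the max logit for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    return top[0], exps[0] / sum(exps)


def unmask_step(masked_logits, cfg):
    """Choose which masked positions to commit in one denoising step.

    masked_logits maps position -> list of logits over the vocabulary.
    Returns ({position: token}, mean confidence of the committed tokens).
    """
    if not masked_logits:
        return {}, 0.0
    scored = []
    for pos, logits in masked_logits.items():
        tok, conf = top_k_softmax(logits, cfg.top_k)
        scored.append((conf, pos, tok))
    scored.sort(reverse=True)  # most confident first
    chosen = [s for s in scored if s[0] >= cfg.threshold][: cfg.tokens_per_step]
    if not chosen:  # guarantee progress: commit the single most confident token
        chosen = scored[:1]
    committed = {pos: tok for _, pos, tok in chosen}
    avg_conf = sum(c for c, _, _ in chosen) / len(chosen)
    return committed, avg_conf


def adapt(cfg, avg_conf, high=0.9, low=0.6):
    """Hypothetical policy: decode more aggressively when confident, more carefully when not."""
    if avg_conf >= high:
        return replace(cfg,
                       block_size=min(cfg.block_size * 2, 128),
                       tokens_per_step=min(cfg.tokens_per_step + 1, 8),
                       threshold=max(cfg.threshold - 0.05, 0.5),
                       top_k=max(cfg.top_k // 2, 8))
    if avg_conf < low:
        return replace(cfg,
                       block_size=max(cfg.block_size // 2, 8),
                       tokens_per_step=max(cfg.tokens_per_step - 1, 1),
                       threshold=min(cfg.threshold + 0.05, 0.95),
                       top_k=min(cfg.top_k * 2, 1024))
    return cfg


# Usage with a toy 4-token vocabulary and three masked positions.
cfg = DecodeConfig(block_size=32, tokens_per_step=2, threshold=0.9, top_k=64)
logits = {
    0: [8.0, 1.0, 0.5, 0.0],  # confident position
    1: [2.0, 1.9, 1.8, 1.7],  # uncertain position, stays masked this step
    2: [9.0, 0.0, 0.0, 0.0],  # confident position
}
committed, avg = unmask_step(logits, cfg)
cfg = adapt(cfg, avg)
```

Shrinking `top_k` when confidence is high reflects the assumption that probability mass is then concentrated in a few tokens, so a small subset approximates the full softmax closely; when confidence drops, the sketch widens the subset and tightens the threshold instead.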



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

    cs.CL 2026-05 unverdicted novelty 7.0

    TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

  2. R²-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

    cs.CL 2026-04 unverdicted novelty 7.0

    R²-dLLM reduces dLLM decoding steps by up to 75% via spatio-temporal redundancy reduction while keeping generation quality competitive.