Diffusion LLM with native variable generation lengths: Let [eos] lead the way. arXiv preprint arXiv:2510.24605, 2025.
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
iLLaDA is an 8B masked diffusion LM trained from scratch with bidirectional attention, reporting gains of 14-21 points on BBH, ARC, MATH and HumanEval over prior diffusion models while remaining competitive with Qwen2.5-7B.
VoidPadding decouples padding from termination in MDLMs via a new [VOID] token, delivering +17.84 average benchmark points and 55.7% fewer decoding steps on Dream-7B-Instruct.
Citing papers explorer
- Diffusion Large Language Models for Visual Speech Recognition
DLLM-VSR applies diffusion LLMs to VSR via masked denoising, two-stage training, and length-guided candidate decoding to reach 19.5% WER on LRS3.