LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
Argmax flows and multinomial diffusion: Learning categorical distributions.Advances in Neural Information Processing Systems, 34:12454–12465
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
fields
cs.CL 2verdicts
UNVERDICTED 2roles
background 2polarities
background 2representative citing papers
TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.
citing papers explorer
-
Large Language Diffusion Models
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
-
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.