Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing

Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer · 2019 · DOI 10.18653/v1/d19-1633

4 Pith papers cite this work.

representative citing papers

Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

cs.CL · 2026-05-14 · unverdicted · novelty 7.0

DiHAL uses geometry proxies to pick where to replace the lower layers of a pretrained transformer with a diffusion bridge for hidden-state reconstruction, improving over token-level diffusion baselines on 8B models.

Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

Layer-wise representation alignment lets diffusion language models reuse semantic structures from frozen autoregressive models, accelerating training by up to 4x without architectural changes beyond the attention mask.

Coupling Models for One-Step Discrete Generation

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

cs.CL · 2024-10-23 · conditional · novelty 6.0

Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.

citing papers explorer

Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement · cs.CL · 2026-05-14 · ref 28
Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment · cs.LG · 2026-05-07 · ref 90
Coupling Models for One-Step Discrete Generation · cs.LG · 2026-05-08 · ref 63
Scaling Diffusion Language Models via Adaptation from Autoregressive Models · cs.CL · 2024-10-23 · ref 121