Layer-wise representation alignment lets diffusion language models reuse semantic structures from frozen autoregressive models, accelerating training by up to 4x without architectural changes beyond the attention mask.
International Conference on Learning Representations , year =
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
extension 1
citation-polarity summary
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2roles
extension 1polarities
extend 1representative citing papers
Sparse internal snapshots at canonical low-noise levels from frozen diffusion backbones suffice for competitive out-of-distribution detection without full trajectories or large heads.
citing papers explorer
-
Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment
Layer-wise representation alignment lets diffusion language models reuse semantic structures from frozen autoregressive models, accelerating training by up to 4x without architectural changes beyond the attention mask.
-
Backbone-Equated Diffusion OOD via Sparse Internal Snapshots
Sparse internal snapshots at canonical low-noise levels from frozen diffusion backbones suffice for competitive out-of-distribution detection without full trajectories or large heads.