Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Protein language models exhibit consistent depth inefficiency where most task-relevant computation occurs in a subset of layers, mirroring patterns in large language models.
citing papers explorer
-
Layer Collapse in Diffusion Language Models
Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.
-
From Words to Amino Acids: Does the Curse of Depth Persist?
Protein language models exhibit consistent depth inefficiency where most task-relevant computation occurs in a subset of layers, mirroring patterns in large language models.