Title resolution pending

URL: https://arxiv · 2025 · arXiv 2512.08819

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Layer Collapse in Diffusion Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.

From Words to Amino Acids: Does the Curse of Depth Persist?

cs.LG · 2026-02-25 · unverdicted · novelty 6.0

Protein language models exhibit consistent depth inefficiency where most task-relevant computation occurs in a subset of layers, mirroring patterns in large language models.

Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization

cs.LG · 2026-06-29 · unverdicted · novelty 4.0

Gradient Smoothing applies depth-wise smoothing to optimizer updates from base methods like Adam, yielding consistent gains in optimization and generalization on language, RL, diffusion, and vision tasks.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Layer Collapse in Diffusion Language Models · cs.LG · 2026-05-07 · unverdicted · none · ref 8 · 2 links
From Words to Amino Acids: Does the Curse of Depth Persist? · cs.LG · 2026-02-25 · unverdicted · none · ref 17
Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization · cs.LG · 2026-06-29 · unverdicted · none · ref 13
