Why do small language models underperform? Studying language model saturation via the softmax bottleneck. arXiv preprint arXiv:2404.07647

Godey, N · arXiv 2404.07647

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

When Less is More: The LLM Scaling Paradox in Context Compression

cs.LG · 2026-02-10 · unverdicted · novelty 6.0

Larger LLM compressors in lossy setups often yield less faithful context reconstructions due to knowledge overwriting and semantic drift, with mid-sized models outperforming larger ones across 27 tested configurations.

citing papers explorer

Showing 1 of 1 citing paper.

When Less is More: The LLM Scaling Paradox in Context Compression cs.LG · 2026-02-10 · unverdicted · none · ref 4