ShortGPT: Layers in large language models are more redundant than you expect

Xin Men, Mingyu Xu, Qingyu Zhang, Qianhao Yuan, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, Weipeng Chen · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Residual connections align cross-layer gradients while symmetry-breaking activations prevent rotational drift, causing principal singular vectors of adjacent layers to align.

citing papers explorer

Showing 1 of 1 citing paper.

Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking

cs.LG · 2026-05-06 · unverdicted · none · ref 4

Residual connections align cross-layer gradients while symmetry-breaking activations prevent rotational drift, causing principal singular vectors of adjacent layers to align.
