Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning

2026 · cs.LG · arXiv 2604.24938

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work has largely treated layer redundancy as an inherent structural property of pretrained networks, emphasizing importance criteria and search algorithms for identifying removable layers. In contrast, we adopt a functional perspective, in which redundancy depends jointly on the model and the calibration objective, suggesting that a universal layer ranking may not exist. Through an empirical study across three LLM families, two calibration objectives, and seven search algorithms, we find that different objectives produce qualitatively different pruning patterns, and that perplexity rankings and downstream reasoning accuracy rankings often fail to align. Under a fixed objective, by comparison, different search algorithms tend to converge to similar pruning solutions. Overall, our results suggest that the calibration objective may play a larger role than the particular search algorithm in determining which layers appear redundant.
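The abstract's central claim, that which layers look redundant depends on the calibration objective, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's code: the "layers" are hypothetical residual updates on a two-element vector, the two objectives are squared errors on different output coordinates, and the greedy loop stands in for one possible search algorithm.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]
Objective = Callable[[Sequence[int]], float]


def make_layer(coord: int, delta: float) -> Layer:
    """Hypothetical residual block: adds `delta` to one coordinate."""
    def layer(h: Vector) -> Vector:
        out = list(h)
        out[coord] += delta
        return out
    return layer


def run(layers: Sequence[Layer], keep: Sequence[int], x: Vector) -> Vector:
    """Apply only the layers whose indices are in `keep`, in order."""
    h = list(x)
    for i in keep:
        h = layers[i](h)
    return h


def make_objective(layers: Sequence[Layer], x: Vector, coord: int) -> Objective:
    """Toy calibration loss: squared error on one output coordinate
    relative to the unpruned model."""
    full = run(layers, range(len(layers)), x)

    def objective(keep: Sequence[int]) -> float:
        out = run(layers, keep, x)
        return (out[coord] - full[coord]) ** 2
    return objective


def greedy_depth_prune(n_layers: int, objective: Objective, n_remove: int) -> List[int]:
    """Greedy search: repeatedly drop the layer whose removal
    yields the lowest loss under the given objective."""
    keep = list(range(n_layers))
    removed: List[int] = []
    for _ in range(n_remove):
        best = min(keep, key=lambda i: objective([j for j in keep if j != i]))
        keep.remove(best)
        removed.append(best)
    return removed


# Layers 0 and 1 write to coordinate 0; layers 2 and 3 write to coordinate 1.
toy_layers = [make_layer(0, 1.0), make_layer(0, 0.1),
              make_layer(1, 1.0), make_layer(1, 0.1)]
x0 = [0.0, 0.0]

removed_a = greedy_depth_prune(4, make_objective(toy_layers, x0, 0), 2)
removed_b = greedy_depth_prune(4, make_objective(toy_layers, x0, 1), 2)
print("objective on coord 0 removes layers:", removed_a)
print("objective on coord 1 removes layers:", removed_b)
```

In this toy setup the same model and the same search procedure remove disjoint sets of layers once the objective changes, which is the kind of objective dependence the abstract describes. It says nothing about the magnitude of the effect in real LLMs.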

representative citing papers

Locality-Aware Redundancy Pruning for LLM Depth Compression

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

LoRP uses a new Representation Locality Score derived from inter-layer hidden-state similarity to cluster layers and prune intra-cluster redundancies in one shot, yielding better perplexity and task accuracy than prior depth-pruning baselines across LLM families.
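The summary above does not define the Representation Locality Score, so the following is only a generic sketch of the underlying idea: compare hidden states of adjacent layers and group layers whose representations are similar. The cosine measure, the `threshold` parameter, and the function names are assumptions chosen for illustration and should not be read as LoRP's actual method.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_adjacent_layers(hidden: Sequence[Sequence[float]],
                          threshold: float) -> List[List[int]]:
    """Group consecutive layers whose hidden states have cosine
    similarity of at least `threshold` with the preceding layer.

    `hidden[i]` is a (hypothetical) pooled hidden state after layer i.
    """
    if not hidden:
        return []
    groups: List[List[int]] = [[0]]
    for i in range(1, len(hidden)):
        if cosine(hidden[i - 1], hidden[i]) >= threshold:
            groups[-1].append(i)   # similar to previous layer: same cluster
        else:
            groups.append([i])     # representation changed: start a new cluster
    return groups


# Made-up hidden states: layers 0-1 are nearly identical, as are layers 2-3.
toy_hidden = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
print(group_adjacent_layers(toy_hidden, threshold=0.9))
```

Under these assumptions, layers inside one group would be the candidates for intra-cluster pruning; how LoRP actually scores and selects them is not stated on this page.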

citing papers explorer

Locality-Aware Redundancy Pruning for LLM Depth Compression cs.LG · 2026-05-27 · unverdicted · none · ref 13 · internal anchor
