Replacement and interchange swap-KL protocols for layer redundancy in transformers disagree on pruning safety, with the gap growing during training on Pythia models and producing different removal costs on Qwen3-8B versus Llama-3.1-8B.
RoPE causes the gap
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- No Free Swap: Protocol-Dependent Layer Redundancy in Transformers