arXiv preprint arXiv:2402.02834

Shortened LLaMA: A simple depth pruning for large language models · 2024 · arXiv 2402.02834

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

DASH discovers stronger hybrid attention architectures for LLMs via minutes-scale differentiable search, outperforming selector baselines and Jet-Nemotron on RULER while using 0.006% of prior search tokens.

Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Performance collapse in layer-pruned LLMs stems from disrupting the Silent Phase of decision-making, which blocks the transition to correct predictions, while the later Decisive Phase is robust to pruning.

SimDiff: Depth Pruning via Similarity and Difference

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

SimDiff uses similarity and difference metrics to prune LLM layers more effectively than cosine similarity alone, retaining over 91% performance at 25% pruning on LLaMA2-7B.

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

cs.CL · 2025-12-27 · unverdicted · novelty 7.0

Width pruning in Llama-3.2 models reduces parametric knowledge while enhancing instruction-following and preserving reasoning.

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

cs.LG · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.

On the Limits of Layer Pruning for Generative Reasoning in Large Language Models

cs.LG · 2026-02-02 · unverdicted · novelty 5.0

Layer pruning preserves classification performance in LLMs but fundamentally limits recovery of generative reasoning capabilities even after extensive self-supervised finetuning.

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

cs.LG · 2025-10-26 · unverdicted · novelty 5.0

TALE selectively prunes task-detrimental layers in LLMs at inference time to match or exceed baseline performance with lower computational cost across multiple models and tasks.

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

cs.LG · 2026-05-15

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability

cs.LG · 2026-05-14

citing papers explorer

Showing 2 of 2 citing papers after filters.

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs cs.LG · 2026-05-15 · unreviewed · ref 17
TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability cs.LG · 2026-05-14 · unreviewed · ref 22
