SLASH is a plug-and-play attention redistribution technique that counters attention sinks to enhance LLMs' intrinsic graph topology reconstruction without any training or fine-tuning.
The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
A hypernetwork generates meta-gating parameters for SwiGLU blocks to let LLMs adapt their nonlinearity to arbitrary textual conditions, outperforming finetuning and meta-learning baselines with reasonable generalization to unseen cases.
LayerBoost selectively replaces or removes attention in non-critical transformer layers to cut inference latency up to 68% while recovering quality via brief distillation.
citing papers explorer
-
SLASH the Sink: Sharpening Structural Attention Inside LLMs
SLASH is a plug-and-play attention redistribution technique that counters attention sinks to enhance LLMs' intrinsic graph topology reconstruction without any training or fine-tuning.
-
Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM
A hypernetwork generates meta-gating parameters for SwiGLU blocks to let LLMs adapt their nonlinearity to arbitrary textual conditions, outperforming finetuning and meta-learning baselines with reasonable generalization to unseen cases.
-
LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs
LayerBoost selectively replaces or removes attention in non-critical transformer layers to cut inference latency up to 68% while recovering quality via brief distillation.