SLASH is a plug-and-play attention redistribution technique that counters attention sinks to enhance LLMs' intrinsic graph topology reconstruction without any training or fine-tuning.
The Twelfth International Conference on Learning Representations , year=
4 Pith papers cite this work. Polarity classification is still indexing.
4
Pith papers citing it
years
2026 4representative citing papers
GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.
citing papers explorer
-
SLASH the Sink: Sharpening Structural Attention Inside LLMs
SLASH is a plug-and-play attention redistribution technique that counters attention sinks to enhance LLMs' intrinsic graph topology reconstruction without any training or fine-tuning.
-
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models
GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.
- Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
- How Language Models Process Negation