The Thirty-ninth Annual Conference on Neural Information Processing Systems

Gated Attention for Large Language Models: Non-linearity, Sparsity, Attention-Sink-Free

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SLASH the Sink: Sharpening Structural Attention Inside LLMs

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

SLASH is a plug-and-play attention redistribution technique that counters attention sinks to enhance LLMs' intrinsic graph topology reconstruction without any training or fine-tuning.

Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

cs.CL · 2026-05-03 · unverdicted · novelty 6.0

A hypernetwork generates meta-gating parameters for SwiGLU blocks to let LLMs adapt their nonlinearity to arbitrary textual conditions, outperforming finetuning and meta-learning baselines with reasonable generalization to unseen cases.

LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

LayerBoost selectively replaces or removes attention in non-critical transformer layers to cut inference latency up to 68% while recovering quality via brief distillation.
