What are you sinking? a geometric approach on attention sink

What are you sinking? a geometric approach on attention sink · 2025 · arXiv 2508.02546

8 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models

cs.CL · 2026-05-08 · conditional · novelty 7.0 · 2 refs

Massive activations first appear in a single ME Layer due to RMSNorm and FFN, remain invariant thereafter, and a simple softening method raises LLM performance while reducing attention sinks.

Retrieval and competition: how a protein foundation model starts a protein

q-bio.BM · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

ESM2 predicts N-terminal methionine via retrieval of a positional prior from the BOS token through distributed attention circuits rather than direct recognition, revealed by a norm-direction decomposition of rotary attention scores.

A Mechanistic Account of Attention Sinks in GPT-2: One Circuit, Broader Implications for Mitigation

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

Attention sinks in GPT-2 arise from the interaction of learned query bias, first-layer MLP on positional encodings, and key projection structure, with each component individually dispensable.

Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

LARS constrains activation subspaces to decouple memory use from sequence length, cutting GPU memory by 33.5% and CPU memory by 52% versus LoRA while keeping accuracy comparable.

Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

cs.LG · 2026-06-01 · unverdicted · novelty 5.0

Massive spikes in LLMs are identified as rigid vector biases preserved in rotational stability zones; INSERTQUANT clamps them with template vectors to achieve spike-free PTQ with SOTA parity on LLMs and generalization to ViTs.

ASAP: Attention Sink Anchored Pruning

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

ASAP prunes tokens in ViTs by anchoring on attention sinks modeled as lazy random walks, using cumulative transition matrices and radial diffusion clustering to compress redundancy while preserving accuracy.

SinkTrack: Attention Sink based Context Anchoring for Large Language Models

cs.CV · 2026-04-11 · unverdicted · novelty 5.0 · 2 refs

SinkTrack anchors LLMs to initial context by modifying the attention sink token with injected features, yielding gains on textual and multimodal tasks.

When Sinks Help or Hurt: Unified Framework for Attention Sink in Large Vision-Language Models

cs.CV · 2026-04-01

citing papers explorer

Showing 6 of 6 citing papers after filters.

Retrieval and competition: how a protein foundation model starts a protein

q-bio.BM · 2026-05-05 · unverdicted · none · ref 16 · 2 links

A Mechanistic Account of Attention Sinks in GPT-2: One Circuit, Broader Implications for Mitigation

cs.LG · 2026-04-16 · unverdicted · none · ref 4

Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation

cs.LG · 2026-04-03 · unverdicted · none · ref 3

Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

cs.LG · 2026-06-01 · unverdicted · none · ref 2

ASAP: Attention Sink Anchored Pruning

cs.LG · 2026-05-21 · unverdicted · none · ref 16

SinkTrack: Attention Sink based Context Anchoring for Large Language Models

cs.CV · 2026-04-11 · unverdicted · none · ref 13 · 2 links
