Prefixing attention sinks can mitigate activation outliers for large language model quantization. arXiv preprint arXiv:2406.12016

Son, S · arXiv 2406.12016

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models

cs.CL · 2026-05-08 · conditional · novelty 7.0 · 2 refs

Massive activations first appear in a single ME Layer due to RMSNorm and FFN, remain invariant thereafter, and a simple softening method raises LLM performance while reducing attention sinks.

citing papers explorer


A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models · cs.CL · 2026-05-08 · conditional · none · ref 22 · 2 links
