Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.
Attention Layers Add Into Low-Dimensional Residual Subspaces , url =
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Linear probes recover day-of-year from LM activations for temporal reasoning but are orthogonal to the model's causal 4D subspace identified by DAS, with the angle matching the Haar-uniform random null, replicated across scales and families.
RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.
citing papers explorer
-
Contribution Weights: A Geometrical Analysis of Self-Attention Transformers
Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.
-
When and How Long? The Readout-Mediator Angle in Temporal Reasoning
Linear probes recover day-of-year from LM activations for temporal reasoning but are orthogonal to the model's causal 4D subspace identified by DAS, with the angle matching the Haar-uniform random null, replicated across scales and families.