and van Schijndel, M

William Timkey, Marten van Schijndel · 2021 · arXiv 2109.04404

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 1 unclear 1

representative citing papers

A Single Layer to Explain Them All:Understanding Massive Activations in Large Language Models

cs.CL · 2026-05-08 · conditional · novelty 7.0 · 2 refs

Massive activations first appear in a single ME Layer due to RMSNorm and FFN, remain invariant thereafter, and a simple softening method raises LLM performance while reducing attention sinks.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

Eliciting Latent Predictions from Transformers with the Tuned Lens

cs.LG · 2023-03-14 · accept · novelty 7.0

Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

HyperLens reveals that deeper transformer layers magnify small confidence changes into fine-grained trajectories, allowing quantification of cognitive effort where complex tasks demand more and standard SFT can reduce it.

citing papers explorer

Showing 5 of 5 citing papers.

A Single Layer to Explain Them All:Understanding Massive Activations in Large Language Models cs.CL · 2026-05-08 · conditional · none · ref 24 · 2 links
Massive activations first appear in a single ME Layer due to RMSNorm and FFN, remain invariant thereafter, and a simple softening method raises LLM performance while reducing attention sinks.
Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 150
Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
Eliciting Latent Predictions from Transformers with the Tuned Lens cs.LG · 2023-03-14 · accept · none · ref 85
Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale cs.LG · 2022-08-15 · conditional · none · ref 163
LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory cs.AI · 2026-05-07 · unverdicted · none · ref 29
HyperLens reveals that deeper transformer layers magnify small confidence changes into fine-grained trajectories, allowing quantification of cognitive effort where complex tasks demand more and standard SFT can reduce it.

and van Schijndel, M

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer