In 160M and 290M parameter models, a new residual-stream split into scratch and protected channels causes massive activations to re-emerge in the protected decode channel, more concentrated on the start token.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Massive Activations Are Architecturally Robust: A Controlled Scratch/Commitment Residual Stream Test
In 160M and 290M parameter models, a new residual-stream split into scratch and protected channels causes massive activations to re-emerge in the protected decode channel, more concentrated on the start token.