A tiered KV cache architecture computes per-head per-step error bounds on quantized attention and uses adaptive fallback to guarantee bounded or exact outputs relative to FP16 reference.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
OASIS mitigates attention sinks and outliers in AttnResidual models via Softmax1 null space and inter-layer signals, reporting norm and kurtosis reductions plus large gains in quantized perplexity and task accuracy.
citing papers explorer
-
Runtime-Certified Bounded-Error Quantized Attention
A tiered KV cache architecture computes per-head per-step error bounds on quantized attention and uses adaptive fallback to guarantee bounded or exact outputs relative to FP16 reference.
-
Attention Sinks and Outliers in Attention Residuals
OASIS mitigates attention sinks and outliers in AttnResidual models via Softmax1 null space and inter-layer signals, reporting norm and kurtosis reductions plus large gains in quantized perplexity and task accuracy.