FP16 KV caching in transformers causes deterministic token divergence versus cache-free inference due to non-associative floating-point accumulation orderings.
Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Generative AI should be evaluated through computational hermeneutics using iterative, human-inclusive benchmarks that measure cultural context rather than isolated model outputs.
citing papers explorer
-
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
FP16 KV caching in transformers causes deterministic token divergence versus cache-free inference due to non-associative floating-point accumulation orderings.
-
Computational Hermeneutics: Evaluating generative AI as a cultural technology
Generative AI should be evaluated through computational hermeneutics using iterative, human-inclusive benchmarks that measure cultural context rather than isolated model outputs.