arXiv preprint arXiv:2305.02633
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference
  A new fault-injection framework enables a systematic empirical study that produces 17 takeaways on error propagation in LLM inference and four software-only mitigation directions.
- RaBitQCache: Rotated Binary Quantization for KVCache in Long Context LLM Inference
  RaBitQCache proposes rotated binary quantization with binary-INT4 arithmetic for unbiased attention weight estimation in long-context LLMs, enabling adaptive Top-p retrieval and hardware optimizations.