GroundedCache reduces unsafe-served rate in RAG answer caching to 0-1.5% (vs 15-51.5% naive) via four validation gates while keeping p50 latency within 1.07x of no-cache baseline.
Chan, Chao-Ting Chen, Jui-Hung Cheng, and Hen-Hsen Tiong
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: baseline (1)
citation-polarity summary: baseline (1)
years: 2026 (3)
citing papers explorer
-
Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?
-
WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems
WiCER iteratively diagnoses and repairs fact loss during wiki compilation for LLMs, recovering 80% of quality lost in blind distillation across 17 domains while cutting catastrophic failures by 55%.
-
Context Recycling for Long-Horizon LLM Inference
ContextForge recycles context in long-horizon LLM tasks via query generation, memory retrieval, and synthesis, yielding reduced token use and improved consistency on a 15-turn healthcare benchmark while preserving accuracy.