2 Pith papers cite this work.
citing papers explorer
- VeriCache: Turning Lossy KV Cache into Lossless LLM Inference
  VeriCache turns lossy KV cache compression into lossless LLM inference by drafting with the compressed cache and verifying the drafts with the full cache, achieving up to 4x throughput with identical outputs.
- Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
  A systematic survey of multimodal RAG for document understanding that proposes a taxonomy based on domain, retrieval modality, and granularity, and reviews graph structures, agentic frameworks, datasets, benchmarks, applications, and open challenges.