ContextPilot reduces LLM prefill latency by up to 3x via context indexing, ordering, de-duplication, and succinct annotations that maximize KV-cache reuse while preserving or improving reasoning quality.
4 Pith papers cite this work.
citing papers explorer
- AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System
  AlignedServe uses prefix-aware batching, large CPU in-flight request pools, batch scheduling, and GPU-to-GPU KV prefetching to raise decoding throughput up to 1.98x and cut latency up to 7.4x versus prior serving systems.
- CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference
  CacheWeaver is a lightweight scheduling layer that orders evidence to exploit prefix caching, reducing median TTFT by 20-33% across vLLM setups while preserving answer quality.
- MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing
  MemForest reformulates agent memory as a temporal data management problem using a hierarchical index (MemTree) for parallel construction and localized updates, reporting 79.8% accuracy and 6x throughput on LongMemEval-S and LoCoMo benchmarks.
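
The summaries of ContextPilot and CacheWeaver above both rest on one idea: if requests that share retrieved context present it in the same order, a prefix cache can reuse more of the already-computed KV state. The following is a minimal sketch of that idea at the document level, not the method of either paper. The function names `canonical_order` and `shared_prefix_len`, the frequency-based ordering rule, and the document IDs are all assumptions made for illustration.

```python
from collections import Counter


def canonical_order(requests: list[list[str]]) -> list[list[str]]:
    """De-duplicate each request's documents and sort them by global frequency.

    Hypothetical ordering rule: documents used by more requests come first,
    with the document ID as a tie-breaker so the order is deterministic.
    """
    # Count in how many requests each document appears.
    freq = Counter(doc for docs in requests for doc in set(docs))
    ordered = []
    for docs in requests:
        unique = list(dict.fromkeys(docs))  # drop repeats, keep first occurrence
        unique.sort(key=lambda d: (-freq[d], d))
        ordered.append(unique)
    return ordered


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading documents two requests have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Illustrative document IDs, not taken from any benchmark.
requests = [["d3", "d1", "d1"], ["d1", "d2"], ["d2", "d1", "d3"]]
ordered = canonical_order(requests)

before = shared_prefix_len(requests[1], requests[2])
after = shared_prefix_len(ordered[1], ordered[2])
print(ordered)
print(before, after)
```

In this toy input the second and third requests share no leading documents in their original order, while after canonical ordering they share two. A real system works on token-level prefixes and also has to check that reordering does not hurt answer quality, which this sketch ignores.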