Title resolution pending

Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation · 2025 · DOI 10.1145/3725273

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

ContextPilot: Fast Long-Context Inference via Context Reuse

cs.LG · 2025-11-05 · unverdicted · novelty 6.0

ContextPilot reduces LLM prefill latency by up to 3x via context indexing, ordering, de-duplication, and succinct annotations that maximize KV-cache reuse while preserving or improving reasoning quality.

AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System

cs.DC · 2026-05-22 · unverdicted · novelty 5.0

AlignedServe uses prefix-aware batching, large CPU in-flight request pools, batch scheduling, and GPU-to-GPU KV prefetching to raise decoding throughput by up to 1.98x and cut latency by up to 7.4x versus prior serving systems.

CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference

cs.CL · 2026-06-18 · unverdicted · novelty 4.0

CacheWeaver is a lightweight scheduling layer that orders evidence to exploit prefix caching, reducing median TTFT by 20-33% across vLLM setups while preserving answer quality.

MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing

cs.DB · 2026-05-16 · unverdicted · novelty 4.0

MemForest reformulates agent memory as a temporal data management problem using a hierarchical index (MemTree) for parallel construction and localized updates, reporting 79.8% accuracy and 6x throughput on LongMemEval-S and LoCoMo benchmarks.

citing papers explorer

Showing 3 of 3 citing papers after filters.

AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System

cs.DC · 2026-05-22 · unverdicted · none · ref 1

CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference

cs.CL · 2026-06-18 · unverdicted · none · ref 18

MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing

cs.DB · 2026-05-16 · unverdicted · none · ref 2
