QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.
Musique: Multihop questions via single-hop question composition
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 8roles
dataset 1polarities
use dataset 1representative citing papers
Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.
BRANE maps queries to optimal retrieval pipeline configurations using LLM-derived features and per-configuration correctness predictors, improving the cost-quality Pareto frontier on three benchmarks.
PromptSuite is a modular, extensible, task-agnostic framework for automatically generating diverse prompt variations to support robust multi-prompt LLM evaluation.
KbSD uses a same-size hint-augmented teacher and quadrant-adaptive KL objectives to deliver dense supervision for calibrated behavior across knowledge states in agentic search.
DCO is an inference-time intervention that decomposes attention head outputs orthogonally to a dynamic context anchor and suppresses outlier components via Z-score to improve contextual faithfulness in Llama models.
MeMo encodes new knowledge into a separate memory model that integrates with frozen LLMs, showing strong performance on QA benchmarks while avoiding catastrophic forgetting and working without access to model weights.
A graph-augmented RAG system with vector and graph query tools halves hallucinations and raises factual correctness scores on the MoNaCo complex QA benchmark.
citing papers explorer
-
QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving
QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.
-
Natural Language Query to Configuration for Retrieval Agents
BRANE maps queries to optimal retrieval pipeline configurations using LLM-derived features and per-configuration correctness predictors, improving the cost-quality Pareto frontier on three benchmarks.