pith. machine review for the scientific record. sign in

arxiv: 2511.09803 · v2 · submitted 2025-11-12 · 💻 cs.CL

Recognition: unknown

Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG

Authors on Pith no claims yet
classification 💻 cs.CL
keywords retrievaltargdraftlatencyonlyadaptiveentropygate
0
0 comments X
read the original abstract

Retrieval-Augmented Generation (RAG) improves factuality but retrieving for every query often hurts quality while inflating tokens and latency. We propose Training-free Adaptive Retrieval Gating (TARG), a single-shot policy that decides when to retrieve using only a short, no-context draft from the base model. From the draft's prefix logits, TARG computes lightweight uncertainty scores-mean token entropy, a margin signal derived from the top-1/top-2 logit gap via a monotone link, or small-N variance across a handful of stochastic prefixes-and triggers retrieval only when the score exceeds a threshold. The gate is model-agnostic, adds only tens to hundreds of draft tokens, and requires no additional training or auxiliary heads. On five QA benchmarks spanning short-answer (NQ-Open, TriviaQA, PopQA), multi-hop (MuSiQue), and long-form (ASQA) tasks, TARG consistently pushes the accuracy-efficiency frontier: compared with Alway-RAG, TARG matches or improves EM/F1 while reducing retrieval by 70-90% and cutting end-to-end latency, and it remains close to Never-RAG in overhead. A central empirical finding is that under modern instruction-tuned LLMs the margin signal is a robust default (entropy compresses as backbones sharpen), with small-N variance offering a conservative, budget-first alternative. We provide ablations over gate type and prefix length and use a $\Delta$-latency view to make budget trade-offs explicit.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking

    cs.IR 2026-04 unverdicted novelty 5.0

    AdaRankLLM shows adaptive listwise reranking outperforms fixed-depth retrieval for most LLMs by acting as a noise filter for weak models and an efficiency optimizer for strong ones, with lower context use.