BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation

2026 · cs.CL · arXiv 2605.20084

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Large language models (LLMs) can enhance factuality via retrieval-augmented generation (RAG), but applying RAG to every query is unnecessary when the model-only answer is reliable. This motivates cascaded RAG: each query is first handled by an LLM-only branch, escalated to a RAG fallback only if the primary branch is uncertain, and abstained from when neither branch is sufficiently trustworthy. However, calibrating such cascades stage by stage may be conservative, since the final utility depends on joint uncertainty thresholding of LLM-only and RAG. In this work, we develop BalanceRAG to certify threshold pairs at a target risk level. Given uncertainty scores from the two branches, BalanceRAG frames each threshold pair as an operating point on a two-dimensional lattice and identifies safe operating points using sequential graphical testing. This enables risk-adaptive threshold calibration, controlling the system-level error rate among accepted points, while retaining more examples. Furthermore, BalanceRAG extends to multi-risk calibration, allowing retrieval usage to be bounded together with the selection-conditioned risk. Experiments on three open-domain question answering (QA) benchmarks across multiple LLM backbones demonstrate that BalanceRAG meets prescribed risk levels, preserves higher coverage and more accepted correct examples, and reduces unnecessary retrieval calls compared with always-on RAG.
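The cascade described in the abstract can be illustrated with a minimal sketch. It assumes each query comes with one uncertainty score per branch and a correctness label on a calibration set, and that a branch is accepted when its score is at or below its threshold. The names `Example`, `route`, `evaluate`, and `empirically_feasible` are hypothetical and do not come from the paper. The sketch only computes empirical coverage, selection-conditioned risk, and retrieval rate for each threshold pair on a grid; it does not implement the sequential graphical testing that BalanceRAG uses to certify operating points, so a pair it lists carries no statistical guarantee.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Example:
    u_llm: float        # uncertainty score of the LLM-only answer (assumed: lower is better)
    u_rag: float        # uncertainty score of the RAG answer (assumed: lower is better)
    llm_correct: bool   # whether the LLM-only answer is correct
    rag_correct: bool   # whether the RAG answer is correct


def route(u_llm, u_rag, t_llm, t_rag):
    """Cascade decision: answer with the LLM, escalate to RAG, or abstain."""
    if u_llm <= t_llm:
        return "llm"
    if u_rag <= t_rag:
        return "rag"
    return "abstain"


def evaluate(examples, t_llm, t_rag):
    """Empirical metrics of one threshold pair on a labeled calibration set."""
    accepted = errors = retrievals = 0
    for ex in examples:
        decision = route(ex.u_llm, ex.u_rag, t_llm, t_rag)
        if decision != "llm":
            # Assumption: every escalated query pays for retrieval,
            # even if the system abstains afterwards.
            retrievals += 1
        if decision == "abstain":
            continue
        accepted += 1
        correct = ex.llm_correct if decision == "llm" else ex.rag_correct
        errors += 0 if correct else 1
    n = len(examples)
    return {
        "coverage": accepted / n,
        # Selection-conditioned risk: error rate among accepted queries only.
        "risk": errors / accepted if accepted else None,
        "retrieval_rate": retrievals / n,
    }


def empirically_feasible(examples, grid_llm, grid_rag, alpha):
    """Threshold pairs on the 2D lattice whose empirical risk is at most alpha.

    This is a plain empirical filter, not a certified selection.
    """
    points = []
    for t_llm, t_rag in product(grid_llm, grid_rag):
        metrics = evaluate(examples, t_llm, t_rag)
        if metrics["risk"] is not None and metrics["risk"] <= alpha:
            points.append((t_llm, t_rag, metrics))
    # Prefer operating points that retain more examples.
    points.sort(key=lambda p: p[2]["coverage"], reverse=True)
    return points
```

Evaluating the two thresholds jointly, rather than fixing the first stage and then tuning the second, is what lets a looser LLM-only threshold be traded against a stricter RAG threshold (or the reverse) while keeping the overall error rate among accepted queries in view.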

representative citing papers

MiRD: Reliable Set-Valued Prediction for Open-Ended Question Answering via Miscoverage Risk Decomposition

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

MiRD decomposes overall miscoverage into sampling and conditional selection risks for conformal set-valued prediction in open-ended QA, bounding each while using the full calibration set.
