NOVA: NOise-aware Verbal Confidence CAlibration for Robust Large Language Models in RAG Systems

2026 · cs.CL · arXiv 2601.11004

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopted to improve grounding, confidence calibration in RAG settings remains poorly understood. We conduct a systematic study across four benchmarks, revealing that LLMs exhibit poor calibration performance especially when noisy contexts are retrieved. Specifically, contradictory or irrelevant evidence tends to exacerbate the model's overconfidence issue. To address this, we propose NOVA Rules (NOise-Aware Verbal Confidence CAlibration Rules) to provide a principled foundation for resolving overconfidence under noise. We further design NOVA, a noise-aware calibration framework that synthesizes supervision from ~2K HotpotQA examples guided by these rules. By performing supervised fine-tuning (SFT) with this data, NOVA equips models with intrinsic noise awareness without relying on stronger teacher models. Empirical results show that NOVA yields substantial gains, improving ECE scores by 10.9% in-domain and 8.0% out-of-domain. By bridging the gap between retrieval noise and verbal calibration, NOVA paves the way for both accurate and epistemically reliable LLMs.
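The abstract reports gains in ECE (expected calibration error) without stating how it is computed. A minimal sketch of the standard equal-width-bin ECE follows, for readers unfamiliar with the metric. The function name, the 10-bin default, and the input format (a stated confidence in [0, 1] paired with a correctness flag per answer) are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed default; the paper's binning is not given here
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins   # sum of stated confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct answers per bin
    count = [0] * n_bins        # number of answers per bin

    for c, y in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if y else 0.0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        avg_conf = conf_sum[b] / count[b]
        avg_acc = hit_sum[b] / count[b]
        ece += (count[b] / n) * abs(avg_acc - avg_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical data: four answers stated at 90% confidence, half correct.
    stated = [0.9, 0.9, 0.9, 0.9]
    is_correct = [True, True, False, False]
    print(expected_calibration_error(stated, is_correct))
```

In this hypothetical example the model claims 90% confidence but is right half the time, so the gap between stated confidence and accuracy is large. This is the overconfidence pattern the abstract describes under noisy retrieval. Lower ECE indicates better calibration.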

representative citing papers

Multi-Turn Agentic Scientific Literature Search via Workflow Induction

cs.CL · 2026-07-01 · unverdicted · novelty 6.0 · 2 refs

PaperPilot induces executable DAG workflows for multi-turn literature search and trains via imitation plus preference optimization, raising Hit@5 from 58.0 to 77.0 over a baseline agent.

citing papers explorer

Multi-Turn Agentic Scientific Literature Search via Workflow Induction

cs.CL · 2026-07-01 · unverdicted · none · ref 4 · 2 links · internal anchor
