Trust or Abstain? A Self-Aware RAG Approach

Bangji Yang; Dimitris N. Metaxas; Jiajun Fan; Kai Mei; Minghao Guo; Wujiang Xu; Xi Zhu; Ziqi Wang

arxiv: 2605.18792 · v1 · pith:J3KVHZTAnew · submitted 2026-05-11 · 💻 cs.IR · cs.CL

Xi Zhu, Ziqi Wang, Kai Mei, Wujiang Xu, Minghao Guo, Bangji Yang, Jiajun Fan, Dimitris N. Metaxas

Pith reviewed 2026-05-20 22:58 UTC · model grok-4.3

keywords: retrieval-augmented generation · knowledge conflicts · self-awareness · abstention · parametric knowledge · contextual knowledge · belief estimation · LLM reliability

The pith

SABER lets RAG systems judge whether to trust their own knowledge, the retrieved context, or abstain when the two conflict.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that retrieval-augmented generation becomes more reliable when the language model can explicitly recognize the limits of its internal knowledge and the retrieved evidence. It builds a large benchmark of query-context pairs drawn from five conflict datasets and evaluates how well different LLMs handle parametric versus contextual knowledge paths. SABER then combines multi-trace reasoning outputs with two lightweight predictors to estimate reliability beliefs and applies a simple four-way decision rule. A sympathetic reader would care because unresolved knowledge conflicts are a main source of incorrect or unfaithful answers in current RAG deployments, and the method achieves its gains without any fine-tuning of the underlying LLM.

Core claim

SABER is a Self-Aware Belief Estimator for RAG that requires no LLM fine-tuning. It combines a self-prior with conditional reasoning representations from the parametric-knowledge (PK) side and the contextual-knowledge (CK) side, both obtained through multi-trace inference. Two lightweight predictors then estimate reliability beliefs, which drive a 4-cell decision: trust PK, trust CK, trust either, or abstain. Across four LLM backbones, SABER improves end-to-end accuracy and conflict-specific faithfulness over ten inference-time and fine-tuning baselines, with the largest gains on conflict-heavy datasets. Under abstention, SABER's risk-coverage curve Pareto-dominates every prompt-based abstainer.

What carries the argument

The four-cell decision rule driven by reliability beliefs that are estimated from multi-trace conditional reasoning representations using two lightweight predictors.
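As a minimal sketch of how such a rule could look, the Python below thresholds two reliability beliefs and returns one of the four cells. The shared threshold `tau`, the function name, and the enum are illustrative assumptions, not the paper's implementation; the paper may use per-side thresholds or a different rule.

```python
from enum import Enum


class Decision(Enum):
    TRUST_PK = "trust PK"
    TRUST_CK = "trust CK"
    TRUST_EITHER = "trust either"
    ABSTAIN = "abstain"


def four_cell_decision(belief_pk: float, belief_ck: float, tau: float = 0.5) -> Decision:
    """Map two reliability beliefs in [0, 1] to one of four cells.

    `tau` is a hypothetical shared threshold (an assumption of this sketch).
    """
    pk_ok = belief_pk >= tau
    ck_ok = belief_ck >= tau
    if pk_ok and ck_ok:
        return Decision.TRUST_EITHER
    if pk_ok:
        return Decision.TRUST_PK
    if ck_ok:
        return Decision.TRUST_CK
    return Decision.ABSTAIN
```

Raising `tau` in this sketch makes abstention more frequent, which is one plausible way to obtain a tunable balance between coverage and risk.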

If this is right

End-to-end accuracy rises most on datasets where knowledge conflicts are frequent.
Conflict-specific faithfulness improves over both inference-time and fine-tuning baselines.
Abstention provides a tunable coverage-risk trade-off that Pareto-dominates prompt-based abstainers.
The same gains appear across four different LLM backbones without retraining them.
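The coverage-risk trade-off mentioned above can be illustrated with a generic risk-coverage computation. This is a sketch under assumptions: `confidences` stands for any per-instance score used to decide whether to answer (how SABER derives such a score from its two beliefs is not specified here), and `correct` marks whether the answer would have been right.

```python
def risk_coverage_curve(confidences, correct):
    """Return (coverage, risk) points, answering the most confident instances first.

    coverage = fraction of instances answered
    risk     = error rate among the answered instances
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    points = []
    errors = 0
    for answered, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        points.append((answered / len(order), errors / answered))
    return points
```

One curve Pareto-dominates another when, at every coverage level, its risk is no higher and it is strictly lower somewhere.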

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same lightweight-predictor idea could be tested on non-RAG tasks where an LLM must decide whether to answer from memory or decline.
If the predictors remain accurate when the underlying LLM is updated, the method could reduce repeated fine-tuning costs in production RAG pipelines.
The benchmark construction process itself offers a reusable way to measure self-awareness on any new set of conflict datasets.

Load-bearing premise

Multi-trace inference must produce conditional reasoning representations that the two lightweight predictors can turn into accurate reliability beliefs without any fine-tuning of the base LLM.
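To make the premise concrete, here is a small sketch of the two steps it depends on. The paper states that K reasoning traces per side are mean-pooled into one per-side state; the logistic form of the predictor, the concatenation with the self-prior, and all names below are assumptions made for illustration only.

```python
import math


def mean_pool(trace_vectors):
    """Average K per-trace vectors of equal length into one per-side state."""
    k = len(trace_vectors)
    dim = len(trace_vectors[0])
    return [sum(v[d] for v in trace_vectors) / k for d in range(dim)]


def reliability_belief(self_prior, side_state, weights, bias):
    """Hypothetical lightweight predictor: logistic regression over the
    concatenation of the self-prior and the pooled per-side state."""
    features = list(self_prior) + list(side_state)
    logit = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))
```

In this sketch, one such predictor would be fit for the PK side and another for the CK side, while the base LLM stays frozen.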

What would settle it

A direct test in which the reliability scores output by the predictors show little or no correlation with actual answer correctness on held-out conflict instances would show that the four-cell rule cannot deliver the reported accuracy and faithfulness gains.
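One simple way to run such a check is to measure how well the reliability scores rank correct answers above incorrect ones. The sketch below computes AUROC by pairwise comparison; it is a generic diagnostic chosen for illustration, not an analysis reported in the paper. A value near 0.5 would indicate the scores carry little information about correctness.

```python
def auroc(scores, correct):
    """Probability that a correct instance receives a higher score than an
    incorrect one, counting ties as one half."""
    pos = [s for s, c in zip(scores, correct) if c]
    neg = [s for s, c in zip(scores, correct) if not c]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of instances, so a rank-based implementation would be preferable on a held-out set of realistic size.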

Figures

Figures reproduced from arXiv: 2605.18792 by Bangji Yang, Dimitris N. Metaxas, Jiajun Fan, Kai Mei, Minghao Guo, Wujiang Xu, Xi Zhu, Ziqi Wang.

**Figure 1.** A student facing a +1 / −1 / 0 exam rule, where a correct answer earns +1, a wrong one −1, and abstention earns 0. The student’s memory corresponds to PK, while the potentially flawed reference book corresponds to retrieved CK. The example highlights two forms of self-awareness, namely knowledge-boundary awareness, which reveals when internal knowledge is insufficient, and reasoning-reliability awareness, …

**Figure 2.** SABER pipeline. The frozen LLM produces a query-only self-prior and per-side conditional …

**Figure 3.** Risk-coverage curves. Each panel corresponds to one LLM backbone, and lower-left …

**Figure 4.** Ablation study with four variants across four LLMs over the whole benchmark. Each panel …

**Figure 5.** Layer sensitivity of SABER on the Llama family. Each panel shows end-to-end answer …

**Figure 6.** 4-cell reliability distribution of the constructed benchmark. Each stacked bar shows, for …

**Figure 7.** Layer sensitivity of SABER on the Qwen family (counterpart of Figure …

Original abstract

Retrieval-augmented generation (RAG) improves large language models (LLMs) by incorporating external evidence, but it also introduces knowledge conflicts when retrieved contextual knowledge (CK) and parametric knowledge (PK) disagree or are both unreliable. Existing approaches mainly coordinate which source to use, without explicitly asking whether each answer path is correct. We argue that faithful RAG requires LLM self-awareness, namely the ability to recognize the limits of its own knowledge and reasoning. To ground this problem, we construct a model-specific, ground-truth-aligned knowledge-conflict benchmark by evaluating LLM backbones on PK-only and CK-conditioned answer paths over approximately 69K query-context instances per backbone, drawn from five conflict-QA datasets. We then introduce SABER, a Self-Aware Belief Estimator for RAG that requires no LLM fine-tuning. SABER combines a self-prior with PK-side and CK-side conditional reasoning representations from multi-trace inference, then estimates reliability beliefs with two lightweight predictors to drive a 4-cell decision over trust PK, trust CK, trust either, or abstain. Across four LLM backbones, SABER improves end-to-end accuracy and conflict-specific faithfulness over ten inference-time and fine-tuning baselines, with the largest gains on conflict-heavy datasets. Under abstention, SABER's risk-coverage curve Pareto-dominates every prompt-based abstainer, providing a tunable balance between coverage and answer risk. Our code is available at https://github.com/xizhu1022/SABER.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

SABER adds a no-fine-tune 4-cell trust/abstain rule to RAG via multi-trace predictors and a new 69K benchmark, but the core mapping from traces to reliability beliefs still lacks direct checks.

SABER's main move is to treat knowledge conflicts in RAG as a self-awareness problem. It runs multi-trace inference on PK-only and CK-conditioned paths, combines those representations with a self-prior, and uses two lightweight predictors to estimate reliability beliefs that select one of four decisions: trust PK, trust CK, trust either, or abstain. The authors support this with a model-specific benchmark of roughly 69K ground-truth-aligned instances per backbone drawn from five conflict-QA datasets.

Referee Report

2 major / 2 minor

Summary. The paper introduces SABER, a Self-Aware Belief Estimator for RAG that performs multi-trace inference on parametric-knowledge-only and contextual-knowledge-conditioned paths, combines these representations with a self-prior, and feeds them to two lightweight predictors to produce reliability beliefs. These beliefs drive a 4-cell decision rule (trust PK, trust CK, trust either, or abstain) without any LLM fine-tuning. The authors construct a model-specific benchmark of approximately 69K query-context instances per backbone from five conflict-QA datasets, evaluate on four LLM backbones, and claim higher end-to-end accuracy and conflict-specific faithfulness than ten inference-time and fine-tuning baselines, with the largest gains on conflict-heavy data; under abstention, SABER's risk-coverage curve Pareto-dominates prompt-based abstainers.

Significance. If the empirical claims hold, the work offers a practical route to more reliable RAG by adding explicit self-awareness at inference time only. The construction of a large, model-specific, ground-truth-aligned conflict benchmark and the release of code are clear strengths that support reproducibility and further research on tunable abstention.

major comments (2)

[Section 4 and Section 5] Section 4 (Method) and Section 5 (Experiments): the central claim that the two lightweight predictors produce accurate reliability beliefs from multi-trace representations rests on the assumption that these representations capture genuine conflict signals rather than surface artifacts. No calibration plots, precision-recall curves, or feature-importance analysis for the predictors are reported, even though the method explicitly avoids LLM fine-tuning and places the entire burden on this mapping. This omission is load-bearing for attributing the reported accuracy and faithfulness gains to self-awareness.
[Section 5.3] Section 5.3 (Results): while the paper states that SABER improves accuracy and faithfulness across four backbones and Pareto-dominates risk-coverage curves, the abstract and main results lack error bars, statistical significance tests, or detailed descriptions of how the ten baselines were implemented and tuned. Without these, it is difficult to verify that the observed gains are robust and not artifacts of implementation choices.

minor comments (2)

[Abstract] Abstract: the summary of results mentions accuracy and faithfulness gains but supplies no concrete numbers, dataset breakdowns, or confidence intervals, which reduces immediate clarity.
[Section 3] Notation in Section 3: the precise feature vectors passed to the PK-side and CK-side predictors should be defined more explicitly, including how the self-prior is combined with the conditional representations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of our empirical results. We address each major comment below, indicating the revisions we will make to the manuscript.

Referee: [Section 4 and Section 5] Section 4 (Method) and Section 5 (Experiments): the central claim that the two lightweight predictors produce accurate reliability beliefs from multi-trace representations rests on the assumption that these representations capture genuine conflict signals rather than surface artifacts. No calibration plots, precision-recall curves, or feature-importance analysis for the predictors are reported, even though the method explicitly avoids LLM fine-tuning and places the entire burden on this mapping. This omission is load-bearing for attributing the reported accuracy and faithfulness gains to self-awareness.

Authors: We agree that providing additional diagnostic analyses would better substantiate that the multi-trace PK and CK representations capture genuine conflict signals. In the revised version, we will add calibration plots for the reliability belief predictors, precision-recall curves evaluating their performance on held-out data, and a feature importance analysis (e.g., via permutation importance or coefficient inspection for the lightweight models) to demonstrate that the predictors rely on conflict-relevant features rather than superficial patterns. These additions will directly address the attribution of gains to the self-awareness mechanism. revision: yes
Referee: [Section 5.3] Section 5.3 (Results): while the paper states that SABER improves accuracy and faithfulness across four backbones and Pareto-dominates risk-coverage curves, the abstract and main results lack error bars, statistical significance tests, or detailed descriptions of how the ten baselines were implemented and tuned. Without these, it is difficult to verify that the observed gains are robust and not artifacts of implementation choices.

Authors: We acknowledge the importance of statistical rigor and detailed baseline descriptions for verifying the robustness of our results. In the revision, we will include error bars (standard deviation across multiple runs or seeds) in the main results tables and figures. We will also report statistical significance tests, such as paired t-tests or Wilcoxon signed-rank tests, for the accuracy and faithfulness improvements over baselines. Furthermore, we will expand Section 5.3 and the appendix with detailed descriptions of each baseline's implementation, including hyperparameter tuning procedures, prompt templates, and any model-specific adaptations to ensure full reproducibility. revision: yes

Circularity Check

0 steps flagged

No significant circularity; SABER uses trained lightweight predictors on a constructed benchmark without reducing claims to self-definition or tautological fits

The paper constructs a model-specific benchmark (~69K instances per backbone) by running LLM evaluations on PK-only and CK-conditioned paths to obtain ground-truth correctness labels. It then feeds multi-trace inference representations into two lightweight predictors (combined with a self-prior) to produce reliability beliefs that drive the 4-cell trust/abstain decision. This is a standard supervised learning pipeline rather than any self-definitional loop, fitted input renamed as prediction, or load-bearing self-citation. The reported end-to-end accuracy gains and risk-coverage Pareto dominance are measured against external baselines on the same benchmark data; the method introduces independent components (multi-trace representations and predictors) whose mapping is not equivalent to the inputs by construction. No equations or ansatzes reduce the central claim to prior fitted quantities or author-specific uniqueness theorems. The derivation remains self-contained as an empirical RAG technique.
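The labeling step described above can be sketched as follows. The field names (`gold`, `pk_answer`, `ck_answer`), the exact-match correctness check, and the mapping from correctness pairs to cell names are assumptions for illustration; the paper's own matching criteria may differ.

```python
from collections import Counter


def is_correct(answer: str, gold: str) -> bool:
    """Assumed correctness check: case-insensitive exact match."""
    return answer.strip().lower() == gold.strip().lower()


def cell_label(pk_correct: bool, ck_correct: bool) -> str:
    """Turn the two per-path correctness labels into one of four cells."""
    if pk_correct and ck_correct:
        return "trust either"
    if pk_correct:
        return "trust PK"
    if ck_correct:
        return "trust CK"
    return "abstain"


def cell_distribution(instances):
    """Count how many instances fall into each cell for one backbone."""
    return Counter(
        cell_label(is_correct(x["pk_answer"], x["gold"]),
                   is_correct(x["ck_answer"], x["gold"]))
        for x in instances
    )
```

Because the labels depend on what a given backbone answers on each path, the same query-context pair can land in different cells for different models, which is what makes the benchmark model-specific.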

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that multi-trace inference yields usable conditional reasoning representations for belief estimation; no free parameters, invented entities, or additional axioms are explicitly described.

axioms (1)

domain assumption: Multi-trace inference on PK-only and CK-conditioned paths produces representations that support accurate reliability belief estimation via lightweight predictors.
Invoked to justify the construction of SABER without LLM fine-tuning.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

"SABER combines a self-prior with PK-side and CK-side conditional reasoning representations from multi-trace inference, then estimates reliability beliefs with two lightweight predictors to drive a 4-cell decision over trust PK, trust CK, trust either, or abstain."

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · unclear

"We sample K independent reasoning traces per side and mean-pool their per-trace vectors into a robust per-side conditional state"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Self-rag: Learning to retrieve, generate, and critique through self-reflection

Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-rag: Learning to retrieve, generate, and critique through self-reflection. InInternational Conference on Learning Representations, 2024

work page 2024

[2] [2]

The Internal State of an LLM Knows When It's Lying

Amos Azaria and Tom Mitchell. The internal state of an llm knows when it’s lying.arXiv preprint arXiv:2304.13734, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[3] [3]

Context-DPO: Aligning language models for context-faithfulness.arXiv preprint arXiv:2412.15280, 2024

Baolong Bi, Shenghua Huang, Yiwei Wang, Tianchi Yang, Zhongyu Zhang, Haizhou Huang, Lijie Mei, Junfeng Fang, Zehao Li, Furu Wei, et al. Context-DPO: Aligning language models for context-faithfulness.arXiv preprint arXiv:2412.15280, 2024

work page arXiv 2024

[4] [4]

Discovering Latent Knowledge in Language Models Without Supervision

Collin Burns, Haotian Ye, Dan Klein, and Jacob Steinhardt. Discovering latent knowledge in language models without supervision.arXiv preprint arXiv:2212.03827, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[5] [5]

INSIDE: LLMs’ internal states retain the power of hallucination detection

Chao Chen, Kai Liu, Ze Chen, Yi Gu, Yue Wu, Mingyuan Tao, Zhihang Fu, and Jieping Ye. INSIDE: LLMs’ internal states retain the power of hallucination detection. InInternational Conference on Learning Representations, 2024

work page 2024

[6] [6]

Suchanek, and Gaël Varoquaux

Lihu Chen, Gerard de Melo, Fabian M. Suchanek, and Gaël Varoquaux. Query-level uncertainty in large language models. InInternational Conference on Learning Representations, 2026

work page 2026

[7] [7]

Beyond Black-Box Interventions: Latent Probing for Faithful Retrieval-Augmented Generation

Linfeng Gao, Qinggang Zhang, Baolong Bi, Bo Zeng, Zheng Yuan, Zerui Chen, Zhimin Wei, Shenghua Liu, Linlong Xu, Longyue Wang, Weihua Luo, and Jinsong Su. Beyond black- box interventions: Latent probing for faithful retrieval-augmented generation.arXiv preprint arXiv:2510.12460, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[8] Yonatan Geifman and Ran El-Yaniv. Selective classification for deep neural networks. In Advances in Neural Information Processing Systems, 2017

[9] Amirhosein Ghasemabadi and Di Niu. Can LLMs predict their own failures? Self-awareness via internal circuits. arXiv preprint arXiv:2512.20578, 2026

[10] Minghao Guo, Qingcheng Zeng, Xujiang Zhao, Yanchi Liu, Wenchao Yu, Mengnan Du, Haifeng Chen, and Wei Cheng. DeepSieve: Information sieving via LLM-as-a-knowledge-router. In Findings of the Association for Computational Linguistics: EACL 2026, 2026

[11] Yufang Hou, Alessandra Pascale, Javier Carnerero-Cano, Tigran Tchrakian, Radu Marinescu, Elizabeth Daly, Inkit Padhi, and Prasanna Sattigeri. WikiContradict: A benchmark for evaluating LLMs on real-world knowledge conflicts from Wikipedia. In Advances in Neural Information Processing Systems, 2024. Datasets and Benchmarks Track

[12] Yukun Huang, Sanxing Chen, Hongyi Cai, and Bhuwan Dhingra. To trust or not to trust? Enhancing large language models’ situated faithfulness to external contexts. In International Conference on Learning Representations, 2025

[13] Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C. Park. Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics, 2024

[14] Mingyu Jin, Kai Mei, Wujiang Xu, Mingjie Sun, Ruixiang Tang, Mengnan Du, Zirui Liu, and Yongfeng Zhang. Massive values in self-attention modules are the key to contextual knowledge understanding. In International Conference on Machine Learning, 2025

[15] Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, et al. Language models (mostly) know what they know. arXiv preprint arXiv:2207.05221, 2022

[16] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, 2020

[17] Stephanie Lin, Jacob Hilton, and Owain Evans. Teaching models to express their uncertainty in words. arXiv preprint arXiv:2205.14334, 2022

[18] Shayne Longpre, Kartik Perisetla, Anthony Chen, Nikhil Ramesh, Chris DuBois, and Sameer Singh. Entity-based knowledge conflicts in question answering. arXiv preprint arXiv:2109.05052, 2021

[19] Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, and Hannaneh Hajishirzi. When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023

[20] Kai Mei, Jiang Guo, Shuaichen Chang, Mingwen Dong, Dongkyu Lee, Xing Niu, and Jiarong Jiang. R-WoM: Retrieval-augmented world model for computer-use agents. arXiv preprint arXiv:2510.11892, 2025

[21] Jingwei Ni, Ekaterina Fadeeva, Tianyi Wu, Mubashara Akhtar, Jiaheng Zhang, Elliott Ash, Markus Leippold, Timothy Baldwin, See-Kiong Ng, Artem Shelmanov, and Mrinmaya Sachan. ReProbe: Efficient test-time scaling of multi-step reasoning by probing internal states of large language models. arXiv preprint arXiv:2511.06209, 2026

[22] Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, and Scott Yih. Trusting your evidence: Hallucinate less with context-aware decoding. arXiv preprint arXiv:2305.14739, 2024

[23] Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, and Yiqun Liu. DRAGIN: Dynamic retrieval augmented generation based on the information needs of large language models. arXiv preprint arXiv:2403.10081, 2024

[24] Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, and Yu Cheng. ConflictBank: A benchmark for evaluating knowledge conflicts in large language models. arXiv preprint arXiv:2408.12076, 2024

[25] Han Wang, Akshat Shrivastava, Junyi Jessy Hu, Yash Lal, Manzil Zaheer, Mohammad Javad Hosseini, Sheng-Chieh Lin, Veselin Stoyanov, and Wen-tau Yih. AdaCAD: Adaptively decoding to balance conflicts between contextual and parametric knowledge. arXiv preprint arXiv:2409.07394, 2025

[26] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations, 2023

[27] Ziqi Wang, Xi Zhu, Shuhang Lin, Haochen Xue, Minghao Guo, and Yongfeng Zhang. RAGRouter-Bench: A dataset and benchmark for adaptive RAG routing. arXiv preprint arXiv:2602.00296, 2026

[28] Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, and Lucy Lu Wang. Know your limits: A survey of abstention in large language models. arXiv preprint arXiv:2407.18418, 2025

[29] Kevin Wu, Eric Wu, and James Zou. ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence. In Advances in Neural Information Processing Systems. Datasets and Benchmarks Track

[31] Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, and Yu Su. Adaptive chameleon or stubborn sloth: Revealing the behavior of large language models in knowledge conflicts. In International Conference on Learning Representations, 2024

[32] Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, and Wei Xu. Knowledge conflicts for LLMs: A survey. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

[33] Zijun Yao, Weijian Qi, Liangming Pan, Shulin Cao, Linmei Hu, Weichuan Liu, Lei Hou, and Juanzi Li. SeaKR: Self-aware knowledge retrieval for adaptive retrieval augmented generation. arXiv preprint arXiv:2406.19215, 2025

[34] Zhangyue Yin, Qiushi Sun, Qipeng Guo, Jiawen Wu, Xipeng Qiu, and Xuanjing Huang. Do large language models know what they don’t know? In Findings of the Association for Computational Linguistics: ACL 2023, 2023

[35] Tian Yu, Shaolei Zhang, and Yang Feng. Truth-aware context selection: Mitigating hallucinations of large language models being misled by untruthful contexts. arXiv preprint arXiv:2403.07556, 2024

[36] Hanning Zhang, Shizhe Diao, Yong Lin, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, and Tong Zhang. R-Tuning: Instructing large language models to say ‘I don’t know’. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics, 2024

[37] Qinggang Zhang, Zhishang Xiang, Yilin Xiao, Le Wang, Junhui Li, Xinrun Wang, and Jinsong Su. FaithfulRAG: Fact-level conflict modeling for context-faithful retrieval-augmented generation. arXiv preprint arXiv:2506.08938, 2025

Appendix A Implementation Details

A.1 Probe architecture and training

The two side-specific reliability predictors fpred,PK and...