SHIFT: Gate-Modulated Activation Steering for Knowledge Conflict Mitigation in Retrieval-Augmented Generation

Ge Yu; Huiyuan Xie; Maosong Sun; Pengcheng Huang; Ruochang Li; Yu Gu; Yukun Yan; Zhenghao Liu

arxiv: 2606.27786 · v1 · pith:FFSZYADRnew · submitted 2026-06-26 · 💻 cs.CL · cs.AI

SHIFT: Gate-Modulated Activation Steering for Knowledge Conflict Mitigation in Retrieval-Augmented Generation

Ruochang Li , Pengcheng Huang , Zhenghao Liu , Yukun Yan , Huiyuan Xie , Yu Gu , Ge Yu , Maosong Sun This is my paper

Pith reviewed 2026-06-29 04:35 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords retrieval-augmented generationknowledge conflictsgate modulationactivation steeringlarge language modelsRAGparameter-efficient adaptation

0 comments

The pith

A lightweight gate module lets LLMs resolve knowledge conflicts in RAG by modulating activations instead of editing neurons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets conflicts between retrieved external knowledge and an LLM's stored parametric knowledge in retrieval-augmented generation. Earlier neuron-editing methods risk unwanted side effects on other model behaviors because neurons often serve multiple roles. SHIFT replaces direct edits with a small learnable gate module that modulates internal activations during generation. The module trains on fewer than 0.01 percent of parameters while the backbone stays frozen, allowing adaptive use of either knowledge source. Experiments across six datasets show gains over competing baselines.

Core claim

SHIFT reformulates neuron-level modification as learnable gate modulation, equipping LLMs with a lightweight gate module that optimizes fewer than 0.01% trainable parameters while keeping the backbone model frozen. During generation, the gate module adjusts the model's internal representations to adaptively leverage contextual and parametric knowledge.

What carries the argument

The learnable gate module that performs activation steering by modulating internal representations to balance knowledge sources.

If this is right

Knowledge conflicts in RAG can be mitigated without compromising the LLM's general capabilities.
Only a very small fraction of parameters needs training compared with full model updates or neuron edits.
The model can adaptively choose when to draw on retrieved context versus its own parameters during generation.
The method shows effectiveness across six datasets when compared with various baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The frozen-backbone design could allow easy addition to already-deployed LLMs without retraining the core model.
The same gate approach might address other forms of internal inconsistency in language models beyond RAG settings.
Combining the gate with other lightweight adaptation methods could further reduce the cost of handling knowledge issues.

Load-bearing premise

A gate module trained on a tiny parameter budget can steer activations to resolve conflicts without introducing new unintended side effects on model behavior.

What would settle it

An experiment in which SHIFT fails to improve conflict resolution rates or causes measurable drops on non-conflict tasks relative to the neuron-editing baselines.

Figures

Figures reproduced from arXiv: 2606.27786 by Ge Yu, Huiyuan Xie, Maosong Sun, Pengcheng Huang, Ruochang Li, Yu Gu, Yukun Yan, Zhenghao Liu.

**Figure 2.** Figure 2: Overview of the proposed framework Shift. The workflow consists of two stages: (1) Gate-modulated Activation Steering, which enables the frozen LLM to selectively adjust its internal representations; and (2) Reinforcement-based Optimization, which guides the model toward faithful generation under knowledge conflict. To avoid full finetuning, prior studies have attempted to localize factual knowledge withi… view at source ↗

**Figure 3.** Figure 3: Performance comparison on the ConFiQA benchmark using the F1 metric. be categorized into three groups: (1) Promptingbased methods, including Vanilla-RAG (Ram et al., 2023), CtrlA (Liu et al., 2024), which performs adaptive retrieval by leveraging representationlevel control signals to guide retrieval timing. (2) Decoding-based methods, including CK-PLUG (Bi et al., 2025b), which provides plug-and-play co… view at source ↗

**Figure 5.** Figure 5: Visualization of the per-layer gate activations [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

**Figure 6.** Figure 6: Visualization of the per-layer gate activations [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Visualization of the per-layer gate activations [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

**Figure 8.** Figure 8: Visualization of the per-layer gate activations [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

**Figure 9.** Figure 9: Visualization of the per-layer gate activations [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 10.** Figure 10: Training prompt used for Qwen models in Non-Thinking Mode during GRPO training in [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: Eval prompt on Qwen models in Non-Thinking Mode used for evaluation. [PITH_FULL_IMAGE:figures/full_fig_p018_11.png] view at source ↗

**Figure 12.** Figure 12: Training prompt used for Llama models during GRPO training in [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗

**Figure 13.** Figure 13: Eval prompt on Llama models used for evaluation. [PITH_FULL_IMAGE:figures/full_fig_p019_13.png] view at source ↗

read the original abstract

Retrieval-augmented generation (RAG) enhances LLMs by incorporating external knowledge to support response generation. However, conflicts between retrieved context and parametric knowledge have emerged as a critical challenge in RAG systems. To mitigate such conflicts, numerous studies have attempted to identify and edit knowledge-related internal neurons, aiming to improve the ability of LLMs to rely on contextual evidence during generation. However, these neuron-level approaches may introduce unintended cascading effects that compromise the general capabilities of LLMs, as the modified neurons are often entangled with broader model behaviors and functionalities. In this paper, we introduce SHIFT, a novel framework that reformulates neuron-level modification as learnable gate modulation, allowing LLMs to adaptively regulate internal activations for knowledge conflict resolution. Technically, our SHIFT equips LLMs with a lightweight gate module and optimizes fewer than 0.01% trainable parameters while keeping the backbone model frozen. During generation, the gate module adjusts the model's internal representations to adaptively leverage contextual and parametric knowledge. Extensive experiments on six datasets validate the effectiveness of our SHIFT in comparison with various competing baselines. All datasets and code are available at https://github.com/OpenBMB/SHIFT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SHIFT swaps neuron edits for a small learned gate on activations to handle RAG conflicts, but the claim of no side effects still rests on an unproven locality assumption.

read the letter

The core contribution is a lightweight gate module that modulates internal activations at inference time instead of directly editing neurons. The model stays frozen and only a tiny fraction of parameters get trained, which keeps the overhead low. Experiments run on six datasets against standard baselines, and the code is released.

This approach reframes the problem as adaptive regulation rather than permanent modification. That distinction is the actual technical step forward within the existing line of RAG conflict work.

The soft spot is the side-effect claim. The gates act on the same entangled representations that prior neuron-editing papers flagged as risky. Nothing in the description adds an explicit constraint, regularization term, or training signal that would force the gates to stay local to conflict cases. With the parameter budget capped below 0.01 percent, the module has limited capacity for highly selective routing. If the full results include clean ablations on general capabilities and no measurable drop outside the conflict tasks, that would address the concern; otherwise the advantage over direct editing remains partly assumptive.

The work sits squarely in the practical RAG reliability subfield. Readers who already track activation steering or parameter-efficient adaptation will find the method and the released code useful to test. It is not a foundational shift, but the implementation is concrete enough to merit checking.

I would send it to peer review. The idea is clear, the resource cost is low, and the experiments give a starting point for referees to evaluate whether the gate actually delivers the promised selectivity.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces SHIFT, a framework that reformulates neuron-level editing for knowledge conflict resolution in RAG as learnable gate modulation of internal activations. It equips the LLM with a lightweight gate module, optimizes fewer than 0.01% parameters while freezing the backbone, and uses the gates during generation to adaptively balance contextual and parametric knowledge. Effectiveness is validated via experiments on six datasets against competing baselines, with code and data released.

Significance. If the central claims hold, the work offers a parameter-efficient alternative to direct neuron editing that may reduce unintended side effects on general capabilities. The <0.01% trainable parameter budget and frozen backbone are notable strengths for practical deployment; reproducible code release further strengthens the contribution.

major comments (3)

[Abstract, §3] Abstract and §3 (method): The central claim that gate modulation avoids the cascading effects of prior neuron edits rests on the assumption that the learned gates will selectively affect conflict-related activations. No inductive bias, regularization term, or locality constraint is described that would prevent the gates from altering entangled non-conflict behaviors when modulating the same representations.
[§4] §4 (experiments): While six datasets are mentioned, the manuscript provides no quantitative evidence (e.g., ablation on general capability metrics such as MMLU or downstream task performance post-modulation) demonstrating that the gate module preserves capabilities better than direct editing baselines. This evidence is load-bearing for the claim of reduced side effects.
[§3.2] §3.2 (gate module): The description of the gate architecture and training objective does not specify how context-dependent routing is achieved under the <0.01% parameter budget; without this, it is unclear whether the module has sufficient capacity for the claimed adaptive regulation.

minor comments (2)

The abstract states that 'all datasets and code are available,' which is a positive; ensure the GitHub repository includes the exact training scripts, hyper-parameters, and evaluation code used for the reported results.
[§3] Notation for the gate module (e.g., how the modulation is applied to activations) should be formalized with an equation in §3 to improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions will be made to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract, §3] Abstract and §3 (method): The central claim that gate modulation avoids the cascading effects of prior neuron edits rests on the assumption that the learned gates will selectively affect conflict-related activations. No inductive bias, regularization term, or locality constraint is described that would prevent the gates from altering entangled non-conflict behaviors when modulating the same representations.

Authors: We acknowledge that the manuscript does not introduce an explicit regularization term or locality constraint on the gates. The training objective optimizes the gates end-to-end on conflict-resolution examples while keeping the backbone frozen; this task-specific supervision provides an implicit bias toward selective modulation, as the gates are only updated to improve contextual reliance on the target tasks. We agree that making this reasoning more explicit would strengthen the central claim and will add a dedicated paragraph in §3 discussing the implicit inductive bias arising from the frozen backbone and conflict-focused objective. revision: partial
Referee: [§4] §4 (experiments): While six datasets are mentioned, the manuscript provides no quantitative evidence (e.g., ablation on general capability metrics such as MMLU or downstream task performance post-modulation) demonstrating that the gate module preserves capabilities better than direct editing baselines. This evidence is load-bearing for the claim of reduced side effects.

Authors: The referee correctly notes the absence of MMLU-style ablations. Because the backbone parameters remain completely frozen and only <0.01% parameters are introduced, general capabilities are preserved by construction, unlike neuron-editing methods that permanently alter entangled neurons. Nevertheless, to provide the requested quantitative support we will add post-modulation evaluations on MMLU and a held-out downstream task in the revised §4, comparing SHIFT against the direct-editing baselines. revision: yes
Referee: [§3.2] §3.2 (gate module): The description of the gate architecture and training objective does not specify how context-dependent routing is achieved under the <0.01% parameter budget; without this, it is unclear whether the module has sufficient capacity for the claimed adaptive regulation.

Authors: We apologize for the insufficient detail in §3.2. The gate module is a lightweight, shared low-rank adapter per layer that receives both the current activation and a compact context embedding; the modulation is computed via a small MLP whose parameters are tied across layers, keeping the total trainable count under 0.01%. Context dependence arises because the context embedding is recomputed from the retrieved passage at each generation step. We will expand §3.2 with the precise architecture diagram, parameter breakdown, and training objective to clarify capacity and routing. revision: yes

Circularity Check

0 steps flagged

No circularity: SHIFT introduces independent gate module without reduction to inputs

full rationale

The paper's core contribution is the introduction of a lightweight gate module (optimizing <0.01% parameters on a frozen backbone) that reformulates prior neuron-editing ideas as adaptive modulation. No equations, fitted parameters, or self-citations appear in the abstract or description that would make any claimed prediction or result equivalent to its inputs by construction. The approach is presented as an external technical addition addressing cascading effects of neuron edits, with no self-definitional loops, no renaming of known results, and no load-bearing self-citation chains. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

Ledger extracted directly from the abstract's description of the method and its contrast with neuron editing.

free parameters (1)

gate module parameters = <0.01% of total parameters
The trainable parameters of the lightweight gate module, stated to be fewer than 0.01% of the backbone.

axioms (1)

domain assumption Gate modulation of internal activations can resolve knowledge conflicts without the cascading effects of neuron-level edits.
Invoked when the paper reframes neuron modification as gate modulation to preserve general capabilities.

invented entities (1)

lightweight gate module no independent evidence
purpose: To adjust internal representations during generation for adaptive use of contextual versus parametric knowledge.
New component introduced as the core of the SHIFT framework.

pith-pipeline@v0.9.1-grok · 5761 in / 1346 out tokens · 56422 ms · 2026-06-29T04:35:38.719368+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

169 extracted references · 18 canonical work pages

[1]

Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining , pages=

A survey on rag meeting llms: Towards retrieval-augmented large language models , author=. Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining , pages=
[2]

Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) , pages=

How much knowledge can you pack into the parameters of a language model? , author=. Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) , pages=

2020
[3]

Advances in neural information processing systems , volume=

Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. Advances in neural information processing systems , volume=
[4]

International Conference on Learning Representations , volume=

Redeep: Detecting hallucination in retrieval-augmented generation via mechanistic interpretability , author=. International Conference on Learning Representations , volume=
[5]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

Replug: Retrieval-augmented black-box language models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2024
[6]

Transactions of the Association for Computational Linguistics , volume=

In-context retrieval-augmented language models , author=. Transactions of the Association for Computational Linguistics , volume=. 2023 , publisher=

2023
[7]

International conference on machine learning , pages=

Improving language models by retrieving from trillions of tokens , author=. International conference on machine learning , pages=. 2022 , organization=

2022
[8]

2020 , eprint=

REALM: Retrieval-Augmented Language Model Pre-Training , author=. 2020 , eprint=

2020
[9]

2026 , eprint=

Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It , author=. 2026 , eprint=

2026
[10]

Findings of the Association for Computational Linguistics: ACL 2024 , pages=

Cutting off the head ends the conflict: A mechanism for interpreting and mitigating knowledge conflicts in language models , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

2024
[11]

2024 , eprint=

Studying Large Language Model Behaviors Under Context-Memory Conflicts With Real Documents , author=. 2024 , eprint=

2024
[12]

2024 , eprint=

Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models , author=. 2024 , eprint=

2024
[13]

2023 , eprint=

The Internal State of an LLM Knows When It's Lying , author=. 2023 , eprint=

2023
[14]

arXiv preprint arXiv:2310.06824 , year=

The geometry of truth: Emergent linear structure in large language model representations of true/false datasets , author=. arXiv preprint arXiv:2310.06824 , year=

Pith/arXiv arXiv
[15]

2025 , eprint=

Eliciting Latent Predictions from Transformers with the Tuned Lens , author=. 2025 , eprint=

2025
[16]

Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL) , pages=

Future lens: Anticipating subsequent tokens from a single hidden state , author=. Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL) , pages=
[17]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Truthfulrag: Resolving factual-level conflicts in retrieval-augmented generation with knowledge graphs , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[18]

the moon is made of marshmallows

Faitheval: Can your language model stay faithful to context, even if" the moon is made of marshmallows" , author=. International Conference on Learning Representations , volume=
[19]

International Conference on Learning Representations , volume=

Adaptive chameleon or stubborn sloth: Revealing the behavior of large language models in knowledge conflicts , author=. International Conference on Learning Representations , volume=
[20]

Findings of the Association for Computational Linguistics: EMNLP 2021 , pages=

Retrieval augmentation reduces hallucination in conversation , author=. Findings of the Association for Computational Linguistics: EMNLP 2021 , pages=

2021
[21]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=

Rich knowledge sources bring complex knowledge conflicts: Recalibrating models to reflect conflicting evidence , author=. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=

2022
[22]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Understanding parametric and contextual knowledge reconciliation within large language models , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
[23]

2025 , eprint=

The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation , author=. 2025 , eprint=

2025
[24]

The Twelfth International Conference on Learning Representations , year=

Self-rag: Learning to retrieve, generate, and critique through self-reflection , author=. The Twelfth International Conference on Learning Representations , year=
[25]

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages=

The power of noise: Redefining retrieval for rag systems , author=. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages=
[26]

Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov

Kwiatkowski, Tom and Palomaki, Jennimaria and Redfield, Olivia and Collins, Michael and Parikh, Ankur and Alberti, Chris and Epstein, Danielle and Polosukhin, Illia and Devlin, Jacob and Lee, Kenton and Toutanova, Kristina and Jones, Llion and Kelcey, Matthew and Chang, Ming-Wei and Dai, Andrew M. and Uszkoreit, Jakob and Le, Quoc and Petrov, Slav. Natura...

work page doi:10.1162/tacl_a_00276 2019
[27]

T rivia QA : A large scale distantly supervised challenge dataset for reading comprehension

Joshi, Mandar and Choi, Eunsol and Weld, Daniel and Zettlemoyer, Luke. T rivia QA : A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1147

work page doi:10.18653/v1/p17-1147 2017
[28]

Cohen, Ruslan Salakhut- dinov, and Christopher D

Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William and Salakhutdinov, Ruslan and Manning, Christopher D. H otpot QA : A Dataset for Diverse, Explainable Multi-hop Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1259

work page doi:10.18653/v1/d18-1259 2018
[29]

arXiv preprint arXiv:2505.09388 , year=

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

Pith/arXiv arXiv
[30]

Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

Efficient Memory Management for Large Language Model Serving with PagedAttention , author=. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=
[31]

2024 , eprint=

CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control , author=. 2024 , eprint=

2024
[32]

2021 , eprint=

Measuring Massive Multitask Language Understanding , author=. 2021 , eprint=

2021
[33]

2024 , eprint=

Context-DPO: Aligning Language Models for Context-Faithfulness , author=. 2024 , eprint=

2024
[34]

2022 , eprint=

Finetuned Language Models Are Zero-Shot Learners , author=. 2022 , eprint=

2022
[35]

2023 , eprint=

In-Context Retrieval-Augmented Language Models , author=. 2023 , eprint=

2023
[36]

2026 , eprint=

Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement , author=. 2026 , eprint=

2026
[37]

Context: Fine-Grained Control of Knowledge Reliance in Language Models , author=

Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models , author=. 2025 , eprint=

2025
[38]

2025 , eprint=

AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge , author=. 2025 , eprint=

2025
[39]

2025 , eprint=

FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation , author=. 2025 , eprint=

2025
[40]

2023 , eprint=

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection , author=. 2023 , eprint=

2023
[41]

2025 , eprint=

Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models , author=. 2025 , eprint=

2025
[42]

Proceedings of the 2016 conference on empirical methods in natural language processing , pages=

Squad: 100,000+ questions for machine comprehension of text , author=. Proceedings of the 2016 conference on empirical methods in natural language processing , pages=

2016
[43]

N ews QA : A Machine Comprehension Dataset

Trischler, Adam and Wang, Tong and Yuan, Xingdi and Harris, Justin and Sordoni, Alessandro and Bachman, Philip and Suleman, Kaheer. N ews QA : A Machine Comprehension Dataset. Proceedings of the 2nd Workshop on Representation Learning for NLP. 2017. doi:10.18653/v1/W17-2623

work page doi:10.18653/v1/w17-2623 2017
[44]

arXiv preprint arXiv:1704.05179 , year=

Searchqa: A new q&a dataset augmented with context from a search engine , author=. arXiv preprint arXiv:1704.05179 , year=

Pith/arXiv arXiv
[45]

2022 , eprint=

Entity-Based Knowledge Conflicts in Question Answering , author=. 2022 , eprint=

2022
[46]

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension

Fisch, Adam and Talmor, Alon and Jia, Robin and Seo, Minjoon and Choi, Eunsol and Chen, Danqi. MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension. Proceedings of the 2nd Workshop on Machine Reading for Question Answering. 2019. doi:10.18653/v1/D19-5801

work page doi:10.18653/v1/d19-5801 2019
[47]

arXiv preprint arXiv:2404.16032 , year=

Studying large language model behaviors under context-memory conflicts with real documents , author=. arXiv preprint arXiv:2404.16032 , year=

arXiv
[48]

2020 , eprint=

How Much Knowledge Can You Pack Into the Parameters of a Language Model? , author=. 2020 , eprint=

2020
[49]

Advances in neural information processing systems , volume=

Realtime qa: What's the answer right now? , author=. Advances in neural information processing systems , volume=
[50]

Advances in Neural Information Processing Systems , volume=

Toolqa: A dataset for llm question answering with external tools , author=. Advances in Neural Information Processing Systems , volume=
[51]

Advances in neural information processing systems , volume=

Toolformer: Language models can teach themselves to use tools , author=. Advances in neural information processing systems , volume=
[52]

arXiv preprint arXiv:2109.01652 , year=

Finetuned language models are zero-shot learners , author=. arXiv preprint arXiv:2109.01652 , year=

Pith/arXiv arXiv
[53]

ACM computing surveys , volume=

Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , author=. ACM computing surveys , volume=. 2023 , publisher=

2023
[54]

2025 , eprint=

ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence , author=. 2025 , eprint=

2025
[55]

Knowledge Neurons in Pretrained Transformers

Dai, Damai and Dong, Li and Hao, Yaru and Sui, Zhifang and Chang, Baobao and Wei, Furu. Knowledge Neurons in Pretrained Transformers. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.581

work page doi:10.18653/v1/2022.acl-long.581 2022
[56]

2025 , eprint=

ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation , author=. 2025 , eprint=

2025
[57]

Context: Fine-Grained Control of Knowledge Reliance in Language Models , author=

Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models , author=. arXiv preprint arXiv:2503.15888 , year=

arXiv
[58]

The Fourteenth International Conference on Learning Representations , year=

LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals , author=. The Fourteenth International Conference on Learning Representations , year=
[59]

2024 , eprint=

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space , author=. 2024 , eprint=

2024
[60]

arXiv preprint arXiv:2310.14053 , year=

Beyond accuracy: Evaluating self-consistency of code large language models with identitychain , author=. arXiv preprint arXiv:2310.14053 , year=

arXiv
[61]

arXiv preprint arXiv:2501.12948 , year=

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=

Pith/arXiv arXiv
[62]

2025 , eprint=

ToolRL: Reward is All Tool Learning Needs , author=. 2025 , eprint=

2025
[63]

2024 , eprint=

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models , author=. 2024 , eprint=

2024
[64]

arXiv preprint arXiv:2203.11171 , year=

Self-consistency improves chain of thought reasoning in language models , author=. arXiv preprint arXiv:2203.11171 , year=

Pith/arXiv arXiv
[65]

2024 , eprint=

Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models , author=. 2024 , eprint=

2024
[66]

2023 , eprint=

Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism , author=. 2023 , eprint=

2023
[67]

2024 , eprint=

Retrieval Head Mechanistically Explains Long-Context Factuality , author=. 2024 , eprint=

2024
[68]

arXiv preprint arXiv:2309.15098 , year=

Attention satisfies: A constraint-satisfaction lens on factual errors of language models , author=. arXiv preprint arXiv:2309.15098 , year=

arXiv
[69]

Self-dc: When to reason and when to act? self divide-and-conquer for compositional unknown questions , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025
[70]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

The earth is flat because...: Investigating llms’ belief towards misinformation via persuasive conversation , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[71]

Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024) , pages=

Tug-of-war between knowledge: Exploring and resolving knowledge conflicts in retrieval-augmented language models , author=. Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024) , pages=

2024
[72]

arXiv preprint arXiv:2405.11613 , year=

Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts , author=. arXiv preprint arXiv:2405.11613 , year=

arXiv
[73]

arXiv preprint arXiv:2310.00935 , year=

Resolving knowledge conflicts in large language models , author=. arXiv preprint arXiv:2310.00935 , year=

arXiv
[74]

arXiv e-prints , pages=

Retrieve only when it needs: Adaptive retrieval augmentation for hallucination mitigation in large language models , author=. arXiv e-prints , pages=
[75]

Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

Context-faithful prompting for large language models , author=. Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

2023
[76]

arXiv preprint arXiv:2601.06842 , year=

Seeing through the Conflict: Transparent Knowledge Conflict Handling in Retrieval-Augmented Generation , author=. arXiv preprint arXiv:2601.06842 , year=

arXiv
[77]

arXiv preprint arXiv:2403.08319 , year=

Knowledge Conflicts for LLMs: A Survey , author=. arXiv preprint arXiv:2403.08319 , year=

arXiv
[78]

arXiv preprint arXiv:2510.12460 , year=

Probing Latent Knowledge Conflict for Faithful Retrieval-Augmented Generation , author=. arXiv preprint arXiv:2510.12460 , year=

Pith/arXiv arXiv
[79]

Adacad: Adaptively decoding to balance conflicts between contextual and parametric knowledge , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025
[80]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

Showing first 80 references.

[1] [1]

Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining , pages=

A survey on rag meeting llms: Towards retrieval-augmented large language models , author=. Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining , pages=

[2] [2]

Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) , pages=

How much knowledge can you pack into the parameters of a language model? , author=. Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) , pages=

2020

[3] [3]

Advances in neural information processing systems , volume=

Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. Advances in neural information processing systems , volume=

[4] [4]

International Conference on Learning Representations , volume=

Redeep: Detecting hallucination in retrieval-augmented generation via mechanistic interpretability , author=. International Conference on Learning Representations , volume=

[5] [5]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

Replug: Retrieval-augmented black-box language models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2024

[6] [6]

Transactions of the Association for Computational Linguistics , volume=

In-context retrieval-augmented language models , author=. Transactions of the Association for Computational Linguistics , volume=. 2023 , publisher=

2023

[7] [7]

International conference on machine learning , pages=

Improving language models by retrieving from trillions of tokens , author=. International conference on machine learning , pages=. 2022 , organization=

2022

[8] [8]

2020 , eprint=

REALM: Retrieval-Augmented Language Model Pre-Training , author=. 2020 , eprint=

2020

[9] [9]

2026 , eprint=

Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It , author=. 2026 , eprint=

2026

[10] [10]

Findings of the Association for Computational Linguistics: ACL 2024 , pages=

Cutting off the head ends the conflict: A mechanism for interpreting and mitigating knowledge conflicts in language models , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

2024

[11] [11]

2024 , eprint=

Studying Large Language Model Behaviors Under Context-Memory Conflicts With Real Documents , author=. 2024 , eprint=

2024

[12] [12]

2024 , eprint=

Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models , author=. 2024 , eprint=

2024

[13] [13]

2023 , eprint=

The Internal State of an LLM Knows When It's Lying , author=. 2023 , eprint=

2023

[14] [14]

arXiv preprint arXiv:2310.06824 , year=

The geometry of truth: Emergent linear structure in large language model representations of true/false datasets , author=. arXiv preprint arXiv:2310.06824 , year=

Pith/arXiv arXiv

[15] [15]

2025 , eprint=

Eliciting Latent Predictions from Transformers with the Tuned Lens , author=. 2025 , eprint=

2025

[16] [16]

Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL) , pages=

Future lens: Anticipating subsequent tokens from a single hidden state , author=. Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL) , pages=

[17] [17]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Truthfulrag: Resolving factual-level conflicts in retrieval-augmented generation with knowledge graphs , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[18] [18]

the moon is made of marshmallows

Faitheval: Can your language model stay faithful to context, even if" the moon is made of marshmallows" , author=. International Conference on Learning Representations , volume=

[19] [19]

International Conference on Learning Representations , volume=

Adaptive chameleon or stubborn sloth: Revealing the behavior of large language models in knowledge conflicts , author=. International Conference on Learning Representations , volume=

[20] [20]

Findings of the Association for Computational Linguistics: EMNLP 2021 , pages=

Retrieval augmentation reduces hallucination in conversation , author=. Findings of the Association for Computational Linguistics: EMNLP 2021 , pages=

2021

[21] [21]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=

Rich knowledge sources bring complex knowledge conflicts: Recalibrating models to reflect conflicting evidence , author=. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=

2022

[22] [22]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Understanding parametric and contextual knowledge reconciliation within large language models , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

[23] [23]

2025 , eprint=

The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation , author=. 2025 , eprint=

2025

[24] [24]

The Twelfth International Conference on Learning Representations , year=

Self-rag: Learning to retrieve, generate, and critique through self-reflection , author=. The Twelfth International Conference on Learning Representations , year=

[25] [25]

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages=

The power of noise: Redefining retrieval for rag systems , author=. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages=

[26] [26]

Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov

Kwiatkowski, Tom and Palomaki, Jennimaria and Redfield, Olivia and Collins, Michael and Parikh, Ankur and Alberti, Chris and Epstein, Danielle and Polosukhin, Illia and Devlin, Jacob and Lee, Kenton and Toutanova, Kristina and Jones, Llion and Kelcey, Matthew and Chang, Ming-Wei and Dai, Andrew M. and Uszkoreit, Jakob and Le, Quoc and Petrov, Slav. Natura...

work page doi:10.1162/tacl_a_00276 2019

[27] [27]

T rivia QA : A large scale distantly supervised challenge dataset for reading comprehension

Joshi, Mandar and Choi, Eunsol and Weld, Daniel and Zettlemoyer, Luke. T rivia QA : A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1147

work page doi:10.18653/v1/p17-1147 2017

[28] [28]

Cohen, Ruslan Salakhut- dinov, and Christopher D

Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William and Salakhutdinov, Ruslan and Manning, Christopher D. H otpot QA : A Dataset for Diverse, Explainable Multi-hop Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1259

work page doi:10.18653/v1/d18-1259 2018

[29] [29]

arXiv preprint arXiv:2505.09388 , year=

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

Pith/arXiv arXiv

[30] [30]

Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

Efficient Memory Management for Large Language Model Serving with PagedAttention , author=. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

[31] [31]

2024 , eprint=

CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control , author=. 2024 , eprint=

2024

[32] [32]

2021 , eprint=

Measuring Massive Multitask Language Understanding , author=. 2021 , eprint=

2021

[33] [33]

2024 , eprint=

Context-DPO: Aligning Language Models for Context-Faithfulness , author=. 2024 , eprint=

2024

[34] [34]

2022 , eprint=

Finetuned Language Models Are Zero-Shot Learners , author=. 2022 , eprint=

2022

[35] [35]

2023 , eprint=

In-Context Retrieval-Augmented Language Models , author=. 2023 , eprint=

2023

[36] [36]

2026 , eprint=

Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement , author=. 2026 , eprint=

2026

[37] [37]

Context: Fine-Grained Control of Knowledge Reliance in Language Models , author=

Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models , author=. 2025 , eprint=

2025

[38] [38]

2025 , eprint=

AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge , author=. 2025 , eprint=

2025

[39] [39]

2025 , eprint=

FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation , author=. 2025 , eprint=

2025

[40] [40]

2023 , eprint=

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection , author=. 2023 , eprint=

2023

[41] [41]

2025 , eprint=

Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models , author=. 2025 , eprint=

2025

[42] [42]

Proceedings of the 2016 conference on empirical methods in natural language processing , pages=

Squad: 100,000+ questions for machine comprehension of text , author=. Proceedings of the 2016 conference on empirical methods in natural language processing , pages=

2016

[43] [43]

N ews QA : A Machine Comprehension Dataset

Trischler, Adam and Wang, Tong and Yuan, Xingdi and Harris, Justin and Sordoni, Alessandro and Bachman, Philip and Suleman, Kaheer. N ews QA : A Machine Comprehension Dataset. Proceedings of the 2nd Workshop on Representation Learning for NLP. 2017. doi:10.18653/v1/W17-2623

work page doi:10.18653/v1/w17-2623 2017

[44] [44]

arXiv preprint arXiv:1704.05179 , year=

Searchqa: A new q&a dataset augmented with context from a search engine , author=. arXiv preprint arXiv:1704.05179 , year=

Pith/arXiv arXiv

[45] [45]

2022 , eprint=

Entity-Based Knowledge Conflicts in Question Answering , author=. 2022 , eprint=

2022

[46] [46]

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension

Fisch, Adam and Talmor, Alon and Jia, Robin and Seo, Minjoon and Choi, Eunsol and Chen, Danqi. MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension. Proceedings of the 2nd Workshop on Machine Reading for Question Answering. 2019. doi:10.18653/v1/D19-5801

work page doi:10.18653/v1/d19-5801 2019

[47] [47]

arXiv preprint arXiv:2404.16032 , year=

Studying large language model behaviors under context-memory conflicts with real documents , author=. arXiv preprint arXiv:2404.16032 , year=

arXiv

[48] [48]

2020 , eprint=

How Much Knowledge Can You Pack Into the Parameters of a Language Model? , author=. 2020 , eprint=

2020

[49] [49]

Advances in neural information processing systems , volume=

Realtime qa: What's the answer right now? , author=. Advances in neural information processing systems , volume=

[50] [50]

Advances in Neural Information Processing Systems , volume=

Toolqa: A dataset for llm question answering with external tools , author=. Advances in Neural Information Processing Systems , volume=

[51] [51]

Advances in neural information processing systems , volume=

Toolformer: Language models can teach themselves to use tools , author=. Advances in neural information processing systems , volume=

[52] [52]

arXiv preprint arXiv:2109.01652 , year=

Finetuned language models are zero-shot learners , author=. arXiv preprint arXiv:2109.01652 , year=

Pith/arXiv arXiv

[53] [53]

ACM computing surveys , volume=

Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , author=. ACM computing surveys , volume=. 2023 , publisher=

2023

[54] [54]

2025 , eprint=

ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence , author=. 2025 , eprint=

2025

[55] [55]

Knowledge Neurons in Pretrained Transformers

Dai, Damai and Dong, Li and Hao, Yaru and Sui, Zhifang and Chang, Baobao and Wei, Furu. Knowledge Neurons in Pretrained Transformers. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.581

work page doi:10.18653/v1/2022.acl-long.581 2022

[56] [56]

2025 , eprint=

ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation , author=. 2025 , eprint=

2025

[57] [57]

Context: Fine-Grained Control of Knowledge Reliance in Language Models , author=

Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models , author=. arXiv preprint arXiv:2503.15888 , year=

arXiv

[58] [58]

The Fourteenth International Conference on Learning Representations , year=

LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals , author=. The Fourteenth International Conference on Learning Representations , year=

[59] [59]

2024 , eprint=

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space , author=. 2024 , eprint=

2024

[60] [60]

arXiv preprint arXiv:2310.14053 , year=

Beyond accuracy: Evaluating self-consistency of code large language models with identitychain , author=. arXiv preprint arXiv:2310.14053 , year=

arXiv

[61] [61]

arXiv preprint arXiv:2501.12948 , year=

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=

Pith/arXiv arXiv

[62] [62]

2025 , eprint=

ToolRL: Reward is All Tool Learning Needs , author=. 2025 , eprint=

2025

[63] [63]

2024 , eprint=

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models , author=. 2024 , eprint=

2024

[64] [64]

arXiv preprint arXiv:2203.11171 , year=

Self-consistency improves chain of thought reasoning in language models , author=. arXiv preprint arXiv:2203.11171 , year=

Pith/arXiv arXiv

[65] [65]

2024 , eprint=

Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models , author=. 2024 , eprint=

2024

[66] [66]

2023 , eprint=

Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism , author=. 2023 , eprint=

2023

[67] [67]

2024 , eprint=

Retrieval Head Mechanistically Explains Long-Context Factuality , author=. 2024 , eprint=

2024

[68] [68]

arXiv preprint arXiv:2309.15098 , year=

Attention satisfies: A constraint-satisfaction lens on factual errors of language models , author=. arXiv preprint arXiv:2309.15098 , year=

arXiv

[69] [69]

Self-dc: When to reason and when to act? self divide-and-conquer for compositional unknown questions , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025

[70] [70]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

The earth is flat because...: Investigating llms’ belief towards misinformation via persuasive conversation , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[71] [71]

Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024) , pages=

Tug-of-war between knowledge: Exploring and resolving knowledge conflicts in retrieval-augmented language models , author=. Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024) , pages=

2024

[72] [72]

arXiv preprint arXiv:2405.11613 , year=

Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts , author=. arXiv preprint arXiv:2405.11613 , year=

arXiv

[73] [73]

arXiv preprint arXiv:2310.00935 , year=

Resolving knowledge conflicts in large language models , author=. arXiv preprint arXiv:2310.00935 , year=

arXiv

[74] [74]

arXiv e-prints , pages=

Retrieve only when it needs: Adaptive retrieval augmentation for hallucination mitigation in large language models , author=. arXiv e-prints , pages=

[75] [75]

Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

Context-faithful prompting for large language models , author=. Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

2023

[76] [76]

arXiv preprint arXiv:2601.06842 , year=

Seeing through the Conflict: Transparent Knowledge Conflict Handling in Retrieval-Augmented Generation , author=. arXiv preprint arXiv:2601.06842 , year=

arXiv

[77] [77]

arXiv preprint arXiv:2403.08319 , year=

Knowledge Conflicts for LLMs: A Survey , author=. arXiv preprint arXiv:2403.08319 , year=

arXiv

[78] [78]

arXiv preprint arXiv:2510.12460 , year=

Probing Latent Knowledge Conflict for Faithful Retrieval-Augmented Generation , author=. arXiv preprint arXiv:2510.12460 , year=

Pith/arXiv arXiv

[79] [79]

Adacad: Adaptively decoding to balance conflicts between contextual and parametric knowledge , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025

[80] [80]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=