arxiv: 2510.17210 · v3 · submitted 2025-10-20 · 💻 cs.CL

Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting

Pith reviewed 2026-05-18 06:45 UTC · model grok-4.3

classification 💻 cs.CL
keywords machine unlearning · large language models · attention shifting · hallucination prevention · selective forgetting · knowledge localization · dual-loss optimization

The pith

Attention-shifting unlearning lets LLMs forget facts without hallucinations or accuracy loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an Attention-Shifting framework that removes targeted knowledge from large language models while keeping general performance intact and preventing fabricated answers. It works by reducing attention to fact-bearing tokens in the data to be forgotten and reinforcing attention to essential tokens in the data to be retained. These two interventions are trained together through a dual-loss objective that creates a soft boundary around the unwanted knowledge. Experiments on standard benchmarks show the approach maintains higher accuracy than prior unlearning techniques while still blocking hallucinations on forgotten topics.
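
As a rough numerical illustration of what "reducing attention to fact-bearing tokens" can mean, the toy sketch below subtracts a fixed penalty from the pre-softmax scores at positions marked as fact-bearing and renormalizes. The function names, the scores, the marked positions, and the fixed `penalty` are all assumptions made for illustration; the paper learns the shift during training rather than hard-coding it, and this is not its implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shift_attention(scores, fact_positions, penalty):
    """Lower the pre-softmax score at each fact-bearing position, then renormalize.

    Hypothetical helper: a hand-set penalty stands in for a learned shift.
    """
    shifted = [s - penalty if i in fact_positions else s
               for i, s in enumerate(scores)]
    return softmax(shifted)


# Toy attention scores for a 4-token context (assumed values).
scores = [1.0, 3.0, 0.5, 2.5]
fact_positions = {1, 3}  # assumed to be the fact-bearing tokens

before = softmax(scores)
after = shift_attention(scores, fact_positions, penalty=2.0)

fact_mass_before = sum(before[i] for i in fact_positions)
fact_mass_after = sum(after[i] for i in fact_positions)
print(f"attention mass on fact tokens: {fact_mass_before:.3f} -> {fact_mass_after:.3f}")
```

Because the weights still sum to one, the attention removed from the fact-bearing positions is redistributed to the remaining context tokens, which is the intuition behind calling the suppression context-preserving.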

Core claim

Attention-Shifting achieves selective unlearning by applying importance-aware suppression to attenuate attention on memorized facts in the unlearning set and attention-guided retention enhancement to reinforce semantically important tokens in the retained set. The two components are jointly optimized via a dual-loss objective that localizes changes under representation superposition, preserving linguistic structure and unrelated knowledge.
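
The joint optimization can be pictured as a weighted sum of two terms, one computed on the unlearning set and one on the retained set. The sketch below is a guess at the general shape only: the two terms, the names `attention_mass` and `dual_loss`, and the weight `lam` are assumptions, and the paper's actual loss definitions are not reproduced on this page.

```python
def attention_mass(attn, positions):
    """Total attention weight placed on the given token positions."""
    return sum(attn[i] for i in positions)


def dual_loss(forget_attn, fact_positions, retain_attn, essential_positions, lam=1.0):
    """Toy dual objective (assumed form, not the paper's).

    suppress: attention on fact-bearing tokens of a forget example (lower is better).
    retain:   attention missing from essential tokens of a retain example (lower is better).
    lam:      hypothetical weight balancing the two terms.
    """
    suppress = attention_mass(forget_attn, fact_positions)
    retain = 1.0 - attention_mass(retain_attn, essential_positions)
    return suppress + lam * retain


# Assumed attention rows before any shifting.
loss_before = dual_loss([0.1, 0.6, 0.1, 0.2], {1, 3}, [0.5, 0.2, 0.3], {0, 2})

# Assumed attention rows after shifting: less mass on facts, more on essentials.
loss_after = dual_loss([0.4, 0.1, 0.4, 0.1], {1, 3}, [0.6, 0.1, 0.3], {0, 2})

print(loss_before, loss_after)
```

Varying `lam` in a sketch like this shows how one scalar moves the balance between forgetting and retention, which is the knob the referee report asks to see ablated.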

What carries the argument

Attention-Shifting (AS) framework that performs context-preserving suppression on the unlearning set combined with hallucination-resistant retention enhancement on the retained set through a dual-loss objective.

Load-bearing premise

That jointly optimizing importance-aware suppression on unlearning data and attention-guided enhancement on retained data can localize forgetting without disrupting overall language capabilities or unrelated knowledge.

What would settle it

Running the Attention-Shifting method on the ToFU or TDEC benchmarks and finding accuracy equal to or lower than existing unlearning baselines, or observing persistent hallucinated responses on queries about the unlearned content.

Figures

Figures reproduced from arXiv: 2510.17210 by Chenchen Tan, Cunjian Chen, Hui Zhang, Longxiang Gao, Shujie Cui, Xinghao Li, Youyang Qu.

Figure 1: LLMs tend to assign high attention to semanti...
Figure 2: Illustration of the proposed Attention-Shifting based unlearning in LLMs. An adapter ...
Figure 3: The performance of unlearning methods with ...
Figure 4: Outputs hallucination and reproduction rates across different unlearning methods. ... how often the model regenerates the unlearned content or its paraphrased variants. Both metrics are assessed by GPT-4 [38], comparing the model outputs with the original content. As shown in ...
Figure 5: Model utility across unlearning levels under different model performance retaining strategies. Retaining Method Usage Ablation. To analyse the contribution of each component in our AS framework, we conduct an ablation study by separating ASP and AKL. As shown in ...
Figure 6: Evaluation of model utility accuracy degradation under a fixed unlearning threshold across ...
Figure 7: Model utility degradation across multiple continue-unlearning requests (4 samples) for ...
Figure 8: The left heatmap shows the original model’s attention, where fact-bearing tokens such ...
Figure 9: Gradient cosine similarity between the unlearn and retain losses with respect to the adapter ...
Figure 10: Evaluation of model utility degradation under a fixed unlearning threshold across varying ...

Original abstract

The increase in computing power and the necessity of AI-assisted decision-making boost the growing application of large language models (LLMs). Along with this, the potential retention of sensitive data of LLMs has spurred increasing research into machine unlearning. However, existing unlearning approaches face a critical dilemma: Aggressive unlearning compromises model utility, while conservative strategies preserve utility but risk hallucinated responses. This significantly limits LLMs' reliability in knowledge-intensive applications. To address this, we introduce a novel Attention-Shifting (AS) framework for selective unlearning. AS is driven by two design objectives: (1) context-preserving suppression that attenuates attention to fact-bearing tokens without disrupting LLMs' linguistic structure; and (2) hallucination-resistant response shaping that discourages fabricated completions when queried about unlearning content. AS realizes these objectives through two attention-level interventions, which are importance-aware suppression applied to the unlearning set to reduce reliance on memorized knowledge and attention-guided retention enhancement that reinforces attention toward semantically essential tokens in the retained dataset to mitigate unintended degradation. These two components are jointly optimized via a dual-loss objective, which forms a soft boundary that localizes unlearning while preserving unrelated knowledge under representation superposition. Experimental results show that AS improves performance preservation over the state-of-the-art unlearning methods, achieving up to 15% higher accuracy on the ToFU benchmark and 10% on the TDEC benchmark, while maintaining competitive hallucination-free unlearning effectiveness. Compared to existing methods, AS demonstrates a superior balance between unlearning effectiveness, generalization, and response reliability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce an Attention-Shifting (AS) framework for selective, hallucination-free unlearning in LLMs. It combines importance-aware suppression on fact-bearing tokens from the unlearning set with attention-guided retention enhancement on semantically essential tokens from the retained set; these interventions are jointly optimized through a dual-loss objective that purportedly forms a soft boundary localizing unlearning effects while preserving linguistic structure and unrelated knowledge under representation superposition. Experiments report up to 15% higher accuracy on the ToFU benchmark and 10% on the TDEC benchmark relative to prior unlearning methods, while maintaining competitive unlearning effectiveness.

Significance. If the empirical claims are robust, the work would meaningfully advance LLM unlearning by offering a practical mechanism that better balances forgetting of sensitive data against preservation of model utility and response reliability, addressing a central practical limitation in knowledge-intensive deployments.

major comments (2)
  1. [Abstract / Experimental Results] Abstract and Experimental Results: the reported accuracy gains (15% on ToFU, 10% on TDEC) are presented without error bars, statistical tests, or ablations on the dual-loss weighting parameter; this is load-bearing for the central claim of superior performance preservation over SOTA methods.
  2. [Method (dual-loss objective)] Method description of the dual-loss objective: the claim that importance-aware suppression and attention-guided retention can be jointly optimized to localize unlearning without eroding retained knowledge rests on the untested assumption that attention interventions remain separable despite representation superposition; no analysis or experiment addresses whether the same heads or token representations participate in both memorized facts and general linguistic structure.
minor comments (2)
  1. [Abstract] The abstract could more explicitly define the hallucination-free metric used to claim competitive effectiveness.
  2. [Method] Notation and implementation details for the attention interventions would benefit from a concise pseudocode or equation block to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the empirical reporting and methodological analysis.

Point-by-point responses
  1. Referee: [Abstract / Experimental Results] Abstract and Experimental Results: the reported accuracy gains (15% on ToFU, 10% on TDEC) are presented without error bars, statistical tests, or ablations on the dual-loss weighting parameter; this is load-bearing for the central claim of superior performance preservation over SOTA methods.

    Authors: We agree that error bars, statistical tests, and ablations on the dual-loss weighting parameter are necessary to robustly support the performance claims. In the revised manuscript, we will include results averaged over multiple random seeds with standard error bars, apply statistical significance tests (such as paired t-tests) against baseline methods, and add an ablation study varying the dual-loss weighting parameter to demonstrate its effect on the unlearning-utility trade-off. revision: yes

  2. Referee: [Method (dual-loss objective)] Method description of the dual-loss objective: the claim that importance-aware suppression and attention-guided retention can be jointly optimized to localize unlearning without eroding retained knowledge rests on the untested assumption that attention interventions remain separable despite representation superposition; no analysis or experiment addresses whether the same heads or token representations participate in both memorized facts and general linguistic structure.

    Authors: The dual-loss objective is intended to create a soft boundary that localizes unlearning effects while preserving unrelated knowledge, as supported by the empirical utility preservation results. We acknowledge that a direct examination of whether the same attention heads or token representations are involved in both unlearning targets and retained linguistic structure would provide additional insight into separability under superposition. We will incorporate such an analysis (e.g., via attention map comparisons or representation probing) in the revised version. revision: yes
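
One inexpensive diagnostic for the separability question is the cosine similarity between the gradients of the two losses, the quantity named in the Figure 9 caption. The sketch below uses hand-written toy vectors in place of real gradients; the function name and the vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-ins for flattened gradients of the unlearn and retain losses.
grad_unlearn = [1.0, 0.0, 2.0]
grad_retain_orthogonal = [0.0, 3.0, 0.0]     # updates touch different directions
grad_retain_conflicting = [-1.0, 0.0, -2.0]  # updates pull in opposite directions

sim_orthogonal = cosine_similarity(grad_unlearn, grad_retain_orthogonal)
sim_conflicting = cosine_similarity(grad_unlearn, grad_retain_conflicting)
print(sim_orthogonal, sim_conflicting)
```

Values near zero suggest the two objectives update largely independent directions, while values near -1 indicate that progress on one loss directly undoes the other.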

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper introduces an empirical Attention-Shifting framework consisting of importance-aware suppression and attention-guided retention enhancement, jointly optimized by a dual-loss objective and evaluated on external benchmarks (ToFU, TDEC). No equations, fitted parameters, or self-citations are presented in the provided text that reduce the reported accuracy gains or unlearning effectiveness to quantities defined by construction from the method's own inputs. The central claims rest on experimental outcomes against independent test sets rather than any self-referential renaming, ansatz smuggling, or load-bearing self-citation chain, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard transformer attention mechanics and the assumption that attention weights can be selectively modulated without breaking next-token prediction coherence.

axioms (1)
  • domain assumption: Transformer attention layers can be intervened upon at inference or fine-tuning time to attenuate fact-bearing tokens while preserving overall linguistic structure.
    Invoked in the description of context-preserving suppression.

pith-pipeline@v0.9.0 · 5833 in / 1179 out tokens · 23118 ms · 2026-05-18T06:45:55.108472+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning

    cs.CR 2026-05 unverdicted novelty 6.0

    Adaptive Unlearning suppresses package hallucinations in code-generating LLMs by 81% while preserving benchmark performance, using model-generated data and no human labels.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · cited by 1 Pith paper · 6 internal anchors

  1. [1]

    Snapkv: Llm knows what you are looking for before generation

    Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, and Deming Chen. Snapkv: Llm knows what you are looking for before generation. Advances in Neural Information Processing Systems, 37:22947–22970, 2024

  2. [2]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021

  3. [3]

    Jailbroken: How does llm safety training fail?

    Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does llm safety training fail? Advances in Neural Information Processing Systems, 36, 2024

  4. [4]

    The eu general data protection regulation (gdpr)

    Paul Voigt and Axel Von dem Bussche. The eu general data protection regulation (gdpr). A practical guide, 1st ed., Cham: Springer International Publishing, 10(3152676):10–5555, 2017

  5. [5]

    Knowledge unlearning for mitigating privacy risks in language models

    Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, and Minjoon Seo. Knowledge unlearning for mitigating privacy risks in language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14389–14408, Toronto, Canada, July 2023. Association for Compu...

  6. [6]

    Unlearn what you want to forget: Efficient unlearning for LLMs

    Jiaao Chen and Diyi Yang. Unlearn what you want to forget: Efficient unlearning for LLMs. In The 2023 Conference on Empirical Methods in Natural Language Processing, 2023

  7. [7]

    Rethinking machine unlearning for large language models

    Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, et al. Rethinking machine unlearning for large language models. Nature Machine Intelligence, pages 1–14, 2025

  8. [8]

    Large language model unlearning via embedding-corrupted prompts

    Chris Liu, Yaxuan Wang, Jeffrey Flanigan, and Yang Liu. Large language model unlearning via embedding-corrupted prompts. Advances in Neural Information Processing Systems, 37:118198–118266, 2024

  9. [9]

    Ulmr: Unlearning large language models via negative response and model parameter average

    Shaojie Shi, Xiaoyu Tan, Xihe Qiu, Chao Qu, Kexin Nie, Yuan Cheng, Wei Chu, Xu Yinghui, and Yuan Qi. Ulmr: Unlearning large language models via negative response and model parameter average. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 755–762, 2024

  10. [10]

    LLM unlearning via loss adjustment with only forget data

    Yaxuan Wang, Jiaheng Wei, Chris Yuhao Liu, Jinlong Pang, Quan Liu, Ankit Shah, Yujia Bao, Yang Liu, and Wei Wei. LLM unlearning via loss adjustment with only forget data. In The Thirteenth International Conference on Learning Representations, 2025

  11. [11]

    Towards robust and parameter-efficient knowledge unlearning for LLMs

    Sungmin Cha, Sungjun Cho, Dasol Hwang, and Moontae Lee. Towards robust and parameter-efficient knowledge unlearning for LLMs. In The Thirteenth International Conference on Learning Representations, 2025

  12. [12]

    Reversing the forget-retain objectives: An efficient llm unlearning framework from logit difference

    Jiabao Ji, Yujian Liu, Yang Zhang, Gaowen Liu, Ramana Kompella, Sijia Liu, and Shiyu Chang. Reversing the forget-retain objectives: An efficient llm unlearning framework from logit difference. Advances in Neural Information Processing Systems, 37:12581–12611, 2024

  13. [13]

    Offset unlearning for large language models

    James Y Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, and Muhao Chen. Offset unlearning for large language models. arXiv preprint arXiv:2404.11045, 2024

  14. [14]

    A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

    Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2):1–55, 2025

  15. [15]

    Quantifying the uncertainty of llm hallucination spreading in complex adaptive social networks

    Guozhi Hao, Jun Wu, Qianqian Pan, and Rosario Morello. Quantifying the uncertainty of llm hallucination spreading in complex adaptive social networks. Scientific reports, 14(1):16375, 2024

  16. [16]

    Forget to flourish: Leveraging machine-unlearning on pretrained language models for privacy leakage

    Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino, Ye Wang, and Shagufta Mehnaz. Forget to flourish: Leveraging machine-unlearning on pretrained language models for privacy leakage. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 20139–20147, 2025

  17. [17]

    To forget or not? towards practical knowledge unlearning for large language models

    Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, and Ningyu Zhang. To forget or not? towards practical knowledge unlearning for large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1524–1537, 2024

  18. [18]

    A closer look at machine unlearning for large language models

    Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, and Min Lin. A closer look at machine unlearning for large language models. In The Thirteenth International Conference on Learning Representations, 2025

  19. [19]

    Toy Models of Superposition

    Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, et al. Toy models of superposition. arXiv preprint arXiv:2209.10652, 2022

  20. [20]

    The rise of parameter specialization for knowledge storage in large language models

    Yihuai Hong, Yiran Zhao, Wei Tang, Yang Deng, Yu Rong, and Wenxuan Zhang. The rise of parameter specialization for knowledge storage in large language models. arXiv preprint arXiv:2505.17260, 2025

  21. [21]

    TOFU: A task of fictitious unlearning for LLMs

    Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary Chase Lipton, and J Zico Kolter. TOFU: A task of fictitious unlearning for LLMs. In First Conference on Language Modeling, 2024

  22. [22]

    Negative preference optimization: From catastrophic collapse to effective unlearning

    Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative preference optimization: From catastrophic collapse to effective unlearning. In First Conference on Language Modeling, 2024

  23. [23]

    Large language model unlearning

    Yuanshun Yao, Xiaojun Xu, and Yang Liu. Large language model unlearning. Advances in Neural Information Processing Systems, 37:105425–105475, 2024

  24. [24]

    Selective forgetting: Advancing machine unlearning techniques and evaluation in language models

    Lingzhi Wang, Xingshan Zeng, Jinsong Guo, Kam-Fai Wong, and Georg Gottlob. Selective forgetting: Advancing machine unlearning techniques and evaluation in language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 843–851, 2025

  25. [25]

    Eraser: Jailbreaking defense in large language models via unlearning harmful knowledge

    Weikai Lu, Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Zelin Chen, Huiping Zhuang, and Cen Chen. Eraser: Jailbreaking defense in large language models via unlearning harmful knowledge. arXiv preprint arXiv:2404.05880, 2024

  26. [26]

    Towards safer large language models through machine unlearning

    Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, and Meng Jiang. Towards safer large language models through machine unlearning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 1817–1829, 2024

  27. [27]

    Meow: Memory supervised llm unlearning via inverted facts

    Tianle Gu, Kexin Huang, Ruilin Luo, Yuanqi Yao, Yujiu Yang, Yan Teng, and Yingchun Wang. Meow: Memory supervised llm unlearning via inverted facts. arXiv preprint arXiv:2409.11844, 2024

  28. [28]

    Who’s harry potter? approximate unlearning in llms

    Ronen Eldan and Mark Russinovich. Who’s harry potter? approximate unlearning in llms. arXiv e-prints, pages arXiv–2310, 2023

  29. [29]

    Safety alignment should be made more than just a few tokens deep

    Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, and Peter Henderson. Safety alignment should be made more than just a few tokens deep. In The Thirteenth International Conference on Learning Representations, 2025

  30. [30]

    Soft prompting for unlearning in large language models

    Karuna Bhaila, Minh-Hao Van, and Xintao Wu. Soft prompting for unlearning in large language models. arXiv preprint arXiv:2406.12038, 2024

  31. [31]

    Shifting attention to relevance: Towards the predictive uncertainty quantification of free-form large language models

    Jinhao Duan, Hao Cheng, Shiqi Wang, Alex Zavalny, Chenan Wang, Renjing Xu, Bhavya Kailkhura, and Kaidi Xu. Shifting attention to relevance: Towards the predictive uncertainty quantification of free-form large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5050–5063, 2024

  32. [32]

    Language Models (Mostly) Know What They Know

    Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, et al. Language models (mostly) know what they know. arXiv preprint arXiv:2207.05221, 2022

  33. [33]

    The LAMBADA dataset: Word prediction requiring a broad discourse context

    Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, and Raquel Fernández. The lambada dataset: Word prediction requiring a broad discourse context. arXiv preprint arXiv:1606.06031, 2016

  34. [34]

    PubMedQA: A Dataset for Biomedical Research Question Answering

    Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W Cohen, and Xinghua Lu. Pubmedqa: A dataset for biomedical research question answering. arXiv preprint arXiv:1909.06146, 2019

  35. [35]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

  36. [36]

    Gpt-neo: Large scale autoregressive language modeling with mesh-tensorflow, 2021

    Sid Black, Leo Gao, Phil Wang, Connor Leahy, and Stella Biderman. Gpt-neo: Large scale autoregressive language modeling with mesh-tensorflow, 2021. 5297715, 2022

  37. [37]

    The Pile: An 800GB Dataset of Diverse Text for Language Modeling

    Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020

  38. [38]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  39. [39]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022

    Supplemental Material. A Theoretical Analysis of Attention Shifting. A.1 Effect of Attention Reweighting on Output Distribution. We begin by analyzing how modifying the attent...