KnowledgeDebugger -- an Exploration Tool for Knowledge Localization and Editing in Transformers

Artur Andrzejak; Eric Benz; Lennart Stöpler; Nikolai Bolik

arxiv: 2607.01000 · v1 · pith:EKXIHEZOnew · submitted 2026-07-01 · 💻 cs.CL

Pith reviewed 2026-07-02 12:56 UTC · model grok-4.3

classification 💻 cs.CL

keywords: knowledge editing, transformers, GUI tool, knowledge localization, model interpretability, EasyEdit

The pith

KnowledgeDebugger gives researchers a graphical interface to explore and edit knowledge inside Transformer models without code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces KnowledgeDebugger to support the early exploratory phase of research on how Transformers store and process knowledge. It wraps the methods from the EasyEdit library in a no-code GUI inspired by LM-Debugger, letting users localize and modify specific facts on individual samples. The authors demonstrate the approach through case studies that reproduce recent findings in knowledge editing. This matters because it lowers the barrier between idea and test when deciding whether a phenomenon is worth larger-scale statistical experiments.

Core claim

We propose KnowledgeDebugger, a GUI-based exploration tool for knowledge localization and editing in Transformers. Our tool offers no-code access to the methods in EasyEdit, a widely used library of state-of-the-art Knowledge Editing approaches, and we demonstrate the tool's effectiveness through case studies of recent findings in this field.

What carries the argument

The GUI interface that integrates EasyEdit's knowledge editing methods to allow interactive localization and modification of model knowledge on single examples.

If this is right

Individual-sample experiments on knowledge editing can be run without writing code.
Promising editing behaviors identified on single cases can be more quickly selected for follow-up statistical validation.
Researchers without programming expertise gain direct access to current knowledge-editing techniques.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread use could shorten the cycle from hypothesis to larger experiment in knowledge-editing work.
The same GUI pattern could be applied to other model-editing or interpretability libraries beyond EasyEdit.

Load-bearing premise

Providing a GUI wrapper around existing EasyEdit methods plus a handful of case studies is enough to show the tool effectively aids the exploratory research phase.

What would settle it

A controlled comparison in which researchers attempt the same knowledge-localization task with and without the GUI and measure differences in time to insight or number of hypotheses tested.

Figures

Figures reproduced from arXiv: 2607.01000 by Artur Andrzejak, Eric Benz, Lennart Stöpler, Nikolai Bolik.

**Figure 2.** We apply a ROME update for the knowledge triplet ("Madagaskar", "The capital of {} is", "Berlin") at …
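
The kind of update named in the caption can be illustrated on a toy example. The sketch below is not code from EasyEdit or KnowledgeDebugger. It treats a weight matrix as a linear key-value memory and applies a rank-one change so that a chosen key maps to a new value, which is the core idea behind ROME. The class `KnowledgeTriplet`, the helper names, the 2x3 matrix, and the key and value vectors are all hypothetical, and the key covariance is assumed to be the identity, whereas ROME itself estimates that statistic from data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeTriplet:
    """(subject, prompt template, new target), as in the figure caption."""
    subject: str
    template: str  # must contain one "{}" placeholder for the subject
    target: str

    def prompt(self) -> str:
        return self.template.format(self.subject)


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, key, new_value):
    """Return W' such that W' @ key == new_value, differing from W by a rank-one term.

    Simplifying assumption: the key covariance is the identity, so
    W' = W + (v* - W k) k^T / (k^T k).
    """
    residual = [v - wk for v, wk in zip(new_value, matvec(W, key))]
    norm_sq = sum(k * k for k in key)
    if norm_sq == 0:
        raise ValueError("key must be non-zero")
    return [
        [w + r * k / norm_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]


triplet = KnowledgeTriplet("Madagaskar", "The capital of {} is", "Berlin")

# Toy stand-ins (hypothetical): a 2x3 weight matrix, a key vector standing in
# for the subject, and a value vector standing in for the new target.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
subject_key = [1.0, 2.0, 0.0]
target_value = [3.0, -1.0]

W_edited = rank_one_edit(W, subject_key, target_value)
```

After the edit, the subject key maps to the new value, while any input orthogonal to that key produces the same output as before. This is the sense in which such an update is localized to one association.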

Original abstract

Recent research has increasingly focused on understanding how Transformers store and process knowledge, as well as how this knowledge can be edited. Research work in this area is often conducted in two phases: first, phenomena are explored on individual samples. Then, when results appear promising, more statistically robust experiments follow. To support the first phase, we propose KnowledgeDebugger, a GUI-based exploration tool for knowledge localization and editing in Transformers. Our tool - inspired by LM-Debugger - offers no-code access to the methods in EasyEdit, a widely used library of state-of-the-art Knowledge Editing approaches. We demonstrate the tool's effectiveness through case studies of recent findings in this field.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

KnowledgeDebugger is a GUI wrapper around EasyEdit that makes existing editing methods more accessible for quick checks but adds no new techniques or evidence of better results.

This paper presents KnowledgeDebugger, a GUI tool that gives no-code access to knowledge localization and editing methods from the EasyEdit library, inspired by LM-Debugger. The core contribution is packaging these existing approaches into an interface for quick sample-level exploration.

It does a solid job of lowering the entry barrier for researchers who want to test editing ideas on individual instances before moving to larger experiments. The case studies illustrate how it can be used to reproduce some recent findings in the area.

The soft spots come in the evaluation. The paper relies on qualitative case studies without any usability metrics, user studies, or comparisons to the original EasyEdit command-line interface or LM-Debugger. This means we don't know if the GUI actually helps users find insights faster or more reliably. The claim that it effectively supports the exploratory phase is plausible but untested.

Citations look appropriate and point back to the source libraries without overclaiming novelty. There are no new mathematical results or data analyses to scrutinize.

This work is mainly for people already working on knowledge editing in transformers who value visual tools for initial tinkering. A reader interested in new algorithms or strong empirical validation will find little to engage with.

I would not bring this to a reading group. I would not cite it. For peer review, it does not seem to merit sending out to referees for a standard research paper; it might fit better as a system demonstration or short tool note if the venue allows that.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces KnowledgeDebugger, a GUI-based exploration tool for knowledge localization and editing in Transformer models. Inspired by LM-Debugger, it provides no-code access to methods from the EasyEdit library and demonstrates its use through case studies of recent findings in the field, with the goal of supporting the initial exploratory phase of research before larger-scale experiments.

Significance. If the tool demonstrably facilitates hypothesis generation on individual samples, it could accelerate research in knowledge editing by lowering the barrier to using state-of-the-art methods from EasyEdit. The explicit focus on the exploratory phase and reuse of an established library are positive aspects that align with practical needs in the field.

major comments (1)

[Case studies] Case studies section: The central claim that the tool provides 'effective support' for the exploratory research phase rests on qualitative case studies of recent findings, but no quantitative measures (e.g., edit success rates, task completion time, number of insights generated, or controlled comparison to code-based EasyEdit or LM-Debugger) are reported. This absence is load-bearing for the effectiveness assertion.
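
One of the measures the referee names, edit success rate, could be tallied from single-sample runs along the following lines. This is a minimal sketch under stated assumptions: `EditOutcome` and `edit_success_rate` are hypothetical names, success is read simply as a case-insensitive exact match between the requested target and the edited model's output, and the records are illustrative placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOutcome:
    """One single-sample edit: the requested target and the edited model's output."""
    target: str
    prediction: str


def edit_success_rate(outcomes):
    """Fraction of edits whose post-edit prediction matches the target.

    Matching is case-insensitive and ignores surrounding whitespace.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    hits = sum(
        o.prediction.strip().lower() == o.target.strip().lower()
        for o in outcomes
    )
    return hits / len(outcomes)


# Illustrative placeholder records (hypothetical, not measured data).
outcomes = [
    EditOutcome(target="Berlin", prediction="Berlin"),
    EditOutcome(target="Berlin", prediction=" berlin "),
    EditOutcome(target="Paris", prediction="Rome"),
    EditOutcome(target="Oslo", prediction="Oslo"),
]
rate = edit_success_rate(outcomes)
```

Stricter or softer criteria, such as comparing the probability of the new target against the original answer, would change only the matching line inside `edit_success_rate`.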

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below.

Referee: [Case studies] Case studies section: The central claim that the tool provides 'effective support' for the exploratory research phase rests on qualitative case studies of recent findings, but no quantitative measures (e.g., edit success rates, task completion time, number of insights generated, or controlled comparison to code-based EasyEdit or LM-Debugger) are reported. This absence is load-bearing for the effectiveness assertion.

Authors: We agree that the case studies are qualitative and provide no quantitative metrics, user studies, or controlled comparisons. The manuscript frames the tool as support for the initial exploratory phase on individual samples (prior to larger-scale experiments) and uses the case studies to show how the GUI enables replication and exploration of recent findings via EasyEdit methods. This is consistent with the paper's stated goal of lowering the barrier for the exploratory phase rather than conducting an empirical evaluation of research acceleration. We will make a partial revision by updating the abstract, introduction, and conclusion to replace 'demonstrate the tool's effectiveness' with 'illustrate the tool's utility' (and similarly adjust related phrasing) so that the claim more accurately reflects the presented evidence. Revision: partial.

Circularity Check

0 steps flagged

Tool description paper contains no derivations, fitted parameters, or self-referential claims

The manuscript is a description of a GUI tool that integrates existing external libraries (EasyEdit, inspired by LM-Debugger) and illustrates usage via qualitative case studies. No equations, parameters, uniqueness theorems, or derivations appear anywhere in the provided text. The central claim reduces to the statement that a no-code interface plus demos supports exploratory work; this is an empirical claim about usability that is not derived from any internal construction or self-citation chain. No load-bearing step reduces to its own inputs by definition or by renaming. The paper is therefore self-contained against external benchmarks with a circularity score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper describes a software tool rather than a mathematical or empirical claim; no free parameters, axioms, or invented entities are introduced.
