Pith · machine review for the scientific record

arxiv: 2605.00358 · v1 · submitted 2026-05-01 · 💻 cs.CL · cs.CV

Recognition: unknown

From Backward Spreading to Forward Replay: Revisiting Target Construction in LLM Parameter Editing

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 19:39 UTC · model grok-4.3

classification 💻 cs.CL cs.CV
keywords LLM parameter editing · target construction · backward spreading · forward propagation · anchor point · hidden states · model editing

The pith

Optimizing the anchor at the first editing layer and propagating forward yields more accurate targets for all layers than backward spreading.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current LLM parameter editing methods compute an ideal hidden-state target at a chosen anchor layer and then spread adjustments backward to earlier layers. This paper examines the foundations of that backward spreading process and identifies its practical boundaries and failure modes. The authors propose a direct alternative: optimize the anchor point at the first editing layer instead, then propagate the resulting target forward through the remaining layers. This produces layer-wise targets that are both more accurate and mutually compatible while using exactly the same amount of computation. The change requires no modifications to the initial target calculation or to any other part of the editing pipeline.
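
The construction can be sketched as a toy. This is our illustrative reading, not the authors' code: the linear layer maps, the dimensions, and the closed-form anchor solve (standing in for the paper's gradient-descent optimization) are all our own simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Frozen inter-layer maps f_l (toy stand-ins for transformer layers).
Ws = [rng.normal(size=(d, d)) * 0.4 for _ in range(3)]
y = rng.normal(size=d)  # desired hidden state after the last editing layer

# Step 1: optimize the anchor m1 at the FIRST editing layer so that the
# original (frozen) forward pass reaches y. The paper uses gradient
# descent; for this linear toy the optimum is available in closed form.
J = Ws[2] @ Ws[1] @ Ws[0]
m1 = np.linalg.lstsq(J, y, rcond=None)[0]

# Step 2: forward replay — targets for every later editing layer fall out
# of a single pass through the unmodified maps.
m2 = Ws[0] @ m1
m3 = Ws[1] @ m2

# Mutual compatibility holds by construction: each target is exactly the
# image of its predecessor under the original layer map.
assert np.allclose(m3, Ws[1] @ (Ws[0] @ m1))
assert np.allclose(Ws[2] @ m3, y, atol=1e-6)
```

The replay pass is the same forward computation any editing pipeline already performs, which is why the computational cost matches backward spreading: one optimization at a single layer plus one traversal of the remaining layers.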

Core claim

Instead of optimizing the target hidden state at the last editing layer and spreading it backward, the method optimizes the anchor at the first editing layer and then propagates it forward, automatically generating accurate and mutually compatible target hidden states for every subsequent layer at the same computational cost.

What carries the argument

Forward propagation of the anchor point optimized at the first editing layer, which replaces backward spreading from the last layer.

If this is right

  • The method achieves identical computational complexity to existing backward-spreading techniques.
  • Layer-wise targets are more accurate and mutually compatible across edited layers.
  • The approach integrates without changing the initial target computation or any other pipeline components.
  • The same forward-propagation construction can be applied to a wide range of existing LLM parameter editing methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Multi-layer edits may accumulate fewer inconsistencies because targets are generated sequentially from the same starting point rather than adjusted retroactively.
  • The change could simplify debugging of editing failures by making the relationship between layers more transparent.
  • Similar forward-construction logic might be tested on sequential models outside the LLM setting where hidden-state targets must be defined across layers.

Load-bearing premise

That optimizing the anchor at the first layer and propagating it forward will automatically produce mutually compatible and accurate targets for all later layers without any further adjustments.

What would settle it

An experiment that measures editing success rates or downstream task performance when using forward-propagated targets versus backward-spread targets on the same models and benchmarks, with forward propagation showing clearly worse results.
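
One proxy such an experiment could report is a per-layer compatibility error: how far each target sits from the image of its predecessor under the original layer map. The sketch below is entirely our own construction; in particular, the residual-splitting rule labeled "backward-spreading caricature" is a deliberately naive stand-in, not the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
Ws = [rng.normal(size=(d, d)) * 0.4 for _ in range(3)]
h1 = rng.normal(size=d)   # original hidden state entering the first editing layer
y = rng.normal(size=d)    # desired hidden state after the last editing layer

h2, h3 = Ws[0] @ h1, Ws[1] @ (Ws[0] @ h1)   # original trajectory

def incompatibility(m1, m2, m3):
    """Per-layer mismatch: distance of each target from the image of its predecessor."""
    return (np.linalg.norm(m2 - Ws[0] @ m1) +
            np.linalg.norm(m3 - Ws[1] @ m2))

# Backward-spreading caricature: fix the last target, then push scaled
# residuals to earlier layers independently (a naive stand-in rule).
m3_b = np.linalg.lstsq(Ws[2], y, rcond=None)[0]
m2_b = h2 + 0.5 * (m3_b - h3)
m1_b = h1 + 0.5 * (m2_b - h2)

# Forward replay: solve the anchor once, then replay the frozen maps.
m1_f = np.linalg.lstsq(Ws[2] @ Ws[1] @ Ws[0], y, rcond=None)[0]
m2_f, m3_f = Ws[0] @ m1_f, Ws[1] @ (Ws[0] @ m1_f)

print(incompatibility(m1_b, m2_b, m3_b))  # strictly positive for generic data
print(incompatibility(m1_f, m2_f, m3_f))  # zero, by construction of the replay
```

A real evaluation would of course replace this metric with editing success rates and downstream scores on standard benchmarks, as the criterion above states.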

Figures

Figures reproduced from arXiv: 2605.00358 by Hongkai Liu, Wee Sun Lee, Wei Liu, Yee Whye Teh, Zhiying Deng.

Figure 1
Figure 1. (a) A toy example of the backward spreading of ideal hidden states (i.e., m) in model editing. ① Getting the target hidden state of the final decisive layer by taking it as an optimizable parameter and minimizing cross-entropy. ② Getting all the target hidden states of the decisive layers with backward spreading. ③ Using the target hidden states to guide the parameter editing.
Figure 2
Figure 2. (a) A toy example of our method: forward replay of all hidden states (i.e., m) in model editing. ① Finding m1 with gradient descent. ② Picking up the hidden states stored in the back-propagation path with forward-propagation replay. ③ Editing the model with the target hidden states.
Figure 3
Figure 3. The steering step should be small if θ is large: the passive shift induced at the final layer L (i.e., δ_l m_L) after applying steering at layer l (i.e., δ m_l).
read the original abstract

LLM parameter editing methods commonly rely on computing an ideal target hidden state at a target layer (referred to as the anchor point) and distributing the target vector to multiple preceding layers (commonly known as backward spreading) for cooperative editing. Although widely used for a long time, its underlying basis has not been systematically investigated. In this paper, we first conduct a systematic study of its foundations, which helps clarify its capability boundaries, practical considerations, and potential failure modes. We then propose a simple and elegant alternative that replaces backward spreading with forward propagation. Instead of optimizing the target at the last editing layer, we optimize the anchor point at the first editing layer and then propagate it forward to obtain accurate and mutually compatible target hidden states for all subsequent editing layers. This approach achieves the same computational complexity as existing methods while producing more accurate layer-wise targets. Our method is simple, interfering with neither the computation of the initial target hidden state nor any other component of the subsequent editing pipeline, and thus benefits a wide range of LLM parameter editing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript first performs a systematic analysis of the foundations, capability boundaries, and failure modes of backward spreading for constructing target hidden states in LLM parameter editing. It then proposes forward replay as an alternative: the anchor hidden state is optimized at the first editing layer so that the original forward pass reaches the desired output, after which the original layer transformations are applied to generate targets for all subsequent editing layers. The authors claim this construction yields more accurate and mutually compatible layer-wise targets at the same computational complexity as backward spreading, without requiring changes to the initial target computation or other pipeline components.

Significance. The systematic study of backward spreading provides a useful clarification of its practical limits. If the forward-replay targets are indeed more accurate and remain compatible under joint multi-layer edits, the method would offer a lightweight, non-disruptive improvement that could be adopted across many existing LLM editing algorithms, potentially raising their reliability on knowledge-editing and model-update benchmarks without added cost.

major comments (2)
  1. [Abstract] Abstract: the central claim that forward replay 'produces more accurate layer-wise targets' and 'mutually compatible' targets is stated without any quantitative comparison, error analysis, or ablation against backward spreading; the soundness of the improvement therefore rests on an unverified assertion.
  2. [Method section] Method section (forward-replay construction): the proposal optimizes the anchor at the first editing layer and propagates using the original inter-layer maps. When multiple layers are edited simultaneously, however, the edited model's hidden-state trajectory deviates from the original trajectory, so the pre-computed forward targets are no longer guaranteed to be the fixed points required by the joint editing objective. No analysis, proof, or counter-example addressing this compatibility risk is supplied.
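
The compatibility risk the referee names can be made concrete with a toy (again our own sketch under linear-layer assumptions, not the paper's analysis): precompute forward-replay targets, then let an edit at the first layer land slightly off its target and watch the later targets drift off the realized trajectory.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
Ws = [rng.normal(size=(d, d)) * 0.4 for _ in range(3)]
y = rng.normal(size=d)

# Pre-computed forward-replay targets from the anchor m1.
J = Ws[2] @ Ws[1] @ Ws[0]
m1 = np.linalg.lstsq(J, y, rcond=None)[0]
m2, m3 = Ws[0] @ m1, Ws[1] @ (Ws[0] @ m1)

# A joint edit nudges layer 1 toward its target but lands slightly off
# (eps models residual editing error at that layer).
eps = 0.05 * rng.normal(size=d)
h2_edited = m2 + eps

# The realized trajectory now deviates from the pre-computed target m3.
drift = np.linalg.norm(Ws[1] @ h2_edited - m3)
print(drift)  # grows with ||eps||: the targets are exact fixed points
              # only if every earlier edit is hit exactly
```

Whether this drift matters in practice is exactly what the referee asks the authors to analyze; the toy only shows that perfect compatibility of the precomputed targets presumes the edits themselves succeed layer by layer.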

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and recommendation for major revision. We address each major comment below, agreeing that the claims require stronger quantitative support and explicit analysis of multi-layer compatibility. We will incorporate the necessary additions in the revised manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that forward replay 'produces more accurate layer-wise targets' and 'mutually compatible' targets is stated without any quantitative comparison, error analysis, or ablation against backward spreading; the soundness of the improvement therefore rests on an unverified assertion.

    Authors: We agree that the abstract presents the accuracy and compatibility claims without direct quantitative backing. The manuscript's core contribution is the systematic analysis of backward spreading's foundations, boundaries, and failure modes, which motivates forward replay as an alternative that avoids those issues by construction. To address the concern, we will revise the abstract for precision and add quantitative comparisons (e.g., layer-wise reconstruction error relative to the original forward pass) plus ablation studies on editing benchmarks in a new results subsection. These will directly validate the claims against backward spreading at equivalent cost. revision: yes

  2. Referee: [Method section] Method section (forward-replay construction): the proposal optimizes the anchor at the first editing layer and propagates using the original inter-layer maps. When multiple layers are edited simultaneously, however, the edited model's hidden-state trajectory deviates from the original trajectory, so the pre-computed forward targets are no longer guaranteed to be the fixed points required by the joint editing objective. No analysis, proof, or counter-example addressing this compatibility risk is supplied.

    Authors: This highlights a valid subtlety in joint multi-layer editing. Forward replay constructs targets by optimizing the anchor at the first layer and propagating via the original transformations, ensuring consistency with the unmodified forward trajectory from that anchor. While simultaneous edits will alter the actual trajectory, the targets remain mutually compatible as they derive from a single optimized starting point rather than independent backward computations. We will expand the method section with a dedicated discussion of this approximation, including conditions for validity and empirical counter-examples on multi-layer edits, to clarify the risk without altering the core algorithm. revision: yes

Circularity Check

0 steps flagged

No circularity: forward-propagation proposal is an independent construction

full rationale

The paper first analyzes the foundations and failure modes of backward spreading, then introduces forward replay as a direct alternative: optimize the anchor hidden state at the earliest editing layer and propagate it forward using the original layer transformations to obtain targets for later layers. This is explicitly framed as a simple substitution that preserves computational complexity, does not alter the initial target computation or other pipeline components, and yields 'more accurate and mutually compatible' targets by construction of the forward pass. No equations reduce the claimed accuracy or compatibility to a fitted parameter drawn from the same data used for evaluation, nor does any step rename a known result or import a uniqueness theorem via self-citation. The central modeling choice (that original inter-layer maps remain useful for target construction) is a substantive assumption open to empirical test rather than a self-definitional loop. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the unstated premise that forward propagation from an optimized first-layer anchor yields compatible targets; no free parameters, invented entities, or additional axioms are described in the abstract.

axioms (1)
  • domain assumption Forward propagation from an optimized anchor at the first editing layer produces mutually compatible and more accurate targets for subsequent layers.
    This is the core premise required for the claimed accuracy improvement; it is invoked when the abstract states the new method produces better targets.

pith-pipeline@v0.9.0 · 5495 in / 1173 out tokens · 36125 ms · 2026-05-09T19:39:52.728423+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

54 extracted references · 18 canonical work pages
