Model Editing for New Document Integration in Generative Information Retrieval
Pith reviewed 2026-05-15 17:28 UTC · model grok-4.3
The pith
DOME edits the decoder's hidden-state-to-docID mappings in generative retrieval models so that newly added documents become retrievable, using a hybrid of soft and hard labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The decoder's mapping from hidden states to the docIDs of new documents is the central bottleneck. DOME resolves it in three stages: critical-layer selection, hybrid-label optimization of edit vectors, and construction and application of updates. This yields measurable retrieval gains on added documents while preserving performance on the original collection, at far lower training compute than incremental fine-tuning.
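One way to picture the third stage: in many model-editing methods (ROME and its successors), the update is a rank-one modification of a linear layer so that a chosen key vector maps to a new value vector. A minimal sketch under that assumption; the paper's exact construction may differ, and `rank_one_update` is illustrative rather than DOME's formula:

```python
import numpy as np

def rank_one_update(W, key, target):
    """ROME-style rank-one edit (an assumption, not DOME's stated formula):
    adjust W so that W_new @ key == target, changing W only along the
    key direction."""
    key = key.reshape(-1, 1)
    residual = target.reshape(-1, 1) - W @ key   # what the edit must add
    return W + residual @ key.T / float(key.T @ key)
```

Because the correction is confined to the key direction, inputs orthogonal to the key pass through unchanged, which is the intuition behind "targeted" edits that leave existing mappings alone.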
What carries the argument
Hybrid-label adaptive training: soft labels preserve query-specific semantics so that edit vectors for different queries stay distinguishable, while hard labels enforce the precise docID changes needed for targeted decoder updates.
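The mixing described above can be sketched as a blended objective. The weighting `alpha` and the choice of KL divergence for the soft term are assumptions for illustration, not the paper's stated loss:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hybrid_label_loss(logits, soft_target, hard_idx, alpha=0.5):
    """Blend a KL term against soft labels (preserving query-specific
    semantics) with cross-entropy against the hard docID token
    (enforcing the precise mapping change). alpha is a free parameter."""
    p = softmax(logits)
    eps = 1e-12
    kl = np.sum(soft_target * (np.log(soft_target + eps) - np.log(p + eps)))
    ce = -np.log(p[hard_idx] + eps)
    return alpha * kl + (1.0 - alpha) * ce
```

With `alpha=1.0` the loss reduces to the soft-label term alone; with `alpha=0.0` it is plain cross-entropy on the hard docID, making the trade-off between semantic preservation and precise remapping explicit.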
If this is right
- Retrieval effectiveness holds steady on the original document collection while rising on newly added documents.
- Training cost drops to about 60 percent of that of incremental training.
- Frequent model updates become practical because each edit targets only the decoder mapping rather than retraining the full model.
- The same editing pipeline can be repeated whenever more documents arrive without restarting from scratch.
Where Pith is reading between the lines
- If the hybrid labels succeed in separating vectors, the same editing pattern could extend to other generative tasks whose output vocabularies grow over time.
- The method may offer a route to continual learning in retrieval systems where documents arrive in streams rather than batches.
- Testing on collections that change daily or weekly would reveal how many successive edits the model can absorb before accuracy on old documents begins to slip.
Load-bearing premise
The decoder's mapping from hidden states to new docIDs is the main failure point, and hybrid-label training can produce edit vectors distinct enough to fix new mappings without damaging the existing ones.
What would settle it
A measurement showing that edit vectors for different queries remain too similar after hybrid training, or that applying the updates measurably harms accuracy on the original document set, would falsify the claim.
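The first half of that test is directly measurable. A minimal sketch of the diagnostic, assuming edit vectors are stacked as rows of a matrix; function names are illustrative:

```python
import numpy as np

def pairwise_cosine(vectors):
    """Pairwise cosine similarity between row vectors."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return v @ v.T

def max_offdiag_similarity(vectors):
    """Largest cross-query similarity; values near 1 would indicate
    the indistinguishable edit vectors that would falsify the claim."""
    s = pairwise_cosine(vectors)
    mask = ~np.eye(s.shape[0], dtype=bool)
    return float(s[mask].max())
```

Running this on edit vectors trained with and without the soft-label term would separate the claimed discriminative effect from mere conservative update magnitudes.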
Original abstract
Generative retrieval (GR) reformulates the Information Retrieval (IR) task as the generation of document identifiers (docIDs). Despite its promise, existing GR models exhibit poor generalization to newly added documents, often failing to generate the correct docIDs. While incremental training offers a straightforward remedy, it is computationally expensive, resource-intensive, and prone to catastrophic forgetting, thereby limiting the scalability and practicality of GR. In this paper, we identify the core bottleneck as the decoder's ability to map hidden states to the correct docIDs of newly added documents. Model editing, which enables targeted parameter modifications for docID mapping, represents a promising solution. However, applying model editing to current GR models is not trivial: it is severely hindered by indistinguishable edit vectors across queries, due to the high overlap of shared docIDs in retrieval results. To address this, we propose DOME (docID-oriented model editing), a novel method that effectively and efficiently adapts GR models to unseen documents. DOME comprises three stages: (1) identification of critical layers, (2) optimization of edit vectors, and (3) construction and application of updates. At its core, DOME employs a hybrid-label adaptive training strategy that learns discriminative edit vectors by combining soft labels, which preserve query-specific semantics for distinguishable updates, with hard labels that enforce precise mapping modifications. Experiments on widely used benchmarks, including NQ and MS MARCO, show that our method significantly improves retrieval performance on new documents while maintaining effectiveness on the original collection. Moreover, DOME achieves this with only about 60% of the training time required by incremental training, considerably reducing computational cost and enabling efficient, frequent model updates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DOME, a three-stage docID-oriented model editing method to adapt generative retrieval models to newly added documents. It identifies the decoder mapping as the core bottleneck, uses critical-layer identification followed by optimization of edit vectors via a hybrid soft+hard label strategy (soft labels to preserve query-specific semantics for distinguishability, hard labels for precise docID remapping), and applies the resulting updates. The central empirical claim is that DOME yields significant gains on new-document retrieval for NQ and MS MARCO while preserving effectiveness on the original collection, at roughly 60% of the training cost of incremental training.
Significance. If the central performance and efficiency claims hold under rigorous controls, the work would be a meaningful contribution to generative IR by offering a targeted, low-cost alternative to full retraining. It directly addresses the practical barrier of frequent document additions without catastrophic forgetting. The hybrid-label mechanism is a plausible adaptation of model-editing ideas to the high docID-overlap setting typical of retrieval, but its added value over simpler editing baselines remains to be isolated.
Major comments (3)
- [Experiments] Experiments section: the reported gains on NQ and MS MARCO lack any description of the exact baselines (e.g., which incremental-training variants or prior editing methods), statistical significance tests, or ablation of the hybrid-label component. Without these, the claim that DOME “significantly improves” performance while preserving original-collection effectiveness is only partially supported.
- [Method (stage 2) and Analysis] Method (stage 2) and Analysis: the manuscript asserts that hybrid soft+hard labels produce “distinguishable edit vectors” that avoid side effects on existing docID mappings, yet provides no direct verification such as pairwise cosine similarity of edit vectors across queries or an ablation removing the soft-label term. The observed stability on old documents could therefore be explained by conservative update magnitudes rather than the claimed discriminative property of the hybrid strategy.
- [Experiments] Experiments: no controls or measurements for catastrophic forgetting are described beyond the high-level statement that original-collection effectiveness is maintained. A quantitative comparison of per-docID generation accuracy before and after editing on the original collection would be required to substantiate the “no forgetting” claim.
Minor comments (2)
- [Abstract] Abstract: the phrase “significantly improves retrieval performance” is used without any numerical deltas, confidence intervals, or reference to the tables that contain the results.
- [Method] Notation: the terms “edit vector” and “update” are used interchangeably in several places; adopting a single consistent definition would improve clarity.
Simulated Author's Rebuttal
We appreciate the referee's detailed feedback on our manuscript. The comments highlight areas where additional details and analyses can strengthen our claims regarding DOME's performance and the effectiveness of the hybrid-label strategy. We address each point below and commit to incorporating the suggested improvements in the revised version.
Point-by-point responses
Referee: [Experiments] Experiments section: the reported gains on NQ and MS MARCO lack any description of the exact baselines (e.g., which incremental-training variants or prior editing methods), statistical significance tests, or ablation of the hybrid-label component. Without these, the claim that DOME “significantly improves” performance while preserving original-collection effectiveness is only partially supported.
Authors: We agree that the experiments section would benefit from more precise descriptions of the baselines used, including specific incremental-training variants and prior editing methods. In the revised manuscript, we will expand this section to detail the exact baselines, report statistical significance tests for the observed gains, and include an ablation study isolating the hybrid-label component. These additions will provide stronger support for our performance claims. revision: yes
Referee: [Method (stage 2) and Analysis] Method (stage 2) and Analysis: the manuscript asserts that hybrid soft+hard labels produce “distinguishable edit vectors” that avoid side effects on existing docID mappings, yet provides no direct verification such as pairwise cosine similarity of edit vectors across queries or an ablation removing the soft-label term. The observed stability on old documents could therefore be explained by conservative update magnitudes rather than the claimed discriminative property of the hybrid strategy.
Authors: We acknowledge that direct verification of the distinguishability of edit vectors is missing. To address this, we will include pairwise cosine similarity measurements of edit vectors across different queries in the analysis section. Additionally, we will perform and report an ablation study that removes the soft-label term to demonstrate its role in producing discriminative updates, separate from any effects of update magnitude. revision: yes
Referee: [Experiments] Experiments: no controls or measurements for catastrophic forgetting are described beyond the high-level statement that original-collection effectiveness is maintained. A quantitative comparison of per-docID generation accuracy before and after editing on the original collection would be required to substantiate the “no forgetting” claim.
Authors: We agree that a more rigorous quantification of catastrophic forgetting is necessary. In the revised experiments, we will add a quantitative comparison of per-docID generation accuracy on the original collection before and after applying DOME. This will provide concrete evidence that the editing process does not degrade performance on existing documents. revision: yes
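Such a comparison is straightforward to implement. A minimal sketch, assuming exact-match docID generation; the function names are illustrative, not from the paper:

```python
from collections import defaultdict

def per_docid_accuracy(predictions, gold):
    """Fraction of queries whose generated docID matches the gold docID,
    broken down per docID so forgetting on specific documents is visible."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pred, g in zip(predictions, gold):
        totals[g] += 1
        hits[g] += int(pred == g)
    return {d: hits[d] / totals[d] for d in totals}

def forgetting_delta(acc_before, acc_after):
    """Per-docID accuracy change after editing; negative values flag
    forgetting on specific original documents."""
    return {d: acc_after.get(d, 0.0) - acc_before[d] for d in acc_before}
```

Aggregating only the mean would hide localized forgetting; the per-docID breakdown is what makes the "no forgetting" claim checkable.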
Circularity Check
No circularity in derivation chain; method and claims are empirically grounded
Full rationale
The paper introduces DOME as a three-stage model-editing procedure (critical-layer identification, hybrid-label edit-vector optimization, update construction) to adapt generative retrieval models to new documents. No equations, derivations, or self-referential definitions appear that reduce the claimed improvements (e.g., better new-document retrieval with preserved original-collection performance) to fitted parameters defined by the method itself or to self-citation chains. The hybrid-label strategy is presented as a design choice motivated by observed indistinguishability of edit vectors, not as a tautological fit. Experiments on NQ and MS MARCO are reported as external validation rather than predictions forced by the inputs. This is a standard empirical method paper with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Edit-vector magnitude and learning rate
Axioms (2)
- Domain assumption: Critical layers can be identified reliably from activation statistics or gradient signals without exhaustive search.
- Ad hoc to paper: Hybrid soft-hard labels produce edit vectors that remain distinguishable across queries sharing many docIDs.
Forward citations
Cited by 1 Pith paper
- GenRecEdit: Adapting Model Editing for Generative Recommendation with Cold-Start Items. Injects cold-start items into generative recommendation models via context-aware token editing and interference-reducing triggers, boosting cold-start accuracy while using only 9.5% of retraining time.