Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval
Pith reviewed 2026-05-08 07:24 UTC · model grok-4.3
The pith
PAG's planning signal in generative retrieval collapses under intent-preserving typos.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reproducing PAG at inference time with the authors' artifacts confirms the main effectiveness results and beam-size trade-offs. Plan drift diagnostics introduced by the study reveal that PAG's planning signal is brittle under lexical surface-form variation: intent-preserving typos trigger plan collapse by shifting the planned candidate pool enough that the look-ahead bonus provides little useful guidance, effectively reverting decoding toward weaker unguided search. Cross-lingual evaluation with non-English mMARCO queries on an English index shows that query translation offers the strongest recovery among strategies that require no re-indexing.
What carries the argument
A look-ahead prior computed by simultaneous decoding guides the subsequent sequential decoding; its stability is measured by plan drift diagnostics that track shifts in the planner's top-n candidate set and token priorities.
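The drift measurement described above can be sketched as a set-overlap diagnostic: compare the planner's top-n candidate pool for the original query against the pool for a perturbed variant. The function name and the toy docid lists below are illustrative assumptions, not the paper's actual metric or API.

```python
# Hedged sketch of a plan drift diagnostic: Jaccard overlap between the
# planner's top-n candidate sets before and after a query perturbation.
# High drift (low overlap) would signal the plan collapse the review describes.

def top_n_overlap(planned_a, planned_b, n=10):
    """Jaccard overlap between two ranked candidate lists, truncated to n."""
    a, b = set(planned_a[:n]), set(planned_b[:n])
    return len(a & b) / len(a | b)

# Toy example: a perturbation that shifts 7 of 10 planned candidates.
original = [f"doc{i}" for i in range(10)]
perturbed = [f"doc{i}" for i in range(7, 17)]
drift = 1.0 - top_n_overlap(original, perturbed)  # → roughly 0.824
```

A per-query drift score like this can be averaged over a typo benchmark to quantify how often the planned pool survives surface-form variation.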
If this is right
- PAG improves retrieval only when the planning signal stays stable against query variations.
- Typos and similar changes can remove the benefit of the look-ahead bonus.
- Cross-lingual mismatches between queries and the index challenge the planning approach.
- Query translation can recover some performance without needing to rebuild the index.
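An intent-preserving typo of the kind these points describe can be as small as one adjacent-character swap inside a word. The helper below is a minimal illustration of such a perturbation, not the paper's perturbation protocol; the word-length threshold and seeded generator are assumptions.

```python
# Minimal sketch of an intent-preserving typo: swap two adjacent characters
# inside the first sufficiently long word, leaving word boundaries intact.
import random

def swap_typo(query, rng=random.Random(0)):
    words = query.split()
    # Pick the first word long enough to swap inside (assumed threshold: 4).
    idx = next(i for i, w in enumerate(words) if len(w) >= 4)
    w = words[idx]
    j = rng.randrange(len(w) - 1)
    words[idx] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)
```

The perturbed query keeps exactly the same characters and word count, so a human reader's intent is preserved even when the planner's candidate pool shifts.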
Where Pith is reading between the lines
- Real-world deployment of generative retrieval with planning should account for query typos and variations through preprocessing.
- The observed brittleness might contribute to performance gaps between controlled benchmarks and live user traffic.
- The plan drift diagnostics offer a general tool for evaluating robustness in other autoregressive ranking or generation methods.
Load-bearing premise
The plan drift and robustness findings are not driven by differences between the released checkpoint and the original model or by specific choices in beam size and trie construction.
What would settle it
If the original unreleased checkpoint shows stable candidate pools and sustained look-ahead gains even on typo-modified queries, that would indicate the brittleness is not inherent to the method.
Figures
Original abstract
Generative retrieval (GR) ranks documents by autoregressively generating document identifiers. Because many GR methods rely on trie-constrained beam search, they are vulnerable to early pruning of relevant prefixes under finite-beam decoding. Planning Ahead in Generative Retrieval (PAG) mitigates this failure mode by using simultaneous decoding to compute a document-level look-ahead prior that guides subsequent sequential decoding. We reproduce PAG at inference time and stress-test its decoding behavior. Using the authors' released checkpoint and identifier/trie artifacts under the reported decoding setup, we reproduce the main effectiveness results on MS MARCO Dev and TREC-DL 2019/2020, and corroborate the reported beam-size-latency trade-off in our hardware setting. Beyond reproduction, we introduce plan drift diagnostics that quantify how intent-preserving query variations alter the planner's top-n candidate set and highest-weight planner tokens, and how these changes affect guided decoding. We find that PAG's planning signal is brittle under lexical surface-form variation: intent-preserving typos can trigger plan collapse, where the planned candidate pool shifts enough that the look-ahead bonus provides little useful guidance, effectively reverting decoding toward weaker unguided search. We further evaluate fixed-index cross-lingual robustness using non-English mMARCO queries against an English index, and assess query-side mitigation strategies that require no re-indexing; query translation provides the strongest recovery in our setting. Overall, our results confirm PAG's reported effectiveness and the benefit of planning-guided decoding under the released inference setup, while showing that these gains depend on the stability of the planning signal under realistic query variation and query-document mismatch.
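The planning-guided decoding the abstract describes can be sketched as adding a document-level bonus to the sequential beam score during trie-constrained search. The function name, the simple additive combination, and the fallback for unknown docids below are assumptions for illustration, not PAG's exact formulation.

```python
# Hedged sketch of planning-guided scoring: a partial docid's sequential
# log-probability is augmented by the planner's document-level look-ahead
# score. When the planned pool has shifted away from a candidate, the plan
# offers no bonus and scoring reverts toward unguided beam search.
import math

def guided_score(seq_logprob, candidate_docid, plan_scores, weight=1.0):
    """Combine the sequential score with the planner's document-level prior.

    plan_scores maps docids to look-ahead log-scores; docids absent from
    the planned pool receive no guidance.
    """
    bonus = plan_scores.get(candidate_docid, -math.inf)
    if bonus == -math.inf:
        return seq_logprob  # plan collapsed for this candidate: unguided
    return seq_logprob + weight * bonus
```

Under this reading, plan collapse is exactly the case where `plan_scores` no longer covers the relevant docids, so the bonus term vanishes for the candidates that matter.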
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reproduces the PAG generative retrieval method at inference time using the authors' released checkpoint and artifacts, matching published results on MS MARCO Dev and TREC-DL 2019/2020. It introduces plan drift diagnostics to show that intent-preserving typos can cause the planned candidate pool to shift, reducing the utility of the look-ahead prior and reverting to unguided search. The paper also examines cross-lingual robustness with mMARCO queries and query-side mitigations like translation.
Significance. This reproduction and stress-testing study is significant for the generative retrieval field as it provides empirical evidence on the stability of planning signals under query variation. The use of released artifacts and matching numbers strengthens the reliability of the findings. The plan drift diagnostics offer a new diagnostic tool, and the brittleness finding, if robust, indicates that GR methods may require additional safeguards for practical deployment with noisy queries. The cross-lingual tests add to understanding of index-query mismatch.
major comments (1)
- [Plan drift diagnostics section] The plan drift diagnostics (top-n candidate shifts and planner token changes under intent-preserving typos) are produced under one fixed beam size and the released checkpoint. Without ablations varying beam width, trie construction details, or comparisons to the original training run, the observed plan collapse could be amplified by these configuration choices rather than reflecting an intrinsic property of the planning signal. This is load-bearing for the central brittleness claim.
minor comments (2)
- [Abstract] The abstract states that query translation provides the strongest recovery but does not report the quantitative delta in retrieval metrics; adding these numbers would clarify the practical impact.
- [Reproduction results] The reproduction of the beam-size-latency trade-off would benefit from explicitly stating the hardware configuration used for the latency measurements.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our reproduction and stress-testing of PAG. We respond to the major comment below.
Point-by-point responses
Referee: [Plan drift diagnostics section] The plan drift diagnostics (top-n candidate shifts and planner token changes under intent-preserving typos) are produced under one fixed beam size and the released checkpoint. Without ablations varying beam width, trie construction details, or comparisons to the original training run, the observed plan collapse could be amplified by these configuration choices rather than reflecting an intrinsic property of the planning signal. This is load-bearing for the central brittleness claim.
Authors: We thank the referee for this observation. Our study reproduces PAG inference using the authors' released checkpoint and artifacts under the reported decoding setup, as stated in the manuscript. The plan drift diagnostics are run in this fixed configuration precisely to assess the practical stability of the look-ahead prior when the method is used as publicly released. We agree that the observed collapse could be influenced by the specific beam size or trie details, and that ablations on these factors, or comparisons to the original training run, would offer additional context; however, such experiments require access to unreleased training code and full training artifacts. Our central claim concerns the brittleness of the planning signal under realistic query variation in the released system, which we demonstrate empirically. In revision we will state explicitly that the results hold for the fixed released configuration, and note the potential sensitivity to beam width and trie construction as a limitation and an avenue for future work on more robust planning.
Revision: partial. Experiments deferred for lack of released training artifacts:
- Ablations varying beam width and trie construction details
- Comparisons to the original training run
Circularity Check
No significant circularity: purely empirical reproduction and stress-test study
full rationale
The paper performs reproduction of PAG effectiveness results and introduces plan drift diagnostics via experiments on MS MARCO and TREC-DL benchmarks using released checkpoints, identifier artifacts, and fixed decoding setups. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text or abstract. All claims rest on external benchmark runs and query variation tests rather than reducing to the paper's own inputs by construction. This is the expected outcome for an empirical reproduction study.