arXiv: 2606.31693 · v1 · pith:LP6GK5MDnew · submitted 2026-06-30 · 💻 cs.IR · cs.AI · cs.CL

ShopX: A Foundation Model for Intent-to-Item Fulfillment in Agentic Shopping

Pith reviewed 2026-07-01 03:53 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.CL
keywords foundation model · agentic shopping · semantic IDs · intent fulfillment · LLM agents · item-space operations · multi-turn tasks

The pith

ShopX trains one foundation model to translate shopping intents directly into item-space actions using semantic IDs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that wrapping LLMs around separate search and ranking systems creates lossy hand-offs that hurt complex or ambiguous shopping requests. ShopX instead unifies intent parsing, planning, and execution inside a single model that operates natively on semantic IDs for retrieval, ranking, and bundling. A custom training recipe equips the base LLM to perform these multi-turn item-space tasks while keeping its general knowledge and instruction following intact. The resulting model-native framework is evaluated on real Taobao-derived single- and multi-turn tasks and shown to outperform tool-mediated agent baselines, especially on harder cases. This design removes the need for external retrieval interfaces between the agent and the item catalog.

Core claim

ShopX is a foundation model that combines intent understanding, execution planning, and flexible SID-native item-space operations inside one system, deployed through a model-facing action protocol and serving harness that supports context access, catalog grounding, and state management for agentic shopping workflows.
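
The abstract names a model-facing action protocol and three support surfaces (context access, catalog grounding, state management) but does not specify them. The sketch below is one minimal way such a harness could be shaped. Every name in it (`Action`, `Harness`, `ground`, the three op strings) is hypothetical and chosen for illustration, not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical types: the paper's actual protocol is not described in the abstract.
SID = tuple[int, ...]  # a semantic ID as a short tuple of code indices


@dataclass(frozen=True)
class Action:
    op: str                # assumed op names: "retrieve", "rank", "bundle"
    sids: tuple[SID, ...]  # item-space payload emitted by the model


@dataclass
class Harness:
    catalog: dict[SID, str]                                # catalog grounding: SID -> item title
    context: dict[str, str] = field(default_factory=dict)  # context access (e.g. user hints)
    history: list[Action] = field(default_factory=list)    # state kept across turns

    def ground(self, action: Action) -> list[str]:
        """Validate an action, record it, and resolve its SIDs to catalog items."""
        if action.op not in {"retrieve", "rank", "bundle"}:
            raise ValueError(f"unknown op: {action.op}")
        unknown = [s for s in action.sids if s not in self.catalog]
        if unknown:
            raise KeyError(f"SIDs not in catalog: {unknown}")
        self.history.append(action)
        return [self.catalog[s] for s in action.sids]


harness = Harness(catalog={(1, 4): "trail shoes", (1, 7): "running socks"})
items = harness.ground(Action("bundle", ((1, 4), (1, 7))))
```

In this sketch the harness rejects SIDs that do not exist in the catalog, so an invalid model output fails loudly instead of reaching the user.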

What carries the argument

Semantically recoverable, LLM-operable semantic IDs (SIDs) that let the model compose operations such as SID beam-search retrieval, listwise ranking, and product bundling directly in item space.
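
To make "SID beam-search retrieval" concrete, here is a minimal sketch of beam search constrained to a prefix tree of valid catalog SIDs, so every returned sequence is a real item. It assumes fixed-length SIDs and a caller-supplied `logprob(prefix, code)` function standing in for the model's next-token scores. The function names and the toy scorer are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable

SID = tuple[int, ...]


def build_trie(sids: list[SID]) -> dict:
    """Nested-dict prefix tree over all valid catalog SIDs."""
    root: dict = {}
    for sid in sids:
        node = root
        for code in sid:
            node = node.setdefault(code, {})
    return root


def sid_beam_search(
    trie: dict,
    logprob: Callable[[SID, int], float],
    beam_width: int,
    length: int,
) -> list[tuple[SID, float]]:
    """Keep the best `beam_width` prefixes per step, expanding only valid codes."""
    beams = [((), 0.0, trie)]
    for _ in range(length):
        candidates = []
        for prefix, score, node in beams:
            for code, child in node.items():  # only codes that keep the prefix valid
                candidates.append((prefix + (code,), score + logprob(prefix, code), child))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return [(prefix, score) for prefix, score, _ in beams]


# Toy scorer (hypothetical): prefers smaller code indices.
catalog = [(0, 1), (0, 2), (1, 0), (2, 2)]
results = sid_beam_search(build_trie(catalog), lambda prefix, code: -float(code), 2, 2)
```

Constraining expansion to the trie is what separates retrieval from free-form generation: the model can only complete prefixes that lead to an existing item.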

If this is right

  • Model-native fulfillment reduces lossy hand-offs between agent orchestration and item-space execution.
  • Performance gains appear most clearly on complex or ambiguous multi-turn requests.
  • The same model can retain general LLM capabilities while gaining specialized item-space skills.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same unification pattern could be tested in other agentic domains that currently route language to external tools.
  • If SIDs prove stable across catalogs, the approach might reduce dependence on separate indexing pipelines in production systems.

Load-bearing premise

Semantically recoverable SIDs can be designed and a training recipe exists that equips a general LLM for flexible multi-turn item-space fulfillment while retaining its original knowledge and instruction-following abilities.
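
The abstract does not say how the SIDs are constructed or what "recoverable" means operationally. One common construction in the generative-retrieval literature is residual quantization, in which each code refines what the previous codes missed and the codes can be decoded back to an approximate embedding. The sketch below illustrates that idea only. Whether ShopX uses it is an assumption, and the codebooks are hand-written toy values rather than learned ones.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def encode(codebooks, vec):
    """Residual quantization: each level quantizes what earlier levels missed."""
    sid, residual = [], list(vec)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid)


def decode(codebooks, sid):
    """Approximate embedding recovered by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, sid):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks (hypothetical; real ones would be learned from item embeddings).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],     # coarse level
    [[0.0, 0.0], [0.25, -0.25]],  # fine level
]
sid = encode(codebooks, [1.2, 0.8])
approx = decode(codebooks, sid)
```

Under this reading, an SID is recoverable to the extent that decoding it lands close to the original item representation. The premise above additionally requires that the LLM can operate on such codes, which this sketch does not address.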

What would settle it

A controlled comparison on the same Taobao-derived tasks where the ShopX model-native system shows no gain or worse performance than tool-mediated baselines on complex or ambiguous requests.
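
One way to run such a comparison is a paired bootstrap over per-task scores, sketched below under stated assumptions: both systems are scored on the same tasks, higher is better, and the scores shown are made-up placeholders rather than results from the paper. The function name and parameters are illustrative.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_a - scores_b) over paired tasks."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per task")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(diffs) for _ in diffs]  # resample tasks with replacement
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Placeholder per-task scores (hypothetical, for illustration only).
model_native = [0.8, 0.7, 0.9, 0.6, 0.75]
tool_mediated = [0.7, 0.6, 0.7, 0.5, 0.70]
low, high = paired_bootstrap_ci(model_native, tool_mediated)
```

If the interval for the complex or ambiguous subset contains zero or lies below it, the claimed advantage would not be supported on that subset.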

Original abstract

The wave of AI-native applications is moving shopping beyond page- and feed-based browsing toward intent-driven experiences orchestrated by LLM agents. A common design wraps an LLM around existing search and recommendation pipelines, forcing complex intents through low-bandwidth retrieval or ranking interfaces and leaving a gap between language understanding and item-space fulfillment. Generative recommendation gives LLMs a direct item-space interface through semantic IDs (SIDs), but existing models mainly generate candidates for retrieval rather than translate flexible intents into item-space outcomes. We propose ShopX to address this bottleneck by unifying intent understanding, execution planning, and flexible SID-native item-space operations into a single foundation model. We deploy ShopX in agentic shopping workflows through a model-native item-fulfillment framework with a serving harness that defines a model-facing action protocol and exposes support surfaces for context access, catalog grounding, and state management. Within this framework, ShopX plans and composes SID-based item-space operations such as SID beam-search retrieval, listwise ranking, or product bundling. This model-centric design reduces lossy hand-offs between agent orchestration and item-space execution. To build ShopX, we design semantically recoverable, LLM-operable SIDs and a training recipe that equips a general LLM for flexible multi-turn item-space fulfillment while retaining the knowledge and instruction-following abilities needed by a shopping agent. We evaluate the ShopX framework against tool-mediated agentic systems on single- and multi-turn fulfillment tasks derived from anonymized Taobao production logs, showing that model-native fulfillment improves overall framework behavior, especially on complex or ambiguous requests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes ShopX, a foundation model unifying intent understanding, execution planning, and flexible SID-native item-space operations (e.g., beam-search retrieval, listwise ranking, bundling) for agentic shopping. It introduces a model-native fulfillment framework with a serving harness for action protocols, context access, and state management. The authors design semantically recoverable SIDs and a training recipe claimed to equip a general LLM for multi-turn fulfillment while retaining knowledge and instruction-following. Evaluation on single- and multi-turn tasks from anonymized Taobao production logs is asserted to show that model-native fulfillment outperforms tool-mediated agentic systems, especially on complex or ambiguous requests.

Significance. If the training recipe and empirical results hold, the work could meaningfully advance agentic e-commerce by closing the gap between LLM reasoning and direct item-space manipulation, reducing lossy tool interfaces. The focus on preserving general LLM capabilities during domain adaptation is a positive framing that aligns with practical deployment needs.

major comments (2)
  1. [Abstract] The central claim that a training recipe equips a general LLM for SID-based multi-turn operations (beam-search, ranking, bundling) while retaining instruction-following is unsupported by any description of SID construction, loss terms, data mixtures, or retention ablations; this detail is load-bearing for the weakest assumption identified in the stress-test.
  2. [Abstract] The assertion that 'model-native fulfillment improves overall framework behavior, especially on complex or ambiguous requests' is presented without any metrics, baselines, error bars, task definitions, or statistical details from the Taobao log evaluation, preventing verification of the claimed superiority over tool-mediated systems.
minor comments (1)
  1. [Abstract] The acronym SID is used without an initial expansion or reference to prior generative-recommendation literature, which reduces accessibility for readers outside that sub-area.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments on the abstract below, noting that the full manuscript provides the supporting technical details while the abstract serves as a concise summary. We propose targeted revisions to improve clarity.

Point-by-point responses
  1. Referee: [Abstract] The central claim that a training recipe equips a general LLM for SID-based multi-turn operations (beam-search, ranking, bundling) while retaining instruction-following is unsupported by any description of SID construction, loss terms, data mixtures, or retention ablations; this detail is load-bearing for the weakest assumption identified in the stress-test.

    Authors: The manuscript body contains these descriptions: SID construction and semantic recoverability are detailed in Section 3.1, the training recipe (including loss terms, data mixtures, and the multi-turn fulfillment protocol) appears in Section 4, and retention ablations for instruction-following and general capabilities are reported in Section 5.3. The abstract summarizes rather than replicates these sections. We will revise the abstract to include one additional sentence providing high-level pointers to these elements.

    Revision: partial

  2. Referee: [Abstract] The assertion that 'model-native fulfillment improves overall framework behavior, especially on complex or ambiguous requests' is presented without any metrics, baselines, error bars, task definitions, or statistical details from the Taobao log evaluation, preventing verification of the claimed superiority over tool-mediated systems.

    Authors: Section 5 defines the single- and multi-turn tasks derived from the Taobao logs, specifies the tool-mediated baselines, reports metrics with error bars, and includes statistical comparisons. The abstract condenses the outcome. We will revise the abstract to include a short quantitative summary (e.g., relative improvement ranges on complex queries) while remaining within length constraints.

    Revision: partial

Circularity Check

0 steps flagged

No derivation chain or load-bearing equations are present; the claims rest on architectural description and external evaluation.

Full rationale

The provided abstract and manuscript description contain no equations, parameter fits, uniqueness theorems, or self-citations that reduce any prediction or result to the inputs by construction. The central claims concern the existence of a training recipe and SID design, supported by evaluation on Taobao production logs rather than internal self-reference. This is a standard self-contained systems paper with no circular steps matching the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no information on free parameters, axioms, or invented entities can be extracted.

pith-pipeline@v0.9.1-grok · 5909 in / 1005 out tokens · 43858 ms · 2026-07-01T03:53:07.658758+00:00 · methodology

