pith. machine review for the scientific record.

arxiv: 2605.11553 · v1 · submitted 2026-05-12 · 💻 cs.IR

Recognition: 2 theorem links

· Lean Theorem

TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:36 UTC · model grok-4.3

classification 💻 cs.IR
keywords generative recommendation · adaptive reasoning · semantic IDs · chain-of-thought · reinforcement learning · planner · commonsense explanations

The pith

A planner learns to invoke slow reasoning only for hard user histories in generative recommendation, raising accuracy while lowering latency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generative recommenders that rely on Semantic IDs currently apply either fast direct generation or full chain-of-thought reasoning to every user sequence, creating a fixed trade-off between speed and quality. The paper introduces an adaptive system that equips an LLM with three tools—a fast retriever, a lightweight ranker, and a slow model that first produces explicit item-to-item commonsense explanations—and trains a planner to choose the right tool per sequence. The planner is first warmed up with supervision and then refined through agentic reinforcement learning. Experiments across three datasets show the approach beats uniform fast and uniform slow baselines by delivering higher accuracy at reduced average inference cost. A sympathetic reader would care because it points toward practical LLM-based systems that spend compute where it actually improves outcomes rather than everywhere.

Core claim

The central claim is that equipping a generative recommender with a fast SID retriever, a candidate ranker, and a slow reasoning model that converts collaborative item-to-item knowledge into natural-language rationales, then training a planner via supervised warm-up and agentic RL to decide which tool to call for each user sequence, produces both higher accuracy and lower latency than any single fixed strategy applied uniformly.

What carries the argument

The planner. Trained through supervised warm-up followed by agentic reinforcement learning, it dynamically selects among the fast SID-based retriever, the lightweight ranker, and the slow reasoning model with injected commonsense explanations.
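The routing machinery described above can be sketched as a single dispatch step. This is a minimal illustration, not the paper's implementation; the tool names, signatures, and planner interface are assumptions.

```python
# Hypothetical sketch of adaptive tool routing in a generative recommender.
# The planner picks exactly one tool per user sequence; tool behavior here
# is illustrative, not the paper's actual retriever/ranker/reasoner.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tools:
    fast_retrieve: Callable[[List[str]], List[str]]    # SID-based retriever
    rank: Callable[[List[str], List[str]], List[str]]  # lightweight ranker
    slow_reason: Callable[[List[str]], List[str]]      # rationale-first model


def recommend(history: List[str], tools: Tools,
              planner: Callable[[List[str]], str]) -> List[str]:
    """Route one user sequence to exactly one tool, as chosen by the planner."""
    action = planner(history)
    if action == "fast":
        return tools.fast_retrieve(history)
    if action == "rank":
        # Ranker refines candidates produced by the fast retriever.
        return tools.rank(history, tools.fast_retrieve(history))
    return tools.slow_reason(history)  # slow path with explicit rationales
```

In this framing, "adaptive allocation" is nothing more than making `planner` a learned function of the history rather than a constant.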

Load-bearing premise

The planner can reliably detect which user sequences benefit from slow reasoning and that the injected item-to-item commonsense explanations remain useful across different datasets.

What would settle it

An evaluation on the same three datasets in which the adaptive planner produces no accuracy gain over the best fixed-strategy baseline or fails to reduce average latency would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.11553 by Kaian Jiang, Shiteng Cao, Yunlong Gong, Zhiheng Li.

Figure 1. Illustration of fast vs. slow reasoning in generative recommendation. Left: fixed separate …
Figure 2. Our framework first extracts SIDs from item metadata and aligns them with text em…
Figure 3. Recall@10, NDCG@10 and relative inference cost (normalized to the full model) of …
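Figure 3 reports Recall@10 and NDCG@10. For readers unfamiliar with these metrics, a minimal sketch of the standard single-target definitions (not taken from the paper) follows:

```python
# Standard top-k retrieval metrics for next-item recommendation with one
# held-out target item per sequence (binary relevance).
import math
from typing import List


def recall_at_k(ranked: List[str], target: str, k: int = 10) -> float:
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: List[str], target: str, k: int = 10) -> float:
    """Binary-relevance NDCG: 1/log2(rank+2) if the target is in top-k."""
    if target in ranked[:k]:
        return 1.0 / math.log2(ranked.index(target) + 2)
    return 0.0
```

Averaging these over all test sequences gives the dataset-level numbers a plot like Figure 3 would report.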
Original abstract

Generative recommendation with Semantic IDs (SIDs) has emerged as a promising paradigm, yet existing methods apply a fixed inference strategy, either fast direct generation or slow chain-of-thought reasoning, uniformly across all user histories. This approach creates a trade-off: fast recommendation model produces suboptimal accuracy on hard samples, while always invoking slow reasoning incurs prohibitive latency and wastes computation on easy cases. To address this, we propose Think Fast, Think Slow, Then Act, a framework that learns to adaptively allocate reasoning effort per user sequence. Our system equips an LLM with three complementary tools: a fast SID-based retriever, a lightweight candidate ranker, and a slow reasoning model that generates explicit rationales before recommending. Crucially, we inject collaborative commonsense into the slow model by transforming item-to-item knowledge into natural language explanations. A planner, trained through supervised warm-up followed by agentic reinforcement learning, dynamically decides which tool to invoke. Experiments on three datasets demonstrate that our method outperforms strong baselines, achieving consistent accuracy gains while reducing inference latency compared to uniform slow reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces TwiSTAR, a generative recommendation system using Semantic IDs (SIDs) that equips an LLM with three tools—a fast SID retriever, a lightweight ranker, and a slow reasoning model augmented with natural-language item-to-item commonsense explanations. A planner, trained first by supervised warm-up and then by agentic reinforcement learning, adaptively selects which tool to invoke per user sequence. The central empirical claim is that this adaptive allocation yields consistent accuracy gains over strong baselines while reducing inference latency relative to always invoking the slow reasoning path, demonstrated on three datasets.

Significance. If the planner reliably routes hard sequences to slow reasoning and the commonsense explanations measurably improve the slow path, the framework offers a practical way to resolve the accuracy–latency trade-off in LLM-based generative recommenders. The combination of tool use, commonsense injection, and agentic RL for routing is a coherent extension of recent work on adaptive inference. However, the absence of direct validation for the planner’s decisions and the contribution of the injected explanations prevents a clear assessment of whether the reported gains are attributable to the adaptive mechanism itself.

major comments (3)
  1. [Abstract / Experiments] Abstract and Experiments section: the claim of 'consistent accuracy gains' and 'reducing inference latency' is presented without any numerical results, error bars, statistical tests, per-dataset breakdowns, or description of how the planner was evaluated. This omission makes the central empirical claim unverifiable from the provided text.
  2. [Experiments] Experiments section: no ablation is reported that disables the planner (e.g., uniform slow reasoning, random routing, or always-fast) or removes the commonsense injection. Without these controls it is impossible to determine whether accuracy improvements arise from adaptive allocation or simply from the presence of multiple tools.
  3. [Method] Method / Planner subsection: the manuscript provides no planner decision statistics (percentage of sequences routed to slow reasoning, agreement with an oracle that knows when slow reasoning helps, or routing accuracy on held-out data). This leaves the weakest assumption—that the planner reliably identifies sequences requiring slow reasoning—unsupported by direct evidence.
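The planner diagnostics asked for in comment 3 are cheap to compute once routing decisions are logged. The following is an illustrative sketch (the statistic names and the notion of an oracle routing label are assumptions, not something the paper reports):

```python
# Planner diagnostics: fraction of sequences routed to the slow path, and
# agreement with an oracle that knows which sequences benefit from slow
# reasoning. Illustrative sketch only.
from typing import Dict, List


def routing_report(decisions: List[str], oracle: List[str]) -> Dict[str, float]:
    """Summarize planner routing against oracle labels ('fast'/'rank'/'slow')."""
    n = len(decisions)
    slow_share = sum(d == "slow" for d in decisions) / n
    agreement = sum(d == o for d, o in zip(decisions, oracle)) / n
    return {"slow_share": slow_share, "oracle_agreement": agreement}
```

Reporting numbers like these per dataset would directly test the load-bearing premise that the planner identifies hard sequences.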
minor comments (2)
  1. [Title] The title contains a missing space after the colon ('TwiSTAR:Think Fast').
  2. [Abstract] The abstract refers to 'three complementary tools' but does not explicitly list the lightweight candidate ranker in the final sentence; a brief clarification would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on TwiSTAR. The comments highlight important areas for strengthening the empirical validation of our adaptive reasoning framework. We address each major comment below and will incorporate revisions to improve clarity and evidence.

Point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and Experiments section: the claim of 'consistent accuracy gains' and 'reducing inference latency' is presented without any numerical results, error bars, statistical tests, per-dataset breakdowns, or description of how the planner was evaluated. This omission makes the central empirical claim unverifiable from the provided text.

    Authors: We agree that the abstract presents high-level claims. The Experiments section reports per-dataset accuracy and latency results across three datasets, but lacks explicit error bars, statistical tests, and a detailed planner evaluation description in the main text. In revision, we will update the abstract with key numerical highlights (e.g., relative gains), add error bars and significance tests to experimental tables, include per-dataset breakdowns with planner routing details, and expand the planner evaluation description to make all claims directly verifiable. revision: yes

  2. Referee: [Experiments] Experiments section: no ablation is reported that disables the planner (e.g., uniform slow reasoning, random routing, or always-fast) or removes the commonsense injection. Without these controls it is impossible to determine whether accuracy improvements arise from adaptive allocation or simply from the presence of multiple tools.

    Authors: This is a valid concern. Our experiments compare against fixed fast and slow baselines, but do not include explicit ablations for random routing or commonsense removal. We will add these controls in the revised Experiments section: always-fast, uniform slow reasoning, random tool selection, and slow reasoning without commonsense explanations. These will isolate the planner's adaptive contribution and the value of injected explanations. revision: yes
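The promised controls amount to fixed routing policies that replace the learned planner. A minimal sketch, with illustrative names (the "no commonsense" arm differs downstream in the slow model, not in routing):

```python
# Ablation arms as fixed routing policies that can be swapped in for the
# learned planner. Names and the policy interface are illustrative.
import random
from typing import Callable, List


def make_policy(name: str, seed: int = 0) -> Callable[[List[str]], str]:
    rng = random.Random(seed)  # seeded for reproducible random routing
    policies = {
        "always_fast": lambda h: "fast",
        "always_slow": lambda h: "slow",
        "random": lambda h: rng.choice(["fast", "rank", "slow"]),
        # Routes like always_slow; commonsense injection disabled downstream.
        "slow_no_commonsense": lambda h: "slow",
    }
    return policies[name]
```

Running the full evaluation under each policy isolates what the adaptive planner contributes beyond merely having three tools available.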

  3. Referee: [Method] Method / Planner subsection: the manuscript provides no planner decision statistics (percentage of sequences routed to slow reasoning, agreement with an oracle that knows when slow reasoning helps, or routing accuracy on held-out data). This leaves the weakest assumption—that the planner reliably identifies sequences requiring slow reasoning—unsupported by direct evidence.

    Authors: We acknowledge that direct planner diagnostics are missing. The manuscript emphasizes end-to-end results but omits routing statistics. In revision, we will add a dedicated analysis with the percentage of sequences routed to slow reasoning, correlation with sequence difficulty, and routing accuracy or oracle agreement metrics on held-out data where available. This will provide direct evidence supporting the planner's decisions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical framework with no derivations or self-referential reductions

full rationale

The paper describes an engineering framework for adaptive tool use in generative recommendation (fast SID retriever, ranker, slow reasoning model with injected commonsense) and a planner trained via supervised warm-up plus agentic RL. No equations, first-principles derivations, or closed-form predictions appear in the provided text. Central claims rest on empirical comparisons across three datasets rather than any quantity that reduces to its own fitted inputs by construction. No self-citations are invoked to justify uniqueness theorems, ansatzes, or load-bearing premises. The planner's routing behavior and commonsense injection are presented as design choices whose value is asserted via experiment, not via definitional equivalence. This is a standard non-circular empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract implies several unstated modeling choices: that Semantic IDs are already effective, that item-to-item knowledge can be reliably turned into natural-language explanations, and that the RL reward signal for the planner is well-defined. No explicit free parameters or invented entities are named.

axioms (2)
  • domain assumption LLMs can usefully generate explicit rationales for item recommendations when given collaborative patterns in natural language
    Invoked when describing the slow reasoning model and commonsense injection
  • domain assumption A planner trained with supervised warm-up plus agentic RL can learn to allocate fast versus slow tools without excessive overhead
    Central to the adaptive mechanism

pith-pipeline@v0.9.0 · 5496 in / 1405 out tokens · 31192 ms · 2026-05-13T01:36:44.050467+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 6 internal anchors

  1. [1]

    LLaRA: Large language-recommendation assistant

    Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, and Xiangnan He. LLaRA: Large language-recommendation assistant. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’24, page 1785–1795, New York, NY, USA, 2024. Association for Computing Machinery. ISBN 97...

  2. [2]

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models, 2023. URL https://arxiv.org/abs/2201.11903

  3. [3]

    OneRec-Think: In-text reasoning for generative recommendation, 2025

    Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, and Guorui Zhou. OneRec-Think: In-text reasonin...

  4. [4]

    Generative reasoning recommendation via LLMs, 2025

    Minjie Hong, Zetong Zhou, Zirun Guo, Ziang Zhang, Ruofan Hu, Weinan Gan, Jieming Zhu, and Zhou Zhao. Generative reasoning recommendation via LLMs, 2025. URL https://arxiv.org/abs/2510.20815

  5. [5]

    OxygenREC: An instruction-following generative framework for e-commerce recommendation

    Xuegang Hao, Ming Zhang, Alex Li, Xiangyu Qian, Zhi Ma, Yanlong Zang, Shijie Yang, Zhongxuan Han, Xiaolong Ma, Jinguang Liu, Zhen Li, Zhida Jiang, Shusheng Wang, Ning Tang, Yanchen Qiao, Chenxiang Yang, Chen Sun, Jincheng Yuan, Chunhua Peng, Heng Hu, Peijun Yang, Baopeng Yuan, Caiyun Qiu, Zhaolong Xing, Haofei Yuan, Haipeng Zhang, Yuzhang Guo, Weijie Ding...

  6. [6]

    Qwen3 Technical Report

    An Yang et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

  7. [7]

    Siavash Ameli, Siyuan Zhuang, Ion Stoica, and Michael W

    Marah Abdin, Sahaj Agarwal, Ahmed Awadallah, Vidhisha Balachandran, Harkirat Behl, Lingjiao Chen, Gustavo de Rosa, Suriya Gunasekar, Mojan Javaheripi, Neel Joshi, Piero Kauffmann, Yash Lara, Caio César Teodoro Mendes, Arindam Mitra, Besmira Nushi, Dimitris Papailiopoulos, Olli Saarikivi, Shital Shah, Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora...

  8. [8]

    Claude 3.7 Sonnet and Claude Code

    Anthropic. Claude 3.7 Sonnet and Claude Code. https://www.anthropic.com/news/ claude-3-7-sonnet, 2025. Accessed: 2026-04-29

  9. [9]

    M6-Rec: Generative pretrained language models are open-ended recommender systems

    Zeyu Cui, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. M6-Rec: Generative pretrained language models are open-ended recommender systems, 2022. URL https://arxiv.org/abs/2205.08084

  10. [10]

    Tallrec: An effective and efficient tuning framework to align large language model with recommendation

    Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. TALLRec: An effective and efficient tuning framework to align large language model with recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems, RecSys ’23, page 1007–1014, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400702419...

  11. [11]

    LLMTreeRec: Unleashing the power of large language models for cold-start recommendations, 2024

    Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, and Ruiming Tang. LLMTreeRec: Unleashing the power of large language models for cold-start recommendations, 2024. URL https://arxiv.org/ abs/2404.00702

  12. [12]

    A bi-step grounding paradigm for large language models in recommendation systems

    Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yanchen Luo, Chong Chen, Fuli Feng, and Qi Tian. A bi-step grounding paradigm for large language models in recommendation systems. ACM Trans. Recomm. Syst., 3(4), April 2025. doi: 10.1145/3716393. URL https://doi.org/10.1145/3716393

  13. [13]

    On softmax direct preference optimization for recommendation

    Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, and Tat-Seng Chua. On softmax direct preference optimization for recommendation. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9798331314385

  14. [14]

    InteraRec: Interactive recommendations using multimodal large language models

    Saketh Reddy Karra and Theja Tulabandhula. InteraRec: Interactive recommendations using multimodal large language models. In Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2024 Workshops, RAFDA and IWTA, Taipei, Taiwan, May 7–10, 2024, Proceedings, page 32–43, Berlin, Heidelberg, 2024. Springer-Verlag. ISBN 978-981-97-2649-3. do...

  15. [15]

    Recommendation as language processing (rlp): A unified pretrain, personalized prompt & predict paradigm (p5),

    Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. Recommendation as language processing (rlp): A unified pretrain, personalized prompt & predict paradigm (p5),

  16. [16]

    URL https://arxiv.org/abs/2203.13366

  17. [17]

    Leveraging large language models for pre-trained recommender systems, 2023

    Zhixuan Chu, Hongyan Hao, Xin Ouyang, Simeng Wang, Yan Wang, Yue Shen, Jinjie Gu, Qing Cui, Longfei Li, Siqiao Xue, James Y Zhang, and Sheng Li. Leveraging large language models for pre-trained recommender systems, 2023. URL https://arxiv.org/abs/2308.10837

  18. [18]

    Adapting large language models by integrating collaborative semantics for recommendation

    Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. Adapting large language models by integrating collaborative semantics for recommendation. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 1435–1448, 2024. doi: 10.1109/ICDE60146.2024.00118

  19. [19]

    PLUM: Adapting pre-trained language models for industrial-scale generative recommendations

    Ruining He, Lukasz Heldt, Lichan Hong, Raghunandan Keshavan, Shifan Mao, Nikhil Mehta, Zhengyang Su, Alicia Tsai, Yueqi Wang, Shao-Chuan Wang, Xinyang Yi, Lexi Baugher, Baykal Cakici, Ed Chi, Cristos Goodrow, Ningren Han, He Ma, Romer Rosales, Abby Van Soest, Devansh Tandon, Su-Lin Wu, Weilong Yang, and Yilin Zheng. PLUM: Adapting pre-trained language mod...

  20. [20]

    Fine-grained semantics integration for large language model-based recommendation, 2026

    Jiawei Feng, Xiaoyu Kong, Leheng Sheng, Bin Wu, Chao Yi, Feifang Yang, Xiang-Rong Sheng, Han Zhu, Xiang Wang, Jiancan Wu, and Xiangnan He. Fine-grained semantics integration for large language model-based recommendation, 2026. URL https://arxiv.org/abs/2602.22632

  21. [21]

    Reasoning over semantic IDs enhances generative recommendation, 2026

    Yingzhi He, Yan Sun, Junfei Tan, Yuxin Chen, Xiaoyu Kong, Chunxu Shen, Xiang Wang, An Zhang, and Tat-Seng Chua. Reasoning over semantic IDs enhances generative recommendation, 2026. URL https://arxiv.org/abs/2603.23183

  22. [22]

    Leadre: Multi-faceted knowledge enhanced LLM empowered display advertisement recommender system, 2025

    Fengxin Li, Yi Li, Yue Liu, Chao Zhou, Yuan Wang, Xiaoxiang Deng, Wei Xue, Dapeng Liu, Lei Xiao, Haijie Gu, Jie Jiang, Hongyan Liu, Biao Qin, and Jun He. Leadre: Multi-faceted knowledge enhanced LLM empowered display advertisement recommender system, 2025. URL https://arxiv.org/abs/2411.13789

  23. [23]

    Thomas, Alexandra Ranieri, Matthew N

    Edoardo D’Amico, Marco De Nadai, Praveen Chandar, Divita Vohra, Shawn Lin, Max Lefarov, Paul Gigioli, Gustavo Penha, Ilya Kopysitsky, Ivo Joel Senese, Darren Mei, Francesco Fabbri, Oguz Semerci, Yu Zhao, Vincent Tang, Brian St. Thomas, Alexandra Ranieri, Matthew N. K. Smith, Aaron Bernkopf, Bryan Leung, Ghazal Fazelnia, Mark VanMiddlesworth, Timothy ...

  24. [24]

    Gould, Yves Raimond, Sandeep Ghael, Tony Jebara, 11 Andreas Damianou, Vladan Radosavljevic, Paul N

    Marco De Nadai, Edoardo D’Amico, Max Lefarov, Alexandre Tamborrino, Divita Vohra, Mark VanMiddlesworth, Shawn Lin, Jacqueline Wood, Jan Stypka, Eliza Klyce, Keshi Dai, Timothy Christopher Heath, Martin D. Gould, Yves Raimond, Sandeep Ghael, Tony Jebara, Andreas Damianou, Vladan Radosavljevic, Paul N. Bennett, Mounia Lalmas, and Praveen Chandar. A unif...

  25. [25]

    Generative reasoning re-ranker, 2026

    Mingfu Liang, Yufei Li, Jay Xu, Kavosh Asadi, Xi Liu, Shuo Gu, Kaushik Rangadurai, Frank Shyu, Shuaiwen Wang, Song Yang, Zhijing Li, Jiang Liu, Mengying Sun, Fei Tian, Xiaohan Wei, Chonglin Sun, Jacob Tao, Shike Mei, Wenlin Chen, Santanu Kolay, Sandeep Pandey, Hamed Firooz, and Luke Simon. Generative reasoning re-ranker, 2026. URL https://arxiv.org/abs/2...

  26. [26]

    AMEM4Rec: Leveraging cross-user similarity for memory evolution in agentic LLM recommenders

    Minh-Duc Nguyen, Hai-Dang Kieu, and Dung D. Le. AMEM4Rec: Leveraging cross-user similarity for memory evolution in agentic LLM recommenders. arXiv preprint arXiv:2602.08837, 2026

  27. [27]

    RecGPT-V2 technical report. arXiv preprint arXiv:2512.14503, 2025

    Chao Yi, Dian Chen, Gaoyang Guo, Jiakai Tang, Jian Wu, Jing Yu, Mao Zhang, Wen Chen, Wenjun Yang, Yujie Luo, Yuning Jiang, Zhujin Gao, Bo Zheng, Binbin Cao, Changfa Wu, Dixuan Wang, Han Wu, Haoyi Hu, Kewei Zhu, Lang Tian, Lin Yang, Qiqi Huang, Siqi Yang, Wenbo Su, Xiaoxiao He, Xin Tong, Xu Chen, Xunke Xi, Xiaowei Huang, Yaxuan Wu, Yeqiu Yang, Yi Hu, Yujin...

  28. [28]

    Recbot: Agent-based recommendation system. arXiv preprint arXiv:2509.21317,

    Jiakai Tang, Yujie Luo, Xunke Xi, Fei Sun, Xueyang Feng, Sunhao Dai, Chao Yi, Dian Chen, Zhujin Gao, Yang Li, Xu Chen, Wen Chen, Jian Wu, Yuning Jiang, and Bo Zheng. Interactive recommendation agent with active user commands.arXiv preprint arXiv:2509.21317, 2025

  29. [29]

    TalkPlay-Tools: Conversational music recommendation with LLM tool calling. arXiv preprint arXiv:2510.01698, 2025

    Seungheon Doh, Keunwoo Choi, and Juhan Nam. TalkPlay-Tools: Conversational music recommendation with LLM tool calling.arXiv preprint arXiv:2510.01698, 2025

  30. [30]

    Deep interest network for click-through rate prediction

    Guorui Zhou, Chengru Song, Xiaoqiang Zhu, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’18, pages 1059–1068, New York, NY, USA, 2018. Association for Computing Machi...

  31. [31]

    Qwen3.5: Towards native multimodal agents, February 2026

    Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5

  32. [32]

    Image-based recommendations on styles and substitutes, 2015

    Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-based recommendations on styles and substitutes, 2015. URL https://arxiv.org/abs/1506.04757

  33. [33]

    Hierarchical gating networks for sequential recommendation, 2019

    Chen Ma, Peng Kang, and Xue Liu. Hierarchical gating networks for sequential recommendation, 2019. URL https://arxiv.org/abs/1906.09217

  34. [34]

    Session-based Recommendations with Recurrent Neural Networks

    Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks, 2016. URL https://arxiv.org/abs/1511.06939

  35. [35]

    Self-attentive sequential recommendation, 2018

    Wang-Cheng Kang and Julian McAuley. Self-attentive sequential recommendation, 2018. URL https://arxiv.org/abs/1808.09781

  36. [36]

    Recommender systems with generative retrieval, 2023

    Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, and Maheswaran Sathiamoorthy. Recommender systems with generative retrieval, 2023. URL https://arxiv.org/abs/2305.05065

  37. [37]

    Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

    Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, and Yu Shi. Actions speak louder than words: Trillion-parameter sequential transducers for generative recommendations, 2024. URL https://arxiv.org/abs/2402.17152

  38. [38]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019. URL https://arxiv.org/abs/1810.04805

  39. [39]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models, 2024. URL https://arxiv.org/abs/2402.03300