
arxiv: 2601.18664 · v3 · pith:AFTSS7ELnew · submitted 2026-01-26 · 💻 cs.IR

S²GR: Stepwise Semantic-Guided Reasoning in Latent Space for Generative Recommendation

Pith reviewed 2026-05-21 14:18 UTC · model grok-4.3

classification 💻 cs.IR
keywords generative recommendation · semantic ID · stepwise reasoning · contrastive learning · codebook optimization · latent space · thinking tokens · recommendation systems

The pith

Inserting supervised thinking tokens before each semantic ID generation step grounds reasoning paths and balances focus across code levels in generative recommendation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to fix two shortcomings in current generative recommendation systems: strictly separating reasoning from generation spreads attention unevenly across the levels of semantic codes, and the reasoning itself lacks clear meaning or any reliable check. It first builds a codebook that encodes item co-occurrence and adds load-balancing and uniformity objectives, so that the codes form coarse-to-fine semantic layers. The key addition is a stepwise process that places a thinking token ahead of each semantic ID generation step; each token is trained with contrastive learning to match the ground-truth cluster distribution from the codebook. A sympathetic reader would care because the change promises more reliable intermediate steps, akin to how language models reason, which could translate into better recommendations on large platforms such as short-video services. If the approach holds, models could produce higher-quality outputs while keeping computation evenly focused across all code levels.
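The codebook stage starts from item co-occurrence. As a minimal sketch, assuming that two items co-occur when they appear within a small gap of each other in the same user's interaction sequence (the paper's exact graph-construction rule is not given here, and the `max_gap` parameter is hypothetical), the weighted edges of such a co-occurrence graph can be counted like this:

```python
from collections import Counter


def cooccurrence_counts(sequences, max_gap=2):
    """Count unordered item pairs whose positions differ by at most `max_gap`.

    Assumption: co-occurrence is defined by a sliding window over each user's
    sequence; `max_gap` is a hypothetical hyperparameter, not from the paper.
    The counts can serve as edge weights of an item co-occurrence graph.
    """
    counts = Counter()
    for seq in sequences:
        for i, a in enumerate(seq):
            # Look ahead at most `max_gap` positions from item `a`.
            for b in seq[i + 1 : i + 1 + max_gap]:
                if a != b:  # skip self-pairs from repeated interactions
                    counts[tuple(sorted((a, b)))] += 1
    return counts


if __name__ == "__main__":
    sessions = [["a", "b", "c"], ["b", "c", "d"]]
    print(cooccurrence_counts(sessions, max_gap=1).most_common(1))
    # prints [(('b', 'c'), 2)]
```

How the paper turns such edge weights into a training signal for the codebook is not described in this summary.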

Core claim

The authors claim that, once the codebook has been optimized with co-occurrence, load-balancing, and uniformity objectives to establish robust semantic hierarchies, inserting a thinking token before each SID generation step, where each token explicitly represents coarse-grained semantics and is supervised via contrastive learning against ground-truth codebook cluster distributions, produces physically grounded reasoning paths and balanced computational focus across all SID codes.
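To make the contrast with a "reason first, then generate" layout concrete, the sketch below builds both token orders for a three-level SID. The token spellings (`<think_l>`, `<sid_l_k>`) are hypothetical placeholders rather than the paper's vocabulary, and one thinking token per level is an assumption drawn from the description above.

```python
def separated_layout(sid_codes):
    """Baseline order: all thinking slots first, then every SID code."""
    thinks = [f"<think_{level}>" for level in range(1, len(sid_codes) + 1)]
    codes = [f"<sid_{level}_{code}>" for level, code in enumerate(sid_codes, start=1)]
    return thinks + codes


def stepwise_layout(sid_codes):
    """Stepwise order: one thinking slot immediately before each SID code."""
    seq = []
    for level, code in enumerate(sid_codes, start=1):
        # Slot whose hidden state would receive the level-`level` contrastive target (assumed).
        seq.append(f"<think_{level}>")
        seq.append(f"<sid_{level}_{code}>")
    return seq


if __name__ == "__main__":
    print(separated_layout([17, 4, 250]))
    # ['<think_1>', '<think_2>', '<think_3>', '<sid_1_17>', '<sid_2_4>', '<sid_3_250>']
    print(stepwise_layout([17, 4, 250]))
    # ['<think_1>', '<sid_1_17>', '<think_2>', '<sid_2_4>', '<think_3>', '<sid_3_250>']
```

In the stepwise order each thinking slot sits directly before the code it conditions, which is the property the authors associate with balanced computational focus across levels.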

What carries the argument

Stepwise reasoning mechanism that inserts thinking tokens representing coarse-grained semantics before each semantic ID (SID) generation step, supervised contrastively against codebook cluster distributions.
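One way to read "supervised contrastively against codebook cluster distributions" is an InfoNCE-style cross-entropy: the thinking token's hidden state is scored against every centroid of the corresponding codebook level, and the ground-truth cluster (or a soft distribution over clusters) supplies the target. The sketch below assumes cosine similarity, a temperature `tau`, and the other centroids at the same level as negatives; none of these choices is confirmed by the summary.

```python
import math


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def thinking_token_loss(hidden, centroids, target_dist, tau=0.1):
    """Cross-entropy between a target cluster distribution and softmax(sim / tau).

    Assumptions (not from the paper): cosine similarity, temperature `tau`,
    and all centroids of the same level acting as candidates. A one-hot
    `target_dist` reduces this to a standard InfoNCE loss with one positive.
    """
    logits = [cosine(hidden, c) / tau for c in centroids]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(p * (x - log_z) for p, x in zip(target_dist, logits))


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    h = [1.0, 0.0]
    print(thinking_token_loss(h, centroids, [1.0, 0.0, 0.0]))  # near zero
    print(thinking_token_loss(h, centroids, [0.0, 1.0, 0.0]))  # roughly 10
```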

If this is right

  • Balanced computational focus is maintained across all hierarchical SID codes instead of degrading quality on later codes.
  • Reasoning paths become physically grounded and verifiable through alignment with codebook cluster distributions.
  • Codebook utilization improves via the added load balancing and uniformity objectives that reinforce coarse-to-fine hierarchies.
  • Performance gains appear in extensive offline experiments and are confirmed by online A/B tests on large-scale industrial platforms.
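Figure 2 names the codebook a "Collaborative and Balanced RQ-VAE", so the coarse-to-fine hierarchy comes from residual quantization: each level quantizes whatever the previous levels left unexplained. The sketch below shows only the quantization step with fixed toy codebooks (no training, no co-occurrence or balancing losses) plus a per-level utilization measure; the codebooks and their sizes are made up for illustration.

```python
def nearest(vec, codebook):
    """Index of the codeword closest to `vec` in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )


def residual_quantize(x, codebooks):
    """Map a vector to one code per level, coarse first (plain RQ, no training)."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        k = nearest(residual, codebook)
        codes.append(k)
        # The next level only sees what this level failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes, residual


def utilization(all_codes, level, codebook_size):
    """Fraction of codewords at `level` used by at least one item."""
    used = {codes[level] for codes in all_codes}
    return len(used) / codebook_size


if __name__ == "__main__":
    books = [
        [[0.0, 0.0], [10.0, 10.0]],            # coarse level: 2 toy codewords
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # fine level: 3 toy codewords
    ]
    items = [[11.0, 10.0], [0.2, 0.9]]
    sids = [residual_quantize(x, books)[0] for x in items]
    print(sids)  # [[1, 1], [0, 2]]
    print(utilization(sids, 1, 3))  # 2 of 3 fine codewords are used
```

A low utilization value at some level is the symptom that the load-balancing and uniformity objectives are meant to counter.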

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same stepwise token insertion could be tested in other sequence-generation settings within information retrieval to add interpretable checkpoints without extra model size.
  • Supervision drawn from co-occurrence-based codebook clusters may offer a route to more stable handling of items with sparse interaction data.
  • The emphasis on even focus across code levels suggests a possible reduction in overfitting when generating long hierarchical identifiers.

Load-bearing premise

Contrastive supervision of thinking tokens against ground-truth codebook cluster distributions will yield physically grounded reasoning paths rather than simply fitting patterns already captured by the codebook optimization.

What would settle it

An ablation experiment that removes the contrastive supervision on the thinking tokens and measures whether recommendation accuracy drops or reasoning paths lose interpretability would directly test the claim.
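Assuming, for illustration only, that training adds a weighted thinking-token loss to a per-level SID prediction loss over L code levels (the additive form and the weight λ are not taken from the paper), the ablation described above reduces to comparing three configurations:

```latex
\mathcal{L} = \sum_{l=1}^{L} \mathcal{L}_{\mathrm{SID}}^{(l)} + \lambda \sum_{l=1}^{L} \mathcal{L}_{\mathrm{think}}^{(l)},
\qquad
\text{variant} =
\begin{cases}
\text{direct SID generation} & \text{no thinking tokens},\ \lambda = 0,\\
\text{unsupervised thinking tokens} & \text{tokens inserted},\ \lambda = 0,\\
\text{full S}^2\text{GR} & \text{tokens inserted},\ \lambda > 0.
\end{cases}
```

The gap between the second and third configurations isolates the contrastive supervision itself; the gap between the first and second isolates the effect of the extra token slots.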

Figures

Figures reproduced from arXiv: 2601.18664 by Jian Wang, Jiawei Guo, Jun Zhao, Kaiqiao Zhan, Ruxin Zhou, Xiaoxiao Xu, Yongqi Liu, Youhua Liu, Zihao Guo.

Figure 1. The illustrations of latent reasoning paradigms. The left and middle subfigures show the processes in sequential …
Figure 2. Framework of Collaborative and Balanced RQ-VAE. We first construct an item co-occurrence graph based on users' …
Figure 3. Framework of Stepwise Semantic-Guided Reasoning. We interleave stepwise thinking tokens within the semantic ID …
Figure 4. The performance variation across different cluster …
Figure 6. The performance comparison under different model …
Original abstract

Generative Recommendation (GR) has emerged as a transformative paradigm with its end-to-end generation advantages. However, existing GR methods primarily focus on direct Semantic ID (SID) generation from interaction sequences, failing to activate deeper reasoning capabilities analogous to those in large language models and thus limiting performance potential. We identify two critical limitations in current reasoning-enhanced GR approaches: (1) strict sequential separation between reasoning and generation steps creates imbalanced computational focus across hierarchical SID codes, degrading quality for SID codes; (2) generated reasoning vectors lack interpretable semantics, while reasoning paths suffer from unverifiable supervision. In this paper, we propose stepwise semantic-guided reasoning in latent space (S²GR), a novel reasoning-enhanced GR framework. First, we establish a robust semantic foundation via codebook optimization, integrating item co-occurrence relationships to capture behavioral patterns, and load balancing and uniformity objectives that maximize codebook utilization while reinforcing coarse-to-fine semantic hierarchies. Our core innovation introduces the stepwise reasoning mechanism inserting thinking tokens before each SID generation step, where each token explicitly represents coarse-grained semantics supervised via contrastive learning against ground-truth codebook cluster distributions ensuring physically grounded reasoning paths and balanced computational focus across all SID codes. Extensive experiments demonstrate the superiority of S²GR, and an online A/B test confirms its efficacy on a large-scale industrial short-video platform.
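The abstract pairs load balancing with a uniformity objective but gives neither formula. The sketch below therefore substitutes two common stand-ins: the KL divergence of the empirical code-usage distribution from uniform (zero when every codeword is used equally often) and the hypersphere uniformity loss of Wang and Isola (2020), which becomes more negative as normalized embeddings spread apart. Treat both as assumptions, not as the paper's losses.

```python
import math


def usage_kl_to_uniform(counts):
    """KL(p || uniform) for code-usage counts; 0 means perfectly balanced usage.

    Stand-in load-balancing measure (assumption, not the paper's formula).
    Requires at least one non-zero count.
    """
    total = sum(counts)
    k = len(counts)
    return sum((c / total) * math.log(c * k / total) for c in counts if c > 0)


def uniformity_loss(embeddings, t=2.0):
    """log mean_{i<j} exp(-t * ||z_i - z_j||^2) over L2-normalized embeddings.

    Hypersphere uniformity loss (Wang and Isola, 2020), used as a stand-in.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")
    normed = []
    for e in embeddings:
        n = math.sqrt(sum(a * a for a in e))
        normed.append([a / n for a in e])
    terms = []
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            d2 = sum((a - b) ** 2 for a, b in zip(normed[i], normed[j]))
            terms.append(math.exp(-t * d2))
    return math.log(sum(terms) / len(terms))


if __name__ == "__main__":
    print(usage_kl_to_uniform([5, 5, 5, 5]))  # 0.0: balanced usage
    print(usage_kl_to_uniform([20, 0, 0, 0]))  # log(4), about 1.386: one code takes everything
    print(uniformity_loss([[1.0, 0.0], [-1.0, 0.0]]))  # about -8: maximally spread pair
```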

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes S²GR, a generative recommendation framework that first optimizes a codebook for Semantic IDs (SIDs) using item co-occurrence, load balancing, and uniformity objectives to capture behavioral patterns and coarse-to-fine hierarchies. Its core contribution is a stepwise reasoning mechanism that inserts thinking tokens before each SID generation step; these tokens represent coarse-grained semantics and are trained via contrastive learning against ground-truth codebook cluster distributions, with the goal of producing physically grounded reasoning paths and balanced computational focus across hierarchical SID codes. Experiments and an online A/B test on a short-video platform are claimed to demonstrate superiority over prior GR methods.

Significance. If the contrastive supervision of thinking tokens can be shown to yield independently verifiable reasoning steps that are not reducible to the codebook's own fitted distributions, the work would meaningfully extend generative recommendation by importing LLM-style stepwise reasoning into latent semantic ID generation, potentially improving handling of hierarchical semantics in large-scale industrial settings.

major comments (1)
  1. [Core innovation description] The core innovation paragraph (stepwise reasoning mechanism): the claim that contrastive supervision against ground-truth codebook cluster distributions produces 'physically grounded' and 'verifiable' reasoning paths is load-bearing for the central contribution, yet these targets are themselves outputs of the preceding codebook optimization (co-occurrence + load-balancing + uniformity on the same interaction sequences). This creates a circularity risk in which the thinking tokens may simply regularize reproduction of already-captured cluster assignments rather than introduce new, independently interpretable reasoning steps. An ablation isolating the incremental value of the thinking-token supervision over direct SID generation, or a qualitative analysis of whether the learned tokens yield human-interpretable semantics not already present in the codebook, is required.
minor comments (2)
  1. The abstract states that 'extensive experiments demonstrate superiority' but provides no quantitative metrics, baselines, or effect sizes; adding a brief results summary would improve readability.
  2. Notation for 'thinking tokens' and 'SID codes' is introduced without an explicit equation or diagram reference in the provided description; a small illustrative figure or equation defining the contrastive loss would clarify the mechanism.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and detailed analysis of our core contribution. We address the major comment point by point below, providing clarification on the distinction between codebook optimization and stepwise reasoning while committing to revisions that empirically demonstrate the incremental value of our approach.

Point-by-point responses
  1. Referee: major comment 1 above (circularity of supervising thinking tokens against codebook cluster distributions produced by the same pipeline).

    Authors: We acknowledge the referee's concern about potential circularity and agree that explicit clarification is warranted. The codebook is optimized in an offline stage to derive static Semantic IDs and cluster distributions from co-occurrence patterns. In contrast, the thinking tokens are inserted and trained dynamically during the autoregressive generation process via contrastive loss, forcing the model to explicitly predict and align with the coarse cluster at each hierarchical step before emitting the next SID. This creates a verifiable, stepwise reasoning trajectory that guides computation and is absent in direct SID generation baselines. To address the request, we will add an ablation study comparing S²GR against a variant without thinking-token contrastive supervision, and include qualitative examples illustrating the semantic content of the learned tokens. revision: yes

Circularity Check

1 step flagged

Contrastive supervision of thinking tokens reduces to re-fitting codebook cluster distributions by construction

specific steps
  1. fitted input called prediction [Abstract]
    "Our core innovation introduces the stepwise reasoning mechanism inserting thinking tokens before each SID generation step, where each token explicitly represents coarse-grained semantics supervised via contrastive learning against ground-truth codebook cluster distributions ensuring physically grounded reasoning paths and balanced computational focus across all SID codes."

    The ground-truth codebook cluster distributions originate from the codebook optimization described immediately prior in the abstract, which integrates item co-occurrence, load balancing, and uniformity on the same sequences. The contrastive supervision therefore forces alignment to already-fitted cluster assignments, rendering the 'physically grounded reasoning paths' equivalent to a re-fitting of the codebook outputs by construction rather than a new, independently verifiable reasoning step.

full rationale

The paper's core innovation claims to produce physically grounded reasoning paths by supervising thinking tokens via contrastive learning against ground-truth codebook cluster distributions. However, these distributions are generated by the paper's own preceding codebook optimization step, which fits to item co-occurrence, load balancing, and uniformity objectives on the identical interaction data. This makes the claimed reasoning mechanism dependent on quantities already shaped by the model's fitting process, reducing the 'stepwise semantic-guided reasoning' to an internal re-expression of the codebook's fitted semantics rather than an independent derivation. The central claim therefore exhibits partial circularity as a fitted-input-called-prediction pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the effectiveness of contrastive learning for grounding tokens, the utility of co-occurrence plus load-balancing objectives for codebook quality, and the premise that hierarchical SID codes benefit from balanced per-step computation.

axioms (2)
  • domain assumption: Contrastive learning against codebook cluster distributions produces physically grounded reasoning paths
    Invoked in the description of the core innovation to justify the supervision choice.
  • domain assumption: Item co-occurrence relationships capture behavioral patterns suitable for codebook optimization
    Stated as part of establishing a robust semantic foundation.
invented entities (1)
  • thinking tokens: no independent evidence
    purpose: Represent coarse-grained semantics before each SID generation step
    New component inserted into the generation process; no independent falsifiable prediction is given in the abstract.

pith-pipeline@v0.9.0 · 5800 in / 1376 out tokens · 41949 ms · 2026-05-21T14:18:56.300213+00:00 · methodology



Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 11 internal anchors

  1. [1]

    Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yanchen Luo, Chong Chen, Fuli Feng, and Qi Tian. 2025. A bi-step grounding paradigm for large language models in recommendation systems. ACM Transactions on Recommender Systems 3, 4 (2025), 1–27.

  2. [2]

    Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. Tallrec: An effective and efficient tuning framework to align large language model with recommendation. In Proceedings of the 17th ACM conference on recommender systems. 1007–1014.

  3. [3]

    Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh, Ethan Shen, Dongping Chen, Linda G Shapiro, and Ranjay Krishna. 2025. Perception tokens enhance visual reasoning in multimodal language models. In Proceedings of the Computer Vision and Pattern Recognition Conference. 3836–3845.

  4. [4]

    Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, and Xiaoyu Shen. 2025. Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning. arXiv preprint arXiv:2505.16782 (2025).

  5. [5]

    Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, and Tat-Seng Chua. 2024. On softmax direct preference optimization for recommendation. Advances in Neural Information Processing Systems 37 (2024), 27463–27489.

  6. [6]

    Zhangquan Chen, Manyuan Zhang, Xinlei Yu, Xufang Luo, Mingze Sun, Zihao Pan, Yan Feng, Peng Pei, Xunliang Cai, and Ruqi Huang. 2025. Think with 3d: Geometric imagination grounded spatial reasoning from limited views. arXiv preprint arXiv:2510.18632 (2025).

  7. [7]

    Damai Dai, Chengqi Deng, Chenggang Zhao, RX Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Yu Wu, et al. 2024. Deepseekmoe: Towards ultimate expert specialization in mixture-of-experts language models. arXiv preprint arXiv:2401.06066 (2024).

  8. [8]

    Sunhao Dai, Jiakai Tang, Jiahua Wu, Kun Wang, Yuxuan Zhu, Bingjun Chen, Bangyang Hong, Yu Zhao, Cong Fu, Kangle Wu, et al. 2025. OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System. arXiv preprint arXiv:2509.18091 (2025).

  9. [9]

    Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. Onerec: Unifying retrieve and rank with generative recommender and iterative preference alignment. arXiv preprint arXiv:2502.18965 (2025).

  10. [10]

    Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuandong Tian. 2024. Training large language models to reason in a continuous latent space. arXiv preprint arXiv:2412.06769 (2024).

  11. [11]

    Ruining He, Lukasz Heldt, Lichan Hong, Raghunandan Keshavan, Shifan Mao, Nikhil Mehta, Zhengyang Su, Alicia Tsai, Yueqi Wang, Shao-Chuan Wang, et al. 2025. Plum: Adapting pre-trained language models for industrial-scale generative recommendations. arXiv preprint arXiv:2510.07784 (2025).

  12. [12]

    Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th international conference on world wide web. 507–517.

  13. [13]

    Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).

  14. [14]

    Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE international conference on data mining (ICDM). IEEE, 197–206.

  15. [15]

    Bangzheng Li, Ximeng Sun, Jiang Liu, Ze Wang, Jialian Wu, Xiaodong Yu, Hao Chen, Emad Barsoum, Muhao Chen, and Zicheng Liu. 2025. Latent visual reasoning. arXiv preprint arXiv:2509.24251 (2025).

  16. [16]

    Jindong Li, Yali Fu, Li Fan, Jiahong Liu, Yao Shu, Chengwei Qin, Menglin Yang, Irwin King, and Rex Ying. 2025. Implicit reasoning in large language models: A comprehensive survey. arXiv preprint arXiv:2509.02350 (2025).

  17. [17]

    Kaiyuan Li, Rui Xiang, Yong Bai, Yongxiang Tang, Yanhua Cheng, Xialong Liu, Peng Jiang, and Kun Gai. 2025. Bbqrec: Behavior-bind quantization for multi-modal sequential recommendation. arXiv preprint arXiv:2504.06636 (2025).

  18. [18]

    Xiaopeng Li, Bo Chen, Junda She, Shiteng Cao, You Wang, Qinlin Jia, Haiying He, Zheli Zhou, Zhao Liu, Ji Liu, et al. 2025. A Survey of Generative Recommendation from a Tri-Decoupled Perspective: Tokenization, Architecture, and Optimization. (2025).

  19. [19]

    Jiacheng Lin, Tian Wang, and Kun Qian. 2025. Rec-r1: Bridging generative large language models and user-centric recommendation systems via reinforcement learning. arXiv preprint arXiv:2503.24289 (2025).

  20. [20]

    Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. 2024. Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437 (2024).

  21. [21]

    Enze Liu, Bowen Zheng, Xiaolei Wang, Wayne Xin Zhao, Jinpeng Wang, Sheng Chen, and Ji-Rong Wen. 2025. LARES: Latent Reasoning for Sequential Recommendation. arXiv preprint arXiv:2505.16865 (2025).

  22. [22]

    Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, et al. 2025. Onerec-think: In-text reasoning for generative recommendation. arXiv preprint arXiv:2510.11639 (2025).

  23. [23]

    Xinchen Luo, Jiangxia Cao, Tianyu Sun, Jinkai Yu, Rui Huang, Wei Yuan, Hezheng Lin, Yichen Zheng, Shiyao Wang, Qigen Hu, et al. 2025. Qarm: Quantitative alignment multi-modal recommendation at kuaishou. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 5915–5922.

  24. [24]

    Jianmo Ni, Gustavo Hernandez Abrego, Noah Constant, Ji Ma, Keith Hall, Daniel Cer, and Yinfei Yang. 2022. Sentence-t5: Scalable sentence encoders from pre-trained text-to-text models. In Findings of the association for computational linguistics: ACL 2022. 1864–1874.

  25. [25]

    Qiyao Peng, Hongtao Liu, Hua Huang, Qing Yang, and Minglai Shao. 2025. A survey on llm-powered agents for recommender systems. arXiv preprint arXiv:2502.10050 (2025).

  26. [26]

    Gustavo Penha, Edoardo D’Amico, Marco De Nadai, Enrico Palumbo, Alexandre Tamborrino, Ali Vardasbi, Max Lefarov, Shawn Lin, Timothy Heath, Francesco Fabbri, et al. 2025. Semantic ids for joint generative search and recommendation. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 1296–1301.

  27. [27]

    Yiming Qin, Bomin Wei, Jiaxin Ge, Konstantinos Kallidromitis, Stephanie Fu, Trevor Darrell, and XuDong Wang. 2025. Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens. arXiv preprint arXiv:2511.19418 (2025).

  28. [28]

    Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. 2023. Recommender systems with generative retrieval. Advances in Neural Information Processing Systems 36 (2023), 10299–10315.

  29. [29]

    Teng Shi, Weicong Qin, Weijie Yu, Xiao Zhang, Ming He, Jianping Fan, and Jun Xu. 2025. Bridging Search and Recommendation through Latent Cross Reasoning. arXiv preprint arXiv:2508.04152 (2025).

  30. [30]

    Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management. 1441–1450.

  31. [31]

    Jiakai Tang, Sunhao Dai, Teng Shi, Jun Xu, Xu Chen, Wen Chen, Jian Wu, and Yuning Jiang. 2025. Think before recommend: Unleashing the latent reasoning power for sequential recommendation. arXiv preprint arXiv:2503.22675 (2025).

  32. [32]

    Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the eleventh ACM international conference on web search and data mining. 565–573.

  33. [33]

    Lean Wang, Huazuo Gao, Chenggang Zhao, Xu Sun, and Damai Dai. 2024. Auxiliary-loss-free load balancing strategy for mixture-of-experts. arXiv preprint arXiv:2408.15664 (2024).

  34. [34]

    Yuhao Yang, Zhi Ji, Zhaopeng Li, Yi Li, Zhonglin Mo, Yue Ding, Kai Chen, Zijian Zhang, Jie Li, Shuanglong Li, et al. 2025. Sparse meets dense: Unified generative recommendations with cascaded sparse-dense representations. arXiv preprint arXiv:2503.02453 (2025).

  35. [35]

    Wencai Ye, Mingjie Sun, Shaoyun Shi, Peng Wang, Wenjin Wu, and Peng Jiang. 2025. DAS: Dual-Aligned Semantic IDs Empowered Industrial Recommender System. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6217–6224.

  36. [36]

    Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, et al. 2025. The landscape of agentic reinforcement learning for llms: A survey. arXiv preprint arXiv:2509.02547 (2025).

  37. [37]

    Huanyu Zhang, Wenshan Wu, Chengzu Li, Ning Shang, Yan Xia, Yangyu Huang, Yifan Zhang, Li Dong, Zhang Zhang, Liang Wang, et al. 2025. Latent sketchpad: Sketching visual thoughts to elicit multimodal reasoning in mllms. arXiv preprint arXiv:2510.24514 (2025).

  38. [38]

    Jun Zhang, Yi Li, Yue Liu, Changping Wang, Yuan Wang, Yuling Xiong, Xun Liu, Haiyang Wu, Qian Li, Enming Zhang, et al. 2025. GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising Recommendation. arXiv preprint arXiv:2511.10138 (2025).

  39. [39]

    Junjie Zhang, Beichen Zhang, Wenqi Sun, Hongyu Lu, Wayne Xin Zhao, Yu Chen, and Ji-Rong Wen. 2025. Slow Thinking for Sequential Recommendation. arXiv preprint arXiv:2504.09627 (2025).

  40. [40]

    Kaiyan Zhang, Yuxin Zuo, Bingxiang He, Youbang Sun, Runze Liu, Che Jiang, Yuchen Fan, Kai Tian, Guoli Jia, Pengfei Li, et al. 2025. A survey of reinforcement learning for large reasoning models. arXiv preprint arXiv:2509.08827 (2025).

  41. [41]

    Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, and Ruiming Tang. 2025. Llmtreerec: Unleashing the power of large language models for cold-start recommendations. In Proceedings of the 31st International Conference on Computational Linguistics. 886–896.

  42. [42]

    Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv preprint arXiv:2506.05176 (2025).

  43. [43]

    Yu Zhang, Shutong Qiao, Jiaqi Zhang, Tzu-Heng Lin, Chen Gao, and Yong Li. 2025. A survey of large language model empowered agents for recommendation and search: Towards next-generation information retrieval. arXiv preprint arXiv:2503.05659 (2025).

  44. [44]

    Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, et al. 2025. OneRec Technical Report. arXiv preprint arXiv:2506.13695 (2025).

  45. [45]

    Guorui Zhou, Hengrui Hu, Hongtao Cheng, Huanjie Wang, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Lu Ren, Liao Yu, et al. 2025. Onerec-v2 technical report. arXiv preprint arXiv:2508.20900 (2025).