arxiv: 2605.17779 · v1 · pith:7PL5CZRK (new) · submitted 2026-05-18 · 💻 cs.LG

Learning Variable-Length Tokenization for Generative Recommendation

Pith reviewed 2026-05-20 11:57 UTC · model grok-4.3

classification 💻 cs.LG
keywords: generative recommendation · variable-length tokenization · semantic identifiers · popularity-length paradox · hyperbolic quantization · information budget allocation

The pith

Generative recommendation improves when popular items receive short semantic IDs and rare items receive long ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that fixed-length tokenization wastes capacity: popular items already carry strong collaborative signals and perform best with brief codes, while tail items need longer codes to encode distinctive content features when interaction data are sparse. It introduces VarLenRec, which learns variable lengths through an information budget that shrinks as popularity grows. The authors prove that optimal ID length scales as a negative power of popularity, then address the two implementation obstacles with hyperbolic space (for encoding capacity) and a differentiable controller (for length selection). On four datasets, this yields higher accuracy and faster training and inference than fixed-length baselines.
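As a rough illustration of how a popularity-dependent budget turns into integer code lengths, the sketch below applies the power law $L_i^* = C\,p_i^{-\alpha/\gamma}$ from the paper's derivation, then rounds and clips the result. The helper `piba_lengths`, the values of `C`, `alpha` and `gamma`, and the clipping bounds `l_min`/`l_max` are assumptions for illustration only; in VarLenRec the lengths are predicted by the Soft Length Controller rather than computed in closed form like this.

```python
import numpy as np

def piba_lengths(popularity, C=1.0, alpha=0.5, gamma=1.0, l_min=1, l_max=8):
    """Hypothetical helper: integer ID lengths from L*_i = C * p_i**(-alpha/gamma).

    C, alpha, gamma, l_min and l_max are illustrative values, not the paper's.
    """
    p = np.asarray(popularity, dtype=float)
    p = p / p.sum()                      # normalize so that sum_i p_i = 1
    raw = C * p ** (-alpha / gamma)      # continuous optimum, decreasing in p
    # Round to whole codebook levels and clip to the depth the quantizer supports.
    return np.clip(np.rint(raw), l_min, l_max).astype(int)

counts = [500, 120, 30, 5, 1]            # made-up interaction counts, head to tail
print(piba_lengths(counts))              # [1 2 5 8 8]
```

Head items end up with one or two codes while tail items hit the depth cap; the clip encodes the assumption that the quantizer has a fixed maximum number of levels.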

Core claim

The central claim is that optimal semantic ID length scales as a negative power of item popularity, a result the authors prove within the Popularity-Weighted Information Budget Allocation (PIBA) framework. Variable-length tokenization is realized by two components: Hyperbolic Residual Quantization in the Poincaré ball, whose exponentially growing volume lets codes of different lengths coexist without distortion, and a Soft Length Controller, which predicts lengths differentiably through layer retention probabilities regularized by the PIBA priors. The reported experiments show the resulting VarLenRec model outperforming state-of-the-art fixed-length generative recommenders in accuracy while also reducing training and inference cost.
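The paper's exact Hyperbolic Residual Quantization is not given in this excerpt. The sketch below shows one plausible reading, assuming curvature $-1$, codewords that already lie inside the unit Poincaré ball, and residuals formed with Möbius subtraction. The helpers `mobius_add`, `expmap0`, `poincare_dist` and `hyperbolic_rq` are hypothetical names. Because Möbius addition satisfies the left cancellation law $q \oplus ((-q) \oplus x) = x$, each level quantizes exactly what the running reconstruction still misses, and stopping the loop early yields a shorter ID.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition in the unit Poincare ball (curvature -1)."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2)

def expmap0(v):
    """Map a tangent vector at the origin into the ball."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return v if n < 1e-12 else np.tanh(n) * v / n

def poincare_dist(x, y):
    """Geodesic distance d(x, y) = 2 * artanh(|(-x) (+) y|)."""
    z = np.linalg.norm(mobius_add(-x, y))
    return 2 * np.arctanh(min(z, 1 - 1e-7))

def hyperbolic_rq(x, codebooks, length):
    """Hypothetical greedy residual quantization in the ball, truncated to `length` levels."""
    q = np.zeros_like(x)                  # running reconstruction, starts at the origin
    codes = []
    for level in range(length):
        r = mobius_add(-q, x)             # residual chosen so that q (+) r == x
        k = int(np.argmin([poincare_dist(c, r) for c in codebooks[level]]))
        codes.append(k)
        q = mobius_add(q, codebooks[level][k])
    return codes, q

rng = np.random.default_rng(0)
d, M, depth = 4, 16, 4
codebooks = [np.stack([expmap0(0.5 * rng.standard_normal(d)) for _ in range(M)])
             for _ in range(depth)]
x = expmap0(rng.standard_normal(d))
codes, q = hyperbolic_rq(x, codebooks, length=3)
```

Whether VarLenRec forms residuals this way or instead maps to the tangent space with a logarithmic map is not stated here; the sketch only shows that residual quantization can be carried out entirely inside the ball.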

What carries the argument

Popularity-Weighted Information Budget Allocation (PIBA) that derives optimal ID lengths scaling as a negative power of popularity, paired with Hyperbolic Residual Quantization that exploits the Poincaré ball's exponential volume growth to stratify encoding capacity across lengths.
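The "exponential volume growth" can be made concrete with standard formulas for the hyperbolic plane of curvature $-1$: a disk of hyperbolic radius $r$ has area $2\pi(\cosh r - 1)$ versus $\pi r^2$ in the Euclidean plane, and in the Poincaré disk a point at hyperbolic distance $r$ from the origin has Euclidean norm $\tanh(r/2)$. The helper names below are illustrative; this is background geometry, not the paper's capacity analysis.

```python
import math

def area_ratio(r):
    """Hyperbolic-disk area 2*pi*(cosh r - 1) divided by Euclidean-disk area pi*r^2."""
    return 2 * math.pi * (math.cosh(r) - 1) / (math.pi * r ** 2)

def ball_norm(r):
    """Euclidean norm, in the unit Poincare disk, of a point at hyperbolic distance r from the origin."""
    return math.tanh(r / 2)

for r in (1, 2, 4, 8):
    print(f"r={r}: area ratio {area_ratio(r):.2f}, Poincare norm {ball_norm(r):.4f}")
```

The ratio grows from about 1.09 at $r = 1$ to about 46.5 at $r = 8$, while the corresponding points crowd toward the boundary of the disk. This is the room that deeper residual levels would occupy, although whether it translates into lower quantization distortion is exactly what the referee asks the authors to measure.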

Load-bearing premise

Hyperbolic geometry supplies enough undistorted capacity to represent items at many different code lengths, whereas Euclidean space does not.

What would settle it

Measure accuracy on the same four datasets after replacing the learned variable lengths with a single fixed length equal to the average; if the fixed-length version matches or exceeds VarLenRec, the claimed benefit of popularity-dependent allocation is refuted.

Figures

Figures reproduced from arXiv: 2605.17779 by Bowen Wu, Minhao Wang, Wei Zhang.

Figure 1: The Popularity-Length Paradox. NDCG@10 on dif…
Figure 2: Overall architecture of VarLenRec.

3.2.1 Problem Formulation. Consider an item catalog $\mathcal{I} = \{i_1, i_2, \ldots, i_N\}$ where each item $i$ has content features $\mathbf{x}_i \in \mathbb{R}^{F}$ and popularity $p_i$ (normalized interaction frequency, $\sum_{i=1}^{N} p_i = 1$). Our goal is to tokenize each item into a semantic ID $\mathbf{z}_i = (z_i^{(1)}, \ldots, z_i^{(L_i)})$ through residual quantization, where $z_i^{(l)} \in \{c_1^{(1)}, \ldots, c_M^{(1)}\}$ indexes a codebook…
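For reference, the sketch below shows plain Euclidean residual quantization, the baseline the formulation builds on and the one the paper argues lacks geometric capacity. The function name `residual_quantize`, the random codebooks and their per-level scaling are assumptions for illustration. It highlights one property of greedy residual quantization: a length-2 ID is a prefix of the length-4 ID for the same item, so variable lengths can be read as truncations of a single deep code. Whether VarLenRec shares codebooks across lengths in this way is not stated in the excerpt.

```python
import numpy as np

def residual_quantize(x, codebooks, length):
    """Greedy Euclidean residual quantization truncated to `length` levels.

    codebooks: list of (M, F) arrays, one per level.
    Returns the code indices (the semantic ID) and the reconstruction.
    """
    residual = np.asarray(x, dtype=float).copy()
    recon = np.zeros_like(residual)
    codes = []
    for level in range(length):
        cb = codebooks[level]
        k = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))  # nearest codeword
        codes.append(k)
        recon += cb[k]
        residual -= cb[k]          # the next level quantizes what is left
    return codes, recon

rng = np.random.default_rng(0)
F, M, L_max = 8, 32, 4
codebooks = [rng.standard_normal((M, F)) / (2 ** l) for l in range(L_max)]
x = rng.standard_normal(F)
short_id, _ = residual_quantize(x, codebooks, length=2)   # e.g. a head item
long_id, _ = residual_quantize(x, codebooks, length=4)    # e.g. a tail item
```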
Figure 3: Hyperparameter sensitivity analysis.

For items with sufficient popularity such that $\theta p_i \gg 1$, we approximate $\log(1 + \theta p_i) \approx \log\theta + \log p_i$:

$$L_i^{*} = \exp\left(\frac{I_{\text{req}} - \alpha\log\theta - \alpha\log p_i}{\gamma}\right) \tag{26}$$

$$= \exp\left(\frac{I_{\text{req}} - \alpha\log\theta}{\gamma}\right) \cdot p_i^{-\alpha/\gamma}. \tag{27}$$

Defining $C = \exp\big((I_{\text{req}} - \alpha\log\theta)/\gamma\big) > 0$, we obtain:

$$L_i^{*} = C \cdot p_i^{-\alpha/\gamma}. \tag{28}$$

Since $\alpha, \gamma > 0$, the exponent $-\alpha/\gamma < 0$, confirming that optimal length decreases with populari…
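The approximation step can be checked numerically. The sketch assumes the pre-approximation expression is $L_i^* = \exp\big((I_{\text{req}} - \alpha\log(1 + \theta p_i))/\gamma\big)$, which is what replacing $\log(1 + \theta p_i)$ by $\log\theta + \log p_i$ in Eq. (26) implies; the parameter values are illustrative, not the paper's. Dividing the two forms gives the closed-form ratio $(1 + 1/(\theta p_i))^{-\alpha/\gamma}$, so the power law (28) is accurate when $\theta p_i \gg 1$ and overstates the length when $\theta p_i$ is small.

```python
import math

def exact_length(p, I_req, alpha, gamma, theta):
    """Assumed pre-approximation form: exp((I_req - alpha*log(1 + theta*p)) / gamma)."""
    return math.exp((I_req - alpha * math.log1p(theta * p)) / gamma)

def power_law_length(p, I_req, alpha, gamma, theta):
    """Eq. (28): C * p**(-alpha/gamma) with C = exp((I_req - alpha*log(theta)) / gamma)."""
    C = math.exp((I_req - alpha * math.log(theta)) / gamma)
    return C * p ** (-alpha / gamma)

params = dict(I_req=8.0, alpha=1.0, gamma=2.0, theta=1e6)   # illustrative values only
for p in (1e-2, 1e-4, 1e-6, 1e-7):
    ratio = exact_length(p, **params) / power_law_length(p, **params)
    print(f"theta*p = {params['theta'] * p:g}: exact / power law = {ratio:.4f}")
```

With these values the ratio is essentially 1 at $\theta p_i = 10^4$ and $1/\sqrt{2} \approx 0.71$ at $\theta p_i = 1$.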
Figure 5: Distribution of learned semantic ID lengths by Var…
Figure 6: Performance scaling analysis by item popularity.
Original abstract

Generative recommendation reformulates recommendation as next-token prediction over discrete semantic identifiers (IDs). A fundamental yet unexplored design choice is that existing methods employ fixed-length tokenization for all items, implicitly assuming uniform encoding capacity regardless of item characteristics. Through systematic experiments across four datasets, we discover the Popularity-Length Paradox: popular items achieve optimal performance with short IDs, while tail items require substantially longer codes to capture discriminative semantics. This reveals a critical mismatch where popular items benefit from abundant collaborative signals and require minimal semantic detail, whereas tail items must rely on fine-grained content features due to sparse interaction data. To address this, we propose VarLenRec, a framework for learning variable-length tokenization. We develop Popularity-Weighted Information Budget Allocation (PIBA), an information-theoretic framework proving that optimal ID length should scale as a negative power of popularity. Directly implementing variable-length allocation faces two technical challenges: standard Euclidean residual quantization lacks geometric capacity to support diverse code lengths without distortion, and discrete length decisions are non-differentiable. We address these through Hyperbolic Residual Quantization, which leverages the exponential volume growth of the Poincaré ball to naturally stratify encoding capacity, and a Soft Length Controller, which enables differentiable length prediction via continuous layer retention probabilities regularized by PIBA-derived priors. Extensive experiments demonstrate that VarLenRec achieves significant improvements over state-of-the-art methods in recommendation accuracy and training/inference efficiency, revealing the importance of adaptive encoding capacity in generative recommendation.
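One common way to make a discrete "how many layers" decision differentiable is to predict a continue-probability per layer and take cumulative products, so that layer $l$ is retained with probability $\prod_{k \le l} r_k$ and the expected length is the sum of those probabilities. The sketch below illustrates that idea under stated assumptions: `soft_length`, `soft_reconstruction`, the always-kept first layer and the sigmoid parameterization are hypothetical choices, not the paper's published controller.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_length(continue_logits):
    """Hypothetical soft length controller for one item.

    continue_logits[l] scores "keep layer l+2 given layer l+1 is kept";
    layer 1 is always kept. Returns per-layer retention weights and the
    expected (soft) ID length, both smooth functions of the logits.
    """
    r = sigmoid(np.asarray(continue_logits, dtype=float))
    keep = np.concatenate([[1.0], np.cumprod(r)])  # P(layer l is retained)
    return keep, keep.sum()

def soft_reconstruction(level_codewords, keep):
    """Weight each layer's selected codeword (rows of an (L_max, d) array) by its retention probability."""
    return (keep[:, None] * level_codewords).sum(axis=0)

keep, expected_len = soft_length([3.0, 1.0, -1.0])  # L_max = 4 layers
print(keep.round(3), round(expected_len, 2))
```

With these logits the retention weights decay from 1 to roughly 0.19 and the expected length is about 2.84; because every step is smooth, gradients from the reconstruction and from a length prior can both reach the logits.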

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that fixed-length semantic IDs in generative recommendation are suboptimal due to the Popularity-Length Paradox: popular items achieve best performance with short IDs while tail items require longer codes to capture discriminative semantics. It introduces Popularity-Weighted Information Budget Allocation (PIBA) as an information-theoretic framework that proves optimal ID length scales as a negative power of popularity. To realize variable-length tokenization, the authors propose Hyperbolic Residual Quantization (leveraging Poincaré ball geometry) and a Soft Length Controller (using continuous retention probabilities regularized by PIBA priors). Systematic experiments on four datasets are reported to show gains in recommendation accuracy and training/inference efficiency over prior generative methods.

Significance. If the PIBA derivation is shown to be independent of the training data and the hyperbolic quantization is demonstrated to avoid distortion for variable lengths, the work would offer a principled, adaptive encoding strategy that addresses a clear mismatch between item popularity and required semantic capacity. This could meaningfully influence efficient generative recommender design. The reported experiments across multiple datasets provide a starting point for empirical validation, though the absence of full tables, error bars, and derivation details limits immediate assessment of robustness.

major comments (2)
  1. [§3 (PIBA framework)] The claim that PIBA constitutes a proof that optimal length l* scales as a negative power of popularity is load-bearing for the central contribution. The derivation appears to depend on specific functional forms relating popularity to mutual information and on the choice of regularization priors; it is not shown whether these choices are independent of the empirical popularity statistics used in training or whether they reduce to a data-dependent allocation. A concrete walk-through of the steps from the weighted budget and entropy terms to l* ∝ pop^{-α} (including any continuous approximations) is needed to confirm the result is not circular.
  2. [§4 (Hyperbolic Residual Quantization)] The assertion that the exponential volume growth of the Poincaré ball naturally supports diverse code lengths without distortion is central to solving the geometric-capacity challenge. However, no quantitative comparison (e.g., reconstruction error or embedding distortion metrics) is provided against Euclidean residual quantization under the same variable-length regime, leaving open whether the claimed capacity advantage is realized in practice for the sparse, discrete item sets typical in recommendation data.
minor comments (2)
  1. [Abstract / §5] The abstract and introduction refer to 'four datasets' without naming them or providing basic statistics (e.g., number of items, interaction density); this information should appear in §5 or an appendix for reproducibility.
  2. [§4.2] Notation for the Soft Length Controller (continuous layer retention probabilities) is introduced without an explicit equation linking the regularization term to the PIBA-derived prior; adding this would improve clarity.
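The comment above asks for an explicit link between the retention probabilities and the PIBA prior. One hypothetical form, given only to make the request concrete, treats the expected number of retained layers as a soft length and penalizes its squared deviation from the PIBA target. The symbols $r_{i,l}$, $\hat{L}_i$ and $\lambda$ and the squared-error choice are assumptions; the paper may use a different penalty, for example a divergence between length distributions.

```latex
% Hypothetical length regularizer; r_{i,l}, \hat{L}_i and \lambda are illustrative symbols.
\hat{L}_i = \sum_{l=1}^{L_{\max}} \prod_{k=1}^{l} r_{i,k},
\qquad
L_i^{*} = C \, p_i^{-\alpha/\gamma},
\qquad
\mathcal{R}_{\text{len}} = \frac{\lambda}{N} \sum_{i=1}^{N} \bigl(\hat{L}_i - L_i^{*}\bigr)^{2}.
```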

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the PIBA derivation and the geometric properties of hyperbolic quantization. We address the two major comments point by point below.

  1. Referee: §3 (PIBA framework): the claim that PIBA constitutes a proof that optimal length l* scales as a negative power of popularity is load-bearing for the central contribution. The derivation appears to depend on specific functional forms relating popularity to mutual information and on the choice of regularization priors; it is not shown whether these choices are independent of the empirical popularity statistics used in training or whether they reduce to a data-dependent allocation. A concrete walk-through of the steps from the weighted budget and entropy terms to l* ∝ pop^{-α} (including any continuous approximations) is needed to confirm the result is not circular.

    Authors: We thank the referee for this observation. In the revised manuscript we will insert a full step-by-step derivation (new Appendix B) that begins with the popularity-weighted information budget, proceeds through the entropy-regularized objective and the continuous relaxation of length selection, and arrives at the scaling l* ∝ pop^{-α} under the stated power-law assumption between popularity and mutual information. The functional forms are chosen from standard information-theoretic modeling assumptions rather than fitted to any particular dataset; once the scaling is obtained analytically, empirical popularity values are used only to instantiate the per-item budgets. We will explicitly note this separation to remove any appearance of circularity. revision: yes

  2. Referee: §4 (Hyperbolic Residual Quantization): the assertion that the exponential volume growth of the Poincaré ball naturally supports diverse code lengths without distortion is central to solving the geometric-capacity challenge. However, no quantitative comparison (e.g., reconstruction error or embedding distortion metrics) is provided against Euclidean residual quantization under the same variable-length regime, leaving open whether the claimed capacity advantage is realized in practice for the sparse, discrete item sets typical in recommendation data.

    Authors: We agree that a direct quantitative comparison is valuable. In the revision we will add a new subsection (and corresponding table) that reports reconstruction MSE and embedding distortion (cosine and Euclidean) for both Hyperbolic Residual Quantization and a Euclidean residual baseline, each run under identical variable-length schedules on all four datasets. These results will be used to substantiate the claimed capacity advantage for sparse item sets. revision: yes
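The promised comparison could be computed with a small harness like the one below, which reports reconstruction MSE and mean cosine distance per ID length, plus the mean relative change in pairwise Euclidean distances. The function names and the exact metric definitions are assumptions, since the rebuttal does not define "embedding distortion"; for hyperbolic reconstructions one would likely also compare pairwise Poincaré distances rather than only Euclidean ones.

```python
import numpy as np

def reconstruction_report(X, X_hat, lengths):
    """Per-length reconstruction MSE and mean cosine distance (1 - cos)."""
    X, X_hat, lengths = np.asarray(X, float), np.asarray(X_hat, float), np.asarray(lengths)
    mse = ((X - X_hat) ** 2).mean(axis=1)
    cos = (X * X_hat).sum(axis=1) / (
        np.linalg.norm(X, axis=1) * np.linalg.norm(X_hat, axis=1) + 1e-12)
    return {int(L): {"mse": float(mse[lengths == L].mean()),
                     "cos_dist": float((1 - cos[lengths == L]).mean())}
            for L in np.unique(lengths)}

def pairwise_distortion(X, X_hat):
    """Mean relative change in pairwise Euclidean distances after quantization."""
    X, X_hat = np.asarray(X, float), np.asarray(X_hat, float)
    iu = np.triu_indices(len(X), k=1)
    d = np.linalg.norm(X[:, None] - X[None], axis=-1)[iu]
    d_hat = np.linalg.norm(X_hat[:, None] - X_hat[None], axis=-1)[iu]
    return float(np.mean(np.abs(d_hat / (d + 1e-12) - 1)))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16))                  # stand-in item embeddings
X_hat = X + 0.1 * rng.standard_normal(X.shape)      # stand-in reconstructions
lengths = rng.integers(1, 5, size=100)              # stand-in ID lengths
print(reconstruction_report(X, X_hat, lengths))
print(pairwise_distortion(X, X_hat))
```

Running the same harness on Euclidean and hyperbolic reconstructions under one shared length schedule keeps the comparison matched, which is the condition the referee asks for.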

Circularity Check

0 steps flagged

No significant circularity in PIBA derivation or VarLenRec framework

The paper presents PIBA as an information-theoretic framework that derives optimal ID length scaling as a negative power of popularity from the popularity-length paradox observed in experiments. The abstract and description frame this as a proof emerging from weighted budget allocation, entropy terms, and regularization priors applied to item popularity statistics as inputs. No equations or sections are available in the provided text that demonstrate the scaling law reducing by construction to a fitted parameter, a self-citation chain, or an unverified ansatz smuggled from prior work. The Poincaré ball usage is introduced to solve a stated geometric capacity challenge rather than presupposing the result. The central claim therefore retains independent theoretical content separate from the data used for allocation, qualifying as a self-contained derivation.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The central claim rests on an information-theoretic derivation of length scaling plus two newly introduced technical mechanisms whose independent grounding is not provided in the abstract.

free parameters (1)
  • scaling exponent in PIBA
    The negative power relating popularity to optimal ID length; whether this is derived parameter-free or fitted is not specified in the abstract.
axioms (1)
  • domain assumption: optimal semantic ID length for an item scales as a negative power of its popularity under information-theoretic budget allocation
    Invoked as the foundation of PIBA to justify variable-length allocation.
invented entities (2)
  • Hyperbolic Residual Quantization (no independent evidence)
    purpose: To leverage exponential volume growth in the Poincaré ball for supporting diverse code lengths without geometric distortion
    New quantization method introduced to overcome limitations of Euclidean residual quantization.
  • Soft Length Controller (no independent evidence)
    purpose: To enable end-to-end differentiable training of discrete length decisions via continuous retention probabilities
    New optimization component introduced to handle non-differentiability of length selection.
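Whether the scaling exponent is derived or fitted can be probed empirically: fit $\log L = \log C + b\log p$ to (popularity, learned length) pairs and compare $b$ with $-\alpha/\gamma$ as set before training. The helper `fit_power_law` is hypothetical and the data below are synthetic; with real, integer-rounded lengths the fit will be coarse, and if $\alpha$ and $\gamma$ are themselves tuned on validation data, the exponent is effectively a fitted parameter.

```python
import numpy as np

def fit_power_law(popularity, lengths):
    """Least-squares fit of log L = log C + b * log p. Returns (C, b)."""
    log_p = np.log(np.asarray(popularity, dtype=float))
    log_L = np.log(np.asarray(lengths, dtype=float))
    b, log_C = np.polyfit(log_p, log_L, deg=1)   # slope first, then intercept
    return float(np.exp(log_C)), float(b)

p = np.array([0.3, 0.1, 0.03, 0.01, 0.003])
C_hat, b_hat = fit_power_law(p, 1.5 * p ** -0.4)   # synthetic, noise-free lengths
print(C_hat, b_hat)
```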

pith-pipeline@v0.9.0 · 5789 in / 1597 out tokens · 59109 ms · 2026-05-20T11:57:02.457823+00:00 · methodology


A Proof of Theorem 1

Proof. Under the Information Budget framework, the semantic ID must fill the information gap $G_i$, requiring: $I_{\text{sem}}$...