How Reliable Are Semantic-ID Tokenizer Comparisons in Generative Recommendation?

Haibo Zhang; Jeremiah D. Deng; Lech Szymanski; Qian Zhang

arXiv: 2605.25330 · v1 · pith:VWY6WYXO · submitted 2026-05-25 · cs.IR


Pith reviewed 2026-06-29 21:02 UTC · model grok-4.3

classification cs.IR

keywords semantic-id, generative recommendation, tokenizer collisions, evaluation metrics, item-level performance, discrete codes, autoregressive models, code assignment


The pith

Semantic-ID tokenizers often assign identical code sequences to multiple items, so standard hit rates count group matches as successes and inflate Hit@10 by up to 103 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Semantic-ID tokenizers compress item features into discrete code sequences for autoregressive generation, but this compression routinely produces duplicate sequences for semantically similar yet distinct items. Across four datasets and five tokenizers, up to 30.5 percent of items participate in such collisions, meaning a generated SID sequence can match any member of a collision group rather than the intended target item. Consequently, conventional SID-level metrics such as Hit@10 systematically overestimate true item-level accuracy. The authors supply collision-aware item-level metrics computed directly from the generated sequences and a post-tokenizer reassignment step that eliminates collisions at minimum cost while preserving the existing code hierarchy. These results imply that tokenizer comparisons reported in earlier generative recommendation studies require reinterpretation.

Core claim

Because tokenizers compress item features into a code space, semantically similar but collaboratively distinct items are frequently assigned the same SID sequence; across four datasets and five representative tokenizers the fraction of items involved in collisions reaches 30.5 percent, so SID-level matching identifies only a collision group rather than the target item and inflates Hit@10 by up to 103.36 percent.
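The inflation figure is reported as a percentage. Assuming it is the relative gap between the SID-level and the item-level score (an assumption made here for illustration, not a formula quoted from the paper), it can be written as:

```latex
\mathrm{Inflation} = \frac{\mathrm{Hit@10}_{\mathrm{SID}} - \mathrm{Hit@10}_{\mathrm{item}}}{\mathrm{Hit@10}_{\mathrm{item}}} \times 100\%
```

Under that reading, an inflation of 103.36% means the SID-level Hit@10 is about 2.03 times the item-level Hit@10, since 1 + 1.0336 = 2.0336.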

What carries the argument

SID collision groups: sets of distinct items to which the tokenizer assigns an identical code sequence. The paper counts these groups directly and removes them by reassigning last-level codes after tokenization.
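A minimal sketch of the direct count, assuming each item's SID is available as a tuple of integer codes in a dict keyed by item id (a hypothetical input format). The names `collision_groups` and `collision_rate` are illustrative, not the paper's code:

```python
from collections import defaultdict


def collision_groups(sids):
    """Map each shared SID to the items that carry it (groups of size >= 2 only)."""
    groups = defaultdict(list)
    for item, sid in sids.items():
        groups[tuple(sid)].append(item)
    return {sid: items for sid, items in groups.items() if len(items) > 1}


def collision_rate(sids):
    """Fraction of items whose SID is shared with at least one other item."""
    colliding = sum(len(items) for items in collision_groups(sids).values())
    return colliding / len(sids)


# Toy SID assignment (hypothetical): item id -> tuple of codes, one code per level.
sids = {"a": (3, 7, 1), "b": (3, 7, 1), "c": (3, 7, 2), "d": (5, 0, 4)}
print(collision_groups(sids))  # {(3, 7, 1): ['a', 'b']}
print(collision_rate(sids))    # 0.5
```

Only the item-to-SID map is needed; no model outputs enter the count.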

If this is right

SID-level scores reported for tokenizers in prior work are upper bounds on item-level performance, so the rankings built on them should be interpreted with caution.
The degree of metric inflation grows with the measured collision rate.
Any generative recommender using SID generation requires either explicit collision correction or a collision-free code assignment to produce trustworthy item-level scores.
The proposed minimum-cost reassignment produces a collision-free SID space for any existing tokenizer without retraining the tokenizer itself.
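The sketch below illustrates one way to realize a last-level, minimum-cost reassignment. It assumes, hypothetically, that each item comes with a list of costs for every last-level codeword (for example, the squared distance between its last-level residual and that codeword); the paper's actual cost function is not given on this page. Items that share a prefix keep that prefix and are matched to distinct last-level codes with the Hungarian method, written out in plain Python so that no external solver is needed. All function names are illustrative.

```python
from collections import defaultdict


def min_cost_assignment(cost):
    """Hungarian method, O(n^2 m): give each row a distinct column at minimum total cost.

    cost: n x m list of lists with n <= m. Returns col_of_row, a list of length n.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("more items share this prefix than there are last-level codes")
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials for rows and columns
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j (1-based, 0 = free)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:  # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col_of_row = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            col_of_row[p[j] - 1] = j - 1
    return col_of_row


def reassign_last_level(sids, last_cost):
    """Keep each item's SID prefix; give items sharing a prefix distinct last-level codes.

    sids: item -> tuple of codes; last_cost: item -> list of costs, one per
    last-level code (both hypothetical input formats).
    """
    by_prefix = defaultdict(list)
    for item, sid in sids.items():
        by_prefix[tuple(sid[:-1])].append(item)
    new_sids = {}
    for prefix, items in by_prefix.items():
        codes = min_cost_assignment([last_cost[item] for item in items])
        for item, code in zip(items, codes):
            new_sids[item] = prefix + (code,)
    return new_sids


sids = {"a": (0, 1), "b": (0, 1), "c": (0, 2), "d": (1, 0)}  # "a" and "b" collide
last_cost = {"a": [4, 0, 6], "b": [2, 1, 7], "c": [9, 8, 0], "d": [0, 5, 5]}
print(reassign_last_level(sids, last_cost))
# {'a': (0, 1), 'b': (0, 0), 'c': (0, 2), 'd': (1, 0)}
```

Because each prefix is solved as its own assignment problem, the higher-level codes are never touched, and the result is collision-free whenever no prefix holds more items than the last-level codebook has entries; otherwise `min_cost_assignment` raises `ValueError`.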

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same collision phenomenon could appear in any autoregressive model that decodes discrete codes to real-world entities, not only recommendation.
Tokenizer designers may need to optimize jointly for semantic fidelity and uniqueness of the final code sequences.
If the reassignment step alters downstream generation quality, an explicit trade-off study between collision rate and semantic coherence would be needed.

Load-bearing premise

The four chosen datasets and five tokenizers are representative enough that the observed collision rates and metric inflation generalize to other recommendation settings and tokenizers.

What would settle it

Compute both SID-level Hit@10 and item-level Hit@10 (with the collision-aware metric) on the held-out test sets; if the gap between the two is near zero on every dataset, the claimed inflation does not hold.
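The comparison can be sketched as follows. `sid_hit_at_k` is the SID-level protocol described in the abstract. `item_hit_at_k` is one possible collision-aware variant, assumed here rather than taken from the paper: each generated SID is expanded into its collision group, members of a group are treated as equally likely to occupy any of the group's slots, and the function returns the expected item-level hit. Both function names are hypothetical.

```python
from collections import defaultdict


def sid_hit_at_k(generated, target_sid, k):
    """SID-level protocol: a hit if the target's SID is among the top-k generated SIDs."""
    return float(target_sid in generated[:k])


def item_hit_at_k(generated, target_item, sids, k):
    """Expected item-level Hit@k after expanding each generated SID into its collision group.

    Assumption: the SID carries no information for ordering items inside a group,
    so the target is equally likely to occupy any of its group's slots.
    """
    groups = defaultdict(list)
    for item, sid in sids.items():
        groups[sid].append(item)
    filled = 0  # item slots already taken by higher-ranked SIDs
    for sid in generated:
        members = groups.get(sid, [])  # SIDs that map to no item expand to nothing
        if target_item in members:
            slots_in_top_k = max(0, min(len(members), k - filled))
            return slots_in_top_k / len(members)
        filled += len(members)
        if filled >= k:
            break
    return 0.0


sids = {"a": (3, 7, 1), "b": (3, 7, 1), "c": (3, 7, 2), "d": (5, 0, 4)}
generated = [(5, 0, 4), (3, 7, 1), (3, 7, 2)]  # model output, best first
print(sid_hit_at_k(generated, sids["a"], k=2))   # 1.0
print(item_hit_at_k(generated, "a", sids, k=2))  # 0.5
```

In this toy case the target shares its SID with one other item, so the SID-level protocol scores a full hit at k = 2 while the expected item-level hit is 0.5, a relative inflation of 100%.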

Figures

Figures reproduced from arXiv: 2605.25330 by Haibo Zhang, Jeremiah D. Deng, Lech Szymanski, Qian Zhang.

**Figure 2.** Collision-corrected evaluation. The expanded item …

**Figure 3.** Case study of reassignment within one Beauty/RK …

**Figure 4.** Effect of collaborative signal under zero-collision …

**Figure 5.** t-SNE visualization of item embeddings on Cell and Yelp, under Qwen3 textual embeddings and PPMI+SVD collaborative embeddings. Different colors of data points indicate distinct highlighted SID groups, and light grey points belong to other collision groups.

7 Conclusion: We present a faithful evaluation framework that comprises collision-aware metrics (CCE) and a zero-collision reassignment method (ZCR). CCE…

Original abstract

In Semantic-ID (SID) based generative recommendation, each item is represented as a sequence of discrete codes, and an autoregressive model is trained to generate the SID sequence of the next item; top-K performance is then measured by checking whether the SID sequence of the target item appears among the generated sequences. This evaluation protocol equates SID-level matching with item-level recommendation, an equivalence that holds only when every SID sequence maps to a single item. We show this assumption breaks down in practice: because tokenizers compress item features into a code space, semantically similar but collaboratively distinct items are frequently assigned the same SID sequence. Across four datasets and five representative tokenizers, the fraction of items involved in such collisions reaches 30.5%, so matching a shared SID sequence identifies only a collision group rather than the target item. Consequently, SID-level metrics overestimate item-level performance (Hit@10 is inflated by up to 103.36%), and the inflation grows with the collision rate. To support faithful comparison, we develop collision-aware item-level metrics computed directly from generated SID sequences, together with a post-tokenizer procedure that reassigns last-level SIDs at minimum cost to obtain a collision-free assignment for any existing tokenizer. Our results indicate that SID-level rankings in prior work should be interpreted with caution, and that reliable tokenizer evaluation requires either item-level correction or collision-free SID assignments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SID collisions inflate generative rec metrics by up to 103% on the tested sets, and the paper quantifies this while giving practical fixes, though the fixes need checks on semantic preservation.


The main thing to know is that standard SID matching in generative recommendation does not equal item matching once collisions appear, and the paper measures how much this inflates common metrics.

They count collisions directly on held-out data from four datasets and five tokenizers and find that as many as 30.5% of items share an SID with another item. This leads to Hit@10 overestimation of up to 103.36%, with the gap growing as the collision rate rises. The work also supplies collision-aware metrics that stay at the item level and a minimum-cost reassignment step for the last SID level that removes collisions without retraining the tokenizer.

This measurement and the two fixes are the concrete new pieces. Prior papers treated SID sequences as unique item identifiers, so the direct count of the mismatch and the proposed corrections address a gap in how tokenizers get compared.

The counting approach is straightforward and does not rely on fitted parameters. The reassignment idea is pragmatic for existing tokenizers.

The softer parts are representativeness and the missing check on the reassignment. Four datasets and five tokenizers may not cover the range of real recommendation data or tokenizer designs, so the exact percentages could shift. The paper also does not show whether the reassigned SIDs keep the semantic properties that matter for generation quality or downstream performance.

People working on generative recommendation or SID tokenizers will want to read this, because it questions the evaluation numbers they currently use. It deserves a serious referee because the core observation is empirical and points to fixes that can be tested.

Referee Report

2 major / 2 minor

Summary. The paper claims that Semantic-ID (SID) collisions are common in generative recommendation (up to 30.5% of items involved across four datasets and five tokenizers), so that SID-level matching identifies collision groups rather than unique items; this causes SID-level metrics to overestimate item-level performance (Hit@10 inflated by up to 103.36%). It introduces collision-aware item-level metrics computed from generated SID sequences and a post-tokenizer last-level SID reassignment procedure that produces collision-free assignments at minimum cost.

Significance. If the empirical observations hold, the work identifies a previously under-appreciated source of metric inflation that affects the reliability of tokenizer comparisons in generative recommendation. Credit is due for the direct, parameter-free counting of collisions on held-out data and for supplying both diagnostic metrics and a practical correction procedure that can be applied to existing tokenizers.

major comments (2)

§4 (Experiments): the claim that results generalize to 'prior work' and 'other recommendation settings' rests on four datasets and five tokenizers being representative, yet no additional datasets, tokenizer variants, or sensitivity analysis are reported to support this; the observed 30.5% and 103.36% figures are therefore load-bearing for the headline cautionary conclusion.
§3.3 (Post-tokenizer reassignment): the procedure reassigns last-level SIDs to eliminate collisions, but no before/after check (e.g., item embedding cosine similarity, reconstruction quality, or downstream generation metrics) is provided to verify that semantic structure is preserved; this is required to ensure the corrected assignments remain faithful to the tokenizer's original intent.
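A before/after check of the kind the referee asks for could be sketched as below. It assumes, hypothetically, access to each item's last-level residual vector and to the last-level codebook. For the items whose code changed, it reports the mean cosine similarity between the residual and the old codeword versus the new one; a large drop would suggest the reassignment moved items far from the tokenizer's original placement. All names are illustrative.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def reassignment_drift(residuals, codebook, old_codes, new_codes):
    """Compare residual-to-codeword similarity before and after reassignment.

    residuals: item -> last-level residual vector (hypothetical input)
    codebook: list of last-level codeword vectors
    old_codes, new_codes: item -> last-level code index
    Returns (number of moved items, mean old cosine, mean new cosine).
    """
    moved = [i for i in old_codes if old_codes[i] != new_codes[i]]
    if not moved:
        return 0, None, None
    old = sum(cosine(residuals[i], codebook[old_codes[i]]) for i in moved) / len(moved)
    new = sum(cosine(residuals[i], codebook[new_codes[i]]) for i in moved) / len(moved)
    return len(moved), old, new


codebook = [[1.0, 0.0], [0.0, 1.0]]
residuals = {"a": [1.0, 0.0], "b": [0.8, 0.6]}
old_codes = {"a": 0, "b": 0}  # "a" and "b" collided on code 0
new_codes = {"a": 0, "b": 1}  # "b" was moved to code 1
moved, old, new = reassignment_drift(residuals, codebook, old_codes, new_codes)
print(moved, round(old, 3), round(new, 3))  # 1 0.8 0.6
```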

minor comments (2)

Table 1: the tokenizer names and their hyperparameter settings should be listed explicitly rather than referenced only by citation, to allow replication of the collision counts.
§2: the notation for 'collision group' versus 'unique SID' is introduced informally; a short formal definition or diagram would improve clarity when the collision-aware metrics are later defined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the significance of our findings on SID collisions. Below we respond point-by-point to the major comments and indicate planned revisions.


Referee: §4 (Experiments): the claim that results generalize to 'prior work' and 'other recommendation settings' rests on four datasets and five tokenizers being representative, yet no additional datasets, tokenizer variants, or sensitivity analysis are reported to support this; the observed 30.5% and 103.36% figures are therefore load-bearing for the headline cautionary conclusion.

Authors: The four datasets and five tokenizers were chosen because they match those used in prior generative recommendation studies; the collision rates and Hit@10 inflation are consistent in direction and scale across every dataset–tokenizer pair. We therefore view the reported maxima as illustrative of the problem’s potential severity rather than as universal constants. We agree that explicit sensitivity checks would strengthen the generalization statement. In revision we will expand the discussion of dataset and tokenizer representativeness and include any additional internal sensitivity results that can be computed from the existing experimental logs without new runs. revision: partial
Referee: §3.3 (Post-tokenizer reassignment): the procedure reassigns last-level SIDs to eliminate collisions, but no before/after check (e.g., item embedding cosine similarity, reconstruction quality, or downstream generation metrics) is provided to verify that semantic structure is preserved; this is required to ensure the corrected assignments remain faithful to the tokenizer's original intent.

Authors: The reassignment is deliberately restricted to the final SID level and is performed under a minimum-cost objective, which by construction leaves every SID prefix unchanged and minimizes the total cost of the new last-level assignments. We nevertheless accept that explicit verification is desirable. In the revised manuscript we will add before-and-after comparisons of item embedding cosine similarity for the reassigned items together with any available reconstruction-quality statistics to quantify how much semantic structure is retained. revision: yes

Circularity Check

0 steps flagged

No circularity; central results are direct empirical counts on held-out data.

full rationale

The paper reports collision fractions (up to 30.5%) and metric inflation (Hit@10 up to 103.36%) via explicit counting of shared SID sequences across four datasets and five tokenizers. These quantities are computed directly from the data and tokenizer outputs rather than fitted parameters, self-referential equations, or load-bearing self-citations. The collision-aware metrics and minimum-cost reassignment procedure are introduced as practical corrections without any derivation that reduces to the inputs by construction. The analysis chain is therefore self-contained empirical measurement.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper rests on the empirical observation that tokenizers produce collisions; it introduces no new free parameters, axioms, or invented entities beyond standard tokenizer behavior.

pith-pipeline@v0.9.1-grok · 5778 in / 1179 out tokens · 24473 ms · 2026-06-29T21:02:42.446763+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SIDInspector: A Mapping-First Diagnostic Resource for Semantic-ID Tokenizers
cs.IR 2026-06 accept novelty 6.0

SIDInspector provides a standardized adapter contract and mapping-level probes for Semantic-ID tokenizers, with empirical contrasts showing high aliasing in GRID-style exports and superior prefix alignment from determ...

Reference graph

Works this paper leans on

53 extracted references · cited by 1 Pith paper

[1]

Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation. InProceedings of the 17th ACM Conference on Recommender Systems (RecSys). 1007–1014

2023
[2]

Maurizio Ferrari Dacrema, Paolo Cremonesi, and Dietmar Jannach. 2019. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Rec- ommendation Approaches. InProceedings of the 13th ACM Conference on Recom- mender Systems (RecSys). 101–109

2019
[3]

Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment.arXiv preprint arXiv:2502.18965(2025), 1–10

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

Dengzhao Fang, Jingtong Gao, Chengcheng Zhu, Yu Li, Xiangyu Zhao, and Yi Chang. 2025. HiD-VAE: Interpretable Generative Recommendation via Hierar- chical and Disentangled Semantic IDs.arXiv preprint arXiv:2508.04618(2025), 1–13. How Reliable Are Semantic-ID Tokenizer Comparisons in Generative Recommendation?

work page arXiv 2025
[5]

Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. 2022. Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). InProceedings of the 16th ACM Conference on Recommender Systems. 299–315

2022
[6]

Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Net- work for Recommendation. InProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 639–648

2020
[7]

Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk
[8]

In Proceedings of the International Conference on Learning Representations

Session-based Recommendations with Recurrent Neural Networks. In Proceedings of the International Conference on Learning Representations. 1–10
[9]

Yupeng Hou, Zhankui He, Julian McAuley, and Wayne Xin Zhao. 2023. Learn- ing Vector-Quantized Item Representation for Transferable Sequential Recom- menders. InProceedings of the ACM Web Conference (WWW). 1162–1171

2023
[10]

Yupeng Hou, Jiacheng Li, Ashley Shin, Jinsung Jeon, Abhishek Santhanam, Wei Shao, Kaveh Hassani, Ning Yao, and Julian McAuley. 2025. Generating Long Semantic IDs in Parallel for Recommendation. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1–12

2025
[11]

Peiyu Hu, Wayne Lu, and Jia Wang. 2026. From IDs to Semantics: A Genera- tive Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization. InProceedings of the AAAI Conference on Artificial Intelligence

2026
[12]

Zheng Hu, Yuxin Chen, Yongsen Pan, Xu Yuan, Yuting Yin, Daoyuan Wang, Boyang Xia, Zefei Luo, Hongyang Wang, Songhao Ni, Dongxu Liang, Jun Wang, Shimin Cai, Tao Zhou, Fuji Ren, and Wenwu Ou. 2026. Stop Treating Collisions Equally: Qualification-Aware Semantic ID Learning for Recommendation at Industrial Scale.arXiv:2603.00632(2026), 1–10

work page arXiv 2026
[13]

Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence33, 1 (2011), 117–128

2011
[14]

Bowen Jin, Hansi Zeng, Guoyin Wang, Xiusi Chen, Tianxin Wei, Ruirui Li, Zhengyang Wang, Zheng Li, Yang Li, Hanqing Lu, Suhang Wang, Jiawei Han, and Xianfeng Tang. 2024. Language Models as Semantic Indexers. InProceedings of the 41st International Conference on Machine Learning (ICML). 22244–22259

2024
[15]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-Scale Similarity Search with GPUs.IEEE Transactions on Big Data7, 3 (2021), 535–547

2021
[16]

Clark Mingxuan Ju, Liam Collins, Leonardo Neves, Bhuvesh Kumar, Louis Yufeng Wang, Tong Zhao, and Neil Shah. 2025. Generative Recommendation with Seman- tic IDs: A Practitioner’s Handbook. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. 6420–6425

2025
[17]

Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recom- mendation. InProceedings of the IEEE International Conference on Data Mining. 197–206

2018
[18]

Walid Krichene and Steffen Rendle. 2020. On Sampled Metrics for Item Recom- mendation. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 1748–1757

2020
[19]

Harold W. Kuhn. 1955. The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly2, 1–2 (1955), 83–97

1955
[20]

Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, and Wook-Shin Han. 2022. Autoregressive Image Generation Using Residual Quantization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11513–11522

2022
[21]

Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. InAdvances in Neural Information Processing Systems (NeurIPS), Vol. 27

2014
[22]

Yu Liang, Zhongjin Zhang, Yuxuan Zhu, Kerui Zhang, Zhiluohan Guo, Wenhang Zhou, Zonqi Yang, Kangle Wu, Yabo Ni, Anxiang Zeng, Cong Fu, Jianxin Wang, and Jiazhi Xia. 2026. Rethinking Generative Recommender Tokenizer: Recsys- Native Encoding and Semantic Quantization Beyond LLMs.arXiv preprint arXiv:2602.02338(2026), 1–22

work page arXiv 2026
[23]

Fake Lin, Binbin Hu, Zhi Zheng, Xi Zhu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, and Tong Xu. 2026. Token-level Collaborative Alignment for LLM-based Generative Recommendation. InProceedings of the ACM Web Conference (WWW)

2026
[24]

Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Hao Zhang, Yong Liu, Chuhan Wu, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, and Weinan Zhang. 2025. How Can Recommender Systems Benefit from Large Language Models: A Survey.ACM Transactions on Information Systems43, 2 (2025), 1–47

2025
[25]

Enze Liu, Bowen Zheng, Cheng Ling, Lantao Hu, Han Li, and Wayne Xin Zhao
[26]

InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

Generative Recommender with End-to-End Learnable Item Tokenization. InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 729–739
[27]

Jingzhe Liu, Liam Collins, Jiliang Tang, Tong Zhao, Neil Shah, and Clark Mingx- uan Ju. 2025. Understanding Generative Recommendation with Semantic IDs from a Model-Scaling View.arXiv preprint arXiv:2509.25522(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

Xinchen Luo, Jiangxia Cao, Tianyu Sun, Jinkai Yu, Rui Huang, Wei Yuan, Hezheng Lin, Yichen Zheng, Shiyao Wang, Qigen Hu, et al. 2025. QARM: Quantitative alignment multi-modal recommendation at Kuaishou. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. 5915– 5922

2025
[29]

Jianmo Ni, Jiacheng Li, and Julian McAuley. 2019. Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects. InProceedings of the Conference on Empirical Methods in Natural Language Processing. 188–197

2019
[30]

Haohao Qu, Wenqi Fan, Zihuai Zhao, and Qing Li. 2025. TokenRec: Learning to Tokenize ID for LLM-Based Generative Recommendations.IEEE Transactions on Knowledge and Data Engineering37, 10 (2025), 6216–6231

2025
[31]

Haohao Qu, Shanru Lin, Yujuan Ding, Yiqi Wang, and Wenqi Fan. 2026. Diffusion Generative Recommendation with Continuous Tokens. InProceedings of the ACM Web Conference (WWW)

2026
[32]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.Journal of Machine Learning Research21, 140 (2020), 1–67

2020
[33]

Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al
[34]

InAdvances in Neural Information Processing Systems (NeurIPS)

Recommender Systems with Generative Retrieval. InAdvances in Neural Information Processing Systems (NeurIPS). 10299–10315
[35]

Steffen Rendle, Li Zhang, and Yehuda Koren. 2019. On the Difficulty of Evaluating Baselines: A Study on Recommender Systems.arXiv preprint arXiv:1905.01395 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019
[36]

Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang
[37]

InProceedings of the 28th ACM International Conference on Information and Knowledge Management

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Rep- resentations from Transformer. InProceedings of the 28th ACM International Conference on Information and Knowledge Management. 1441–1450
[38]

Aäron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. 2017. Neural Discrete Representation Learning. InAdvances in Neural Information Processing Systems (NeurIPS). 6306–6315

2017
[39]

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE.Journal of Machine Learning Research9, 86 (2008), 2579–2605

2008
[40]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. InAdvances in Neural Information Processing Systems (NeurIPS). 5998– 6008

2017
[41]

Chao Wang, Yixin Song, Jinhui Ye, Chuan Qin, Dazhong Shen, Lingfeng Liu, Xi- ang Wang, and Yanyong Zhang. 2025. FACE: A General Framework for Mapping Collaborative Filtering Embeddings into LLM Tokens. InAdvances in Neural Information Processing Systems (NeurIPS)

2025
[42]

Wenjie Wang, Honghui Bao, Xinyu Lin, Jizhi Zhang, Yongqi Li, Fuli Feng, See- Kiong Ng, and Tat-Seng Chua. 2024. Learnable item tokenization for generative recommendation. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management. 2400–2409

2024
[43]

Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, and Zhenhua Dong. 2024. EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3245–3254

2024
[44]

Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Jiandong Zhang, Bolin Ding, and Bin Cui. 2022. Contrastive Learning for Sequential Recommendation. InProceedings of the 38th IEEE International Conference on Data Engineering (ICDE). 1259–1273

2022
[45]

Yuhao Yang, Zhi Ji, Zhaopeng Li, Yi Li, Zhonglin Mo, Yue Ding, Kai Chen, Zijian Zhang, Jie Li, Shuanglong Li, and Lin Liu. 2025. Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations. In Advances in Neural Information Processing Systems (NeurIPS)

2025
[46]

Jianyang Zhai, Zi-Feng Mai, Chang-Dong Wang, Feidiao Yang, Xiawu Zheng, Hui Li, and Yonghong Tian. 2025. Multimodal Quantitative Language for Generative Recommendation. InThe 13th International Conference on Learning Representa- tions. 1–23

2025
[47]

Ruohan Zhang, Jiacheng Li, Julian McAuley, and Yupeng Hou. 2025. Purely Semantic Indexing for LLM-based Generative Recommendation and Retrieval. arXiv preprint arXiv:2509.16446(2025), 1–9

work page arXiv 2025
[48]

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou
[49]

Qwen3 Embedding: Advancing Text Embedding and Reranking through Foundation Models.arXiv preprint arXiv:2506.05176(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[50]

Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. 2024. Adapting large language models by integrating collaborative semantics for recommendation. In2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 1435–1448

2024
[51]

Qiyong Zhong, Jiajie Su, Yunshan Ma, Julian McAuley, and Yupeng Hou. 2025. Pctx: Tokenizing Personalized Context for Generative Recommendation.arXiv preprint arXiv:2510.21276(2025)

work page arXiv 2025
[52]

Guorui Zhou, Honghui Bao, Jiaming Huang, Jiaxin Deng, Jinghao Zhang, Junda She, Kuo Cai, Lejian Ren, Lu Ren, Qiang Luo, et al. 2025. OpenOneRec Technical Report.arXiv preprint arXiv:2512.24762(2025), 1–36. Qian et al

work page arXiv 2025
[53]

Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020. S 3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization. InPro- ceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM). 1893–1902

2020

[1] [1]

Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation. InProceedings of the 17th ACM Conference on Recommender Systems (RecSys). 1007–1014

2023

[2] [2]

Maurizio Ferrari Dacrema, Paolo Cremonesi, and Dietmar Jannach. 2019. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Rec- ommendation Approaches. InProceedings of the 13th ACM Conference on Recom- mender Systems (RecSys). 101–109

2019

[3] [3]

Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment.arXiv preprint arXiv:2502.18965(2025), 1–10

work page internal anchor Pith review Pith/arXiv arXiv 2025

[4] [4]

Dengzhao Fang, Jingtong Gao, Chengcheng Zhu, Yu Li, Xiangyu Zhao, and Yi Chang. 2025. HiD-VAE: Interpretable Generative Recommendation via Hierar- chical and Disentangled Semantic IDs.arXiv preprint arXiv:2508.04618(2025), 1–13. How Reliable Are Semantic-ID Tokenizer Comparisons in Generative Recommendation?

work page arXiv 2025

[5] [5]

Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. 2022. Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). InProceedings of the 16th ACM Conference on Recommender Systems. 299–315

2022

[6] [6]

Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Net- work for Recommendation. InProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 639–648

2020

[7] [7]

Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk

[8] [8]

In Proceedings of the International Conference on Learning Representations

Session-based Recommendations with Recurrent Neural Networks. In Proceedings of the International Conference on Learning Representations. 1–10

[9] [9]

Yupeng Hou, Zhankui He, Julian McAuley, and Wayne Xin Zhao. 2023. Learn- ing Vector-Quantized Item Representation for Transferable Sequential Recom- menders. InProceedings of the ACM Web Conference (WWW). 1162–1171

2023

[10] [10]

Yupeng Hou, Jiacheng Li, Ashley Shin, Jinsung Jeon, Abhishek Santhanam, Wei Shao, Kaveh Hassani, Ning Yao, and Julian McAuley. 2025. Generating Long Semantic IDs in Parallel for Recommendation. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1–12

2025

[11] [11]

Peiyu Hu, Wayne Lu, and Jia Wang. 2026. From IDs to Semantics: A Genera- tive Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization. InProceedings of the AAAI Conference on Artificial Intelligence

2026

[12] [12]

Zheng Hu, Yuxin Chen, Yongsen Pan, Xu Yuan, Yuting Yin, Daoyuan Wang, Boyang Xia, Zefei Luo, Hongyang Wang, Songhao Ni, Dongxu Liang, Jun Wang, Shimin Cai, Tao Zhou, Fuji Ren, and Wenwu Ou. 2026. Stop Treating Collisions Equally: Qualification-Aware Semantic ID Learning for Recommendation at Industrial Scale.arXiv:2603.00632(2026), 1–10

work page arXiv 2026

[13] [13]

Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence33, 1 (2011), 117–128

2011

[14] [14]

Bowen Jin, Hansi Zeng, Guoyin Wang, Xiusi Chen, Tianxin Wei, Ruirui Li, Zhengyang Wang, Zheng Li, Yang Li, Hanqing Lu, Suhang Wang, Jiawei Han, and Xianfeng Tang. 2024. Language Models as Semantic Indexers. InProceedings of the 41st International Conference on Machine Learning (ICML). 22244–22259

2024

[15] [15]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-Scale Similarity Search with GPUs.IEEE Transactions on Big Data7, 3 (2021), 535–547

2021

[16] Clark Mingxuan Ju, Liam Collins, Leonardo Neves, Bhuvesh Kumar, Louis Yufeng Wang, Tong Zhao, and Neil Shah. 2025. Generative Recommendation with Semantic IDs: A Practitioner's Handbook. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6420–6425.

[17] Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recommendation. In Proceedings of the IEEE International Conference on Data Mining. 197–206.

[18] Walid Krichene and Steffen Rendle. 2020. On Sampled Metrics for Item Recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 1748–1757.

[19] Harold W. Kuhn. 1955. The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly 2, 1–2 (1955), 83–97.

[20] Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, and Wook-Shin Han. 2022. Autoregressive Image Generation Using Residual Quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11513–11522.

[21] Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 27.

[22] Yu Liang, Zhongjin Zhang, Yuxuan Zhu, Kerui Zhang, Zhiluohan Guo, Wenhang Zhou, Zonqi Yang, Kangle Wu, Yabo Ni, Anxiang Zeng, Cong Fu, Jianxin Wang, and Jiazhi Xia. 2026. Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs. arXiv preprint arXiv:2602.02338 (2026), 1–22.

[23] Fake Lin, Binbin Hu, Zhi Zheng, Xi Zhu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, and Tong Xu. 2026. Token-level Collaborative Alignment for LLM-based Generative Recommendation. In Proceedings of the ACM Web Conference (WWW).

[24] Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Hao Zhang, Yong Liu, Chuhan Wu, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, and Weinan Zhang. 2025. How Can Recommender Systems Benefit from Large Language Models: A Survey. ACM Transactions on Information Systems 43, 2 (2025), 1–47.

[25] Enze Liu, Bowen Zheng, Cheng Ling, Lantao Hu, Han Li, and Wayne Xin Zhao. 2025. Generative Recommender with End-to-End Learnable Item Tokenization. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 729–739.

[26] Jingzhe Liu, Liam Collins, Jiliang Tang, Tong Zhao, Neil Shah, and Clark Mingxuan Ju. 2025. Understanding Generative Recommendation with Semantic IDs from a Model-Scaling View. arXiv preprint arXiv:2509.25522 (2025).

[27] Xinchen Luo, Jiangxia Cao, Tianyu Sun, Jinkai Yu, Rui Huang, Wei Yuan, Hezheng Lin, Yichen Zheng, Shiyao Wang, Qigen Hu, et al. 2025. QARM: Quantitative Alignment Multi-Modal Recommendation at Kuaishou. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 5915–5922.

[28] Jianmo Ni, Jiacheng Li, and Julian McAuley. 2019. Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 188–197.

[29] Haohao Qu, Wenqi Fan, Zihuai Zhao, and Qing Li. 2025. TokenRec: Learning to Tokenize ID for LLM-Based Generative Recommendations. IEEE Transactions on Knowledge and Data Engineering 37, 10 (2025), 6216–6231.

[30] Haohao Qu, Shanru Lin, Yujuan Ding, Yiqi Wang, and Wenqi Fan. 2026. Diffusion Generative Recommendation with Continuous Tokens. In Proceedings of the ACM Web Conference (WWW).

[31] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1–67.

[32] Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. 2023. Recommender Systems with Generative Retrieval. In Advances in Neural Information Processing Systems (NeurIPS). 10299–10315.

[33] Steffen Rendle, Li Zhang, and Yehuda Koren. 2019. On the Difficulty of Evaluating Baselines: A Study on Recommender Systems. arXiv preprint arXiv:1905.01395 (2019).

[34] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1441–1450.

[35] Aäron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. 2017. Neural Discrete Representation Learning. In Advances in Neural Information Processing Systems (NeurIPS). 6306–6315.

[36] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579–2605.

[37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems (NeurIPS). 5998–6008.

[38] Chao Wang, Yixin Song, Jinhui Ye, Chuan Qin, Dazhong Shen, Lingfeng Liu, Xiang Wang, and Yanyong Zhang. 2025. FACE: A General Framework for Mapping Collaborative Filtering Embeddings into LLM Tokens. In Advances in Neural Information Processing Systems (NeurIPS).

[39] Wenjie Wang, Honghui Bao, Xinyu Lin, Jizhi Zhang, Yongqi Li, Fuli Feng, See-Kiong Ng, and Tat-Seng Chua. 2024. Learnable Item Tokenization for Generative Recommendation. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 2400–2409.

[40] Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, and Zhenhua Dong. 2024. EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3245–3254.

[41] Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Jiandong Zhang, Bolin Ding, and Bin Cui. 2022. Contrastive Learning for Sequential Recommendation. In Proceedings of the 38th IEEE International Conference on Data Engineering (ICDE). 1259–1273.

[42] Yuhao Yang, Zhi Ji, Zhaopeng Li, Yi Li, Zhonglin Mo, Yue Ding, Kai Chen, Zijian Zhang, Jie Li, Shuanglong Li, and Lin Liu. 2025. Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations. In Advances in Neural Information Processing Systems (NeurIPS).

[43] Jianyang Zhai, Zi-Feng Mai, Chang-Dong Wang, Feidiao Yang, Xiawu Zheng, Hui Li, and Yonghong Tian. 2025. Multimodal Quantitative Language for Generative Recommendation. In The 13th International Conference on Learning Representations. 1–23.

[44] Ruohan Zhang, Jiacheng Li, Julian McAuley, and Yupeng Hou. 2025. Purely Semantic Indexing for LLM-based Generative Recommendation and Retrieval. arXiv preprint arXiv:2509.16446 (2025), 1–9.

[45] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025. Qwen3 Embedding: Advancing Text Embedding and Reranking through Foundation Models. arXiv preprint arXiv:2506.05176 (2025).

[46] Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. 2024. Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 1435–1448.

[47] Qiyong Zhong, Jiajie Su, Yunshan Ma, Julian McAuley, and Yupeng Hou. 2025. Pctx: Tokenizing Personalized Context for Generative Recommendation. arXiv preprint arXiv:2510.21276 (2025).

[48] Guorui Zhou, Honghui Bao, Jiaming Huang, Jiaxin Deng, Jinghao Zhang, Junda She, Kuo Cai, Lejian Ren, Lu Ren, Qiang Luo, et al. 2025. OpenOneRec Technical Report. arXiv preprint arXiv:2512.24762 (2025), 1–36.

[49] Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020. S³-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM). 1893–1902.