arxiv: 2605.14853 · v2 · pith:446ZDU5Fnew · submitted 2026-05-14 · 💻 cs.IR

Discrimination Is Generation: Unifying Ranking and Retrieval from a Tokenizer Perspective

Pith reviewed 2026-05-25 05:45 UTC · model grok-4.3

classification 💻 cs.IR
keywords: semantic IDs · generative recommendation · ranking · retrieval · tokenizer · end-to-end training

The pith

Embedding the tokenizer inside a ranking model unifies ranking and retrieval because they solve the same argmax problem at item and token scales.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that existing semantic ID tokenizers lag because they are trained separately from ranking. It claims that ranking and retrieval are the same task at different granularities, so training the tokenizer jointly inside the ranker brings personalization into the generation space. The result is a model that serves as both ranker and retriever from one training run. The paper reports gains in ranking, retrieval, and their unified setting on three public benchmarks and two industrial datasets.

Core claim

Ranking seeks argmax in item space while retrieval seeks argmax in token space; both are the same problem solved at different granularities. Based on this insight, DIG embeds the tokenizer inside a discriminative ranking model for end-to-end training -- the ranker naturally becomes a retrieval model, yielding two models from a single training run.
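The "same argmax at two granularities" reading can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the six-item catalogue, the two-token SIDs, the random scores, and the choice to score a token prefix by the best item that shares it. It is not the paper's model, only a small demonstration that token-by-token selection and item-level selection can agree when both read from one scorer.

```python
import random

# Hypothetical toy catalogue: six items, each with a two-token semantic ID.
ITEM_TO_SID = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1), 4: (2, 0), 5: (2, 1)}


def rank_argmax(scores):
    """Ranking view: argmax taken directly over items."""
    return max(range(len(scores)), key=lambda item: scores[item])


def retrieve_greedy(scores):
    """Retrieval view: choose one token at a time.

    Assumption for this sketch: a prefix is scored by the best item that
    shares it, so greedy decoding can never discard the top item.
    """
    prefix = ()
    sid_length = len(next(iter(ITEM_TO_SID.values())))
    for position in range(sid_length):
        best_token, best_score = None, float("-inf")
        for item, sid in ITEM_TO_SID.items():
            if sid[:position] == prefix and scores[item] > best_score:
                best_token, best_score = sid[position], scores[item]
        prefix = prefix + (best_token,)
    return prefix


rng = random.Random(0)
# Stand-in for one user's ranker scores over the catalogue.
scores = [rng.gauss(0.0, 1.0) for _ in ITEM_TO_SID]

best_item = rank_argmax(scores)
best_sid = retrieve_greedy(scores)
print(best_item, best_sid, ITEM_TO_SID[best_item] == best_sid)
```

With max-aggregated prefix scores the two views return the same item by construction. A learned autoregressive decoder gives no such guarantee, which is presumably where joint training is meant to help.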

What carries the argument

A feature assignment taxonomy: item-intrinsic static features are encoded into SIDs, user-item cross (u2i) features drive codebook boundaries toward recommendation decision boundaries during training, and an MLP_u2t distillation module approximates u2i at the token level for inference.
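The routing implied by the taxonomy can be sketched as a small lookup. The Type-I/II/III labels follow the Figure 2 caption; the feature names, the `route` helper, and the output bucket names are hypothetical and serve only to show which stream each kind of feature would feed.

```python
from enum import Enum


class FeatureType(Enum):
    STATIC = "Type-I"       # item-intrinsic, encoded into SIDs
    REQUEST = "Type-II"     # request-level, used as ranking/decoder conditions
    U2I_CROSS = "Type-III"  # user-item cross, seen in training, approximated at inference


# Hypothetical feature names, chosen only for illustration.
FEATURE_TAXONOMY = {
    "item_category": FeatureType.STATIC,
    "item_price_bucket": FeatureType.STATIC,
    "request_hour": FeatureType.REQUEST,
    "user_clicks_on_item_category": FeatureType.U2I_CROSS,
}


def route(features, training):
    """Split a feature dict into the streams named by the taxonomy."""
    routed = {"sid_inputs": {}, "conditions": {}, "u2i": {}, "needs_u2t_approx": []}
    for name, value in features.items():
        kind = FEATURE_TAXONOMY[name]
        if kind is FeatureType.STATIC:
            routed["sid_inputs"][name] = value
        elif kind is FeatureType.REQUEST:
            routed["conditions"][name] = value
        elif training:
            routed["u2i"][name] = value
        else:
            # At inference the cross feature is not read directly; a module
            # such as MLP_u2t would supply a token-level approximation.
            routed["needs_u2t_approx"].append(name)
    return routed


example = {
    "item_category": "books",
    "request_hour": 21,
    "user_clicks_on_item_category": 7,
}
print(route(example, training=True))
print(route(example, training=False))
```

The inference branch only records which features need an approximation. How MLP_u2t produces it is not specified in the text reviewed here.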

If this is right

  • Simultaneous improvements to ranking, retrieval, and unified retrieval-ranking quality
  • Two models obtained from a single training run
  • Effective across three public benchmarks and two industrial datasets

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The unification suggests recommendation pipelines could use a single model for both retrieval and ranking stages.
  • This training strategy might allow better alignment of generated semantic IDs with user preferences without separate optimization.
  • Extensions could apply the same principle to other generative tasks that currently separate discrimination and generation objectives.

Load-bearing premise

The premise that ranking and retrieval are fundamentally the same problem solved at different granularities, so embedding the tokenizer inside the ranker transfers personalization signals to the semantic IDs.
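One way the premise could be made precise is sketched below. This is an assumption offered for orientation, not the paper's derivation: $f$ is an item-level scorer, $\tau$ an injective tokenizer mapping items to SIDs, $g$ a token-level scorer, and $\phi$ any strictly increasing function. All five symbols are introduced here.

```latex
\begin{align*}
i^\star(u) &= \arg\max_{i \in \mathcal{I}} f(u, i)
  && \text{(ranking, item space)} \\
c^\star(u) &= \arg\max_{c \in \tau(\mathcal{I})} g(u, c)
  && \text{(retrieval, token space)} \\
g\bigl(u, \tau(i)\bigr) &= \phi\bigl(f(u, i)\bigr) \quad \forall i \in \mathcal{I}
  && \Longrightarrow \quad c^\star(u) = \tau\bigl(i^\star(u)\bigr)
\end{align*}
```

Under this reading the two argmax problems coincide only when the token-level score is a monotone function of the item-level score on valid SIDs and the maximizer is unique. Whether end-to-end training achieves that alignment is the empirical question.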

What would settle it

A demonstration that joint training fails to improve, or even worsens, retrieval performance relative to a tokenizer trained with a separate retrieval objective.
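Such a comparison would come down to scoring two retrieval runs against the same ground truth. The sketch below uses Recall@k as one plausible metric. The run names and the tiny item lists are made up solely to exercise the function and say nothing about either method's actual performance.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def mean_recall_at_k(run, ground_truth, k):
    """Average Recall@k over all users present in the ground truth."""
    users = list(ground_truth)
    return sum(recall_at_k(run[u], ground_truth[u], k) for u in users) / len(users)


# Made-up toy data: two users, two hypothetical retrieval runs.
ground_truth = {"u1": {"a", "b"}, "u2": {"c"}}
run_joint_tokenizer = {"u1": ["a", "x", "b"], "u2": ["c", "y", "z"]}
run_separate_tokenizer = {"u1": ["x", "a", "y"], "u2": ["y", "z", "c"]}

joint = mean_recall_at_k(run_joint_tokenizer, ground_truth, k=2)
separate = mean_recall_at_k(run_separate_tokenizer, ground_truth, k=2)
print(joint, separate)
```

A real test would hold the ranker, data splits, and decoding budget fixed, so that the tokenizer training regime is the only variable.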

Figures

Figures reproduced from arXiv: 2605.14853 by Changhao Li, Chi Wang, Haitao Wang, Junwei Yin, Senjie Kou, Shuli Wang, Xingxing Wang, Yinhua Zhu, Yinqiu Huang.

Figure 1: Comparison of existing generative retrieval
Figure 2: Overall architecture of DIG. The tokenizer is embedded inside the DIN+DCNv2+MoE ranker. Three feature streams are handled by the feature assignment taxonomy: Type-I static features shape SIDs offline; Type-II request-level features serve as ranking/decoder conditions; Type-III u2i cross features implicitly shape codebook boundaries during training and are approximated by MLP_u2t at inference. SID–embedding …
Original abstract

Semantic IDs (SIDs) define the generation space of generative recommendation and directly determine its personalization ceiling. However, existing tokenizers are trained independently with retrieval objectives, leaving personalization signals fully decoupled from the SID construction process -- a fundamental gap that causes generative retrieval to persistently lag behind discriminative ranking. In this paper, we rethink the essence of SIDs: ranking seeks argmax in item space while retrieval seeks argmax in token space; both are the same problem solved at different granularities. Based on this insight, we propose DIG (Discrimination Is Generation), which embeds the tokenizer inside a discriminative ranking model for end-to-end training -- the ranker naturally becomes a retrieval model, yielding two models from a single training run. DIG is organized around a feature assignment taxonomy: item-intrinsic static features are encoded into SIDs, user-item cross features (u2i) implicitly drive codebook boundaries toward recommendation decision boundaries during training, and an MLP_u2t distillation module approximates u2i at the token level for inference. Experiments on three public benchmarks and two industrial datasets demonstrate that DIG simultaneously improves ranking, retrieval, and unified retrieval-ranking quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that existing semantic ID (SID) tokenizers for generative recommendation are trained independently of personalization signals, creating a gap with discriminative ranking. It reframes the problem by asserting that ranking seeks argmax in item space while retrieval seeks argmax in token space, so both are the same task at different granularities. DIG embeds the tokenizer inside a discriminative ranker for end-to-end training, using a feature assignment taxonomy (item-intrinsic features to SIDs, user-item cross features to drive codebook boundaries, and an MLP_u2t distillation module to approximate cross features at inference). This is claimed to produce both a ranker and a retrieval model from one training run, with simultaneous gains in ranking, retrieval, and unified metrics on three public benchmarks and two industrial datasets.

Significance. If the unification holds and the reported gains are robust, the work could meaningfully narrow the performance gap between generative retrieval and discriminative ranking in recommendation systems by integrating personalization directly into SID construction, potentially enabling more efficient dual-purpose models.

major comments (2)
  1. [Method (feature assignment taxonomy and DIG architecture)] The core unification premise (ranking as argmax in item space equals retrieval as argmax in token space) is presented as an insight but lacks a formal derivation, loss-function definition, or proof of equivalence; without this, it is unclear whether the end-to-end training actually transfers the argmax property or merely trains a joint model. This is load-bearing for the central claim.
  2. [Experiments section] The abstract and experimental claims report simultaneous improvements across five datasets, yet no equations for the training objective, data splits, statistical tests, or ablation isolating the tokenizer-embedding effect are visible; this prevents verification that gains stem from the proposed unification rather than implementation details.
minor comments (1)
  1. [Method] Notation for MLP_u2t and the three-way feature taxonomy would benefit from an explicit diagram or table defining which features map to which component.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the unification premise and experimental presentation. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

  1. Referee: [Method (feature assignment taxonomy and DIG architecture)] The core unification premise (ranking as argmax in item space equals retrieval as argmax in token space) is presented as an insight but lacks a formal derivation, loss-function definition, or proof of equivalence; without this, it is unclear whether the end-to-end training actually transfers the argmax property or merely trains a joint model. This is load-bearing for the central claim.

    Authors: We agree that the unification is introduced as a conceptual insight rather than a formal theorem, and that additional formalization would clarify how the end-to-end training transfers the argmax property. The premise follows from both tasks optimizing the same underlying scoring function (the ranker), with retrieval simply operating over the discrete token space induced by the tokenizer. The feature assignment taxonomy and joint optimization ensure that codebook boundaries align with ranking decision boundaries. We will add a dedicated subsection in the revised manuscript that (i) defines the joint training objective with explicit loss equations, (ii) shows the argmax equivalence at the level of the optimization problem, and (iii) explains why the tokenizer embedding step preserves the property rather than merely producing a joint model. revision: yes

  2. Referee: [Experiments section] The abstract and experimental claims report simultaneous improvements across five datasets, yet no equations for the training objective, data splits, statistical tests, or ablation isolating the tokenizer-embedding effect are visible; this prevents verification that gains stem from the proposed unification rather than implementation details.

    Authors: We acknowledge that the current presentation does not make these elements sufficiently prominent for independent verification. The full manuscript contains the training objective (Section 3.2), data splits (Section 4.1), and ablations (Section 4.3), but we will revise the experiments section to (i) surface the key equations in the main text, (ii) report statistical significance tests (paired t-tests with p-values) for all claimed improvements, and (iii) add an explicit ablation that isolates the contribution of embedding the tokenizer inside the ranker versus training it separately. These additions will allow readers to confirm that the gains arise from the proposed unification. revision: yes
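The paired t-test mentioned in the second response can be sketched with the standard library alone. The function name and the idea of pairing per-seed or per-fold metric values are assumptions. The sketch returns only the t statistic and degrees of freedom, because a p-value also needs the Student's t distribution.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(treatment, baseline):
    """Return (t, degrees_of_freedom) for paired samples.

    Each position pairs two measurements taken under the same condition,
    e.g. the same random seed or data fold (an assumption of this sketch).
    """
    if len(treatment) != len(baseline):
        raise ValueError("paired samples must have equal length")
    if len(treatment) < 2:
        raise ValueError("need at least two pairs")
    diffs = [t - b for t, b in zip(treatment, baseline)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    t_stat = mean(diffs) / (spread / math.sqrt(n))
    return t_stat, n - 1


# Toy integers chosen only to exercise the arithmetic, not real metrics.
t_stat, dof = paired_t_statistic([3, 5, 4, 6], [1, 2, 3, 2])
print(t_stat, dof)
```

For a p-value, a library routine such as `scipy.stats.ttest_rel` could replace the hand-rolled statistic. With several datasets and metrics, a multiple-comparison correction would also be worth reporting.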

Circularity Check

0 steps flagged

No significant circularity: conceptual reframing with independent empirical support


The paper presents its core unification as a conceptual insight ('ranking seeks argmax in item space while retrieval seeks argmax in token space; both are the same problem solved at different granularities') that motivates an architectural choice (embedding tokenizer inside a discriminative ranker for end-to-end training). No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The claim is not derived from prior fitted results or definitions that presuppose the outcome; instead, it is tested empirically across five datasets. This is a standard non-circular reframing followed by experimental validation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is abstract-only; no concrete free parameters, axioms, or invented entities can be extracted beyond the high-level domain assumption stated in the abstract.

axioms (1)
  • domain assumption: Ranking seeks argmax in item space while retrieval seeks argmax in token space; both are the same problem solved at different granularities.
    This equivalence is presented as the foundational insight enabling the DIG architecture.


