GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation
Pith reviewed 2026-05-10 09:55 UTC · model grok-4.3
The pith
A generative recommendation model with page-wise next-token prediction and hybrid-reward reinforcement learning delivers 9.5% more clicks and 8.7% more transactions in live A/B tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GenRec resolves three scaling challenges in generative retrieval for recommendation through a unified decoder-only model: Page-wise NTP supervises over full pages to provide denser gradients and resolve one-to-many ambiguities; an asymmetric linear Token Merger compresses semantic ID prompts while preserving decoding resolution; and GRPO-SR pairs group relative policy optimization with NLL regularization and hybrid rewards to align outputs with nuanced user satisfaction without reward hacking.
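To make the difference between point-wise and page-wise supervision concrete, the sketch below builds training examples both ways. It is an illustration only: the helper names, the three-token semantic IDs, and the choice to concatenate page items in their given order are assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

SemanticID = Tuple[int, ...]  # one item = a short tuple of codebook tokens (assumed layout)


def flatten(items: Sequence[SemanticID]) -> List[int]:
    """Concatenate the semantic-ID tokens of several items into one sequence."""
    return [tok for item in items for tok in item]


def pointwise_examples(history, page):
    """One (prompt, target) pair per interacted item: same prompt, different targets."""
    prompt = flatten(history)
    return [(prompt, list(item)) for item in page]


def pagewise_example(history, page):
    """A single (prompt, target) pair whose target covers the whole page."""
    return flatten(history), flatten(page)


# Hypothetical toy data: two history items, three items interacted with on one page.
history = [(3, 17, 42), (8, 1, 5)]
page = [(3, 20, 7), (9, 9, 2), (3, 17, 40)]

pw = pointwise_examples(history, page)
prompt, target = pagewise_example(history, page)
print(len(pw), "point-wise examples share one prompt")
print("page-wise target length:", len(target))
```

In the point-wise form the same prompt is paired with three conflicting targets, which is the one-to-many ambiguity the paper describes. The page-wise form gives one prompt a single longer target, so every target token contributes to the loss in the same forward pass.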
What carries the argument
The combination of Page-wise NTP training objective, asymmetric Token Merger, and GRPO-SR reinforcement learning procedure in a single decoder-only architecture.
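The GRPO-SR ingredients can be sketched in a few lines. The group-relative advantage below follows the standard GRPO form (reward standardized within a sampled group). Everything else is an assumption made for illustration: the hard relevance threshold in `hybrid_reward`, the unclipped policy-gradient term, the `nll_weight` value, and all numbers are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev
from typing import List


def hybrid_reward(dense_score: float, relevance: float, gate: float = 0.5) -> float:
    """Pass the dense reward through only if the candidate clears the relevance gate (assumed form)."""
    return dense_score if relevance >= gate else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_sr_loss(logps, advantages, nll_ground_truth, nll_weight=0.1):
    """Simplified objective: policy-gradient term plus an NLL anchor (no clipping, no KL term)."""
    pg = -mean(a * lp for a, lp in zip(advantages, logps))
    return pg + nll_weight * nll_ground_truth


# Hypothetical group of four sampled candidates for one request.
dense = [0.9, 0.8, 0.2, 0.6]       # reward-model scores
relevance = [0.1, 0.9, 0.7, 0.8]   # relevance scores used by the gate
logps = [-2.0, -1.5, -3.0, -2.5]   # sequence log-probabilities under the policy

rewards = [hybrid_reward(d, r) for d, r in zip(dense, relevance)]
advantages = group_relative_advantages(rewards)
loss = grpo_sr_loss(logps, advantages, nll_ground_truth=2.2)
print(rewards)
```

The first candidate has the highest dense score but fails the gate, so it ends up with a negative advantage. That is the intended effect of a gate against reward hacking: a high reward-model score alone cannot push the policy toward irrelevant items.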
If this is right
- Page-wise supervision yields denser gradient signals and avoids point-wise training ambiguities.
- The Token Merger roughly halves prompt length with little loss in accuracy.
- GRPO-SR improves policy alignment with user preferences while maintaining training stability.
- Live deployment produces higher click counts and transaction counts than the prior system.
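The prompt-length reduction attributed to the Token Merger can be pictured as a linear map over groups of adjacent prompt embeddings. The sketch below is an assumption-laden illustration: the grouping of two adjacent tokens, the weight shape, and the function name `linear_merge` are hypothetical, and the paper's actual merger may group tokens differently. It is "asymmetric" only in the sense that it is applied to the prompt, while decoding would still emit one semantic-ID token at a time.

```python
import random
from typing import List

Vector = List[float]


def linear_merge(embeddings: List[Vector], weight: List[List[float]], group: int = 2) -> List[Vector]:
    """Concatenate each run of `group` embeddings and project back to model width."""
    assert len(embeddings) % group == 0, "pad the prompt to a multiple of `group`"
    merged = []
    for i in range(0, len(embeddings), group):
        # Concatenate `group` adjacent vectors into one vector of width group * d.
        concat = [x for vec in embeddings[i:i + group] for x in vec]
        # Each row of `weight` produces one output dimension.
        merged.append([sum(w * x for w, x in zip(row, concat)) for row in weight])
    return merged


random.seed(0)
d, prompt_len = 4, 12  # toy model width and prompt length
prompt = [[random.gauss(0, 1) for _ in range(d)] for _ in range(prompt_len)]
W = [[random.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(d)]  # shape d x 2d

short_prompt = linear_merge(prompt, W)
print(prompt_len, "->", len(short_prompt))
```

With `group=2` the prefill sequence is half as long, which is consistent with the reported ~2X reduction. Because self-attention cost grows faster than linearly in sequence length, the saving in prefill compute could be larger than the length ratio alone suggests.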
Where Pith is reading between the lines
- This generative setup might eventually consolidate retrieval and ranking into one model pass for efficiency.
- Page-wise objectives could extend to other domains with paginated or batched sequential data.
- The hybrid reward design offers a template for preventing hacking in other reinforcement learning applications to recommendation.
- Ablation results would clarify which component contributes most to the gains.
Load-bearing premise
The lifts in click and transaction counts are caused by the new Page-wise NTP, Token Merger, and GRPO-SR components instead of other unstated production changes or chance.
What would settle it
An online experiment that adds or removes the proposed components one at a time, with all other factors held constant, and checks whether the performance improvements appear or disappear accordingly.
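One way to lay out such an experiment is to enumerate the arms explicitly. The sketch below lists a full on/off factorial design and the cheaper leave-one-out design. The component keys are hypothetical labels, and treating the all-off arm as a plain point-wise generative model is an assumption, since the paper's existing pipeline may differ.

```python
from itertools import product

COMPONENTS = ("page_wise_ntp", "token_merger", "grpo_sr")  # hypothetical flag names


def factorial_arms(components=COMPONENTS):
    """Every on/off combination of the components (2**k arms)."""
    return [dict(zip(components, flags))
            for flags in product((False, True), repeat=len(components))]


def leave_one_out_arms(components=COMPONENTS):
    """The full system minus exactly one component, one arm per component."""
    return [{c: (c != dropped) for c in components} for dropped in components]


arms = factorial_arms()
loo = leave_one_out_arms()
print(len(arms), "factorial arms;", len(loo), "leave-one-out arms")
```

The full factorial needs eight arms and also exposes interactions between components. Leave-one-out needs only three arms plus the full system, which may be the more realistic option when live traffic is expensive.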
Original abstract
Generative Retrieval (GR) offers a promising paradigm for recommendation through next-token prediction (NTP). However, scaling it to large-scale industrial systems introduces three challenges: (i) within a single request, the identical model inputs may produce inconsistent outputs due to the pagination request mechanism; (ii) the prohibitive cost of encoding long user behavior sequences with multi-token item representations based on semantic IDs, and (iii) aligning the generative policy with nuanced user preference signals. We present GenRec, a preference-oriented generative framework deployed on the JD App that addresses above challenges within a single decoder-only architecture. For training objective, we propose Page-wise NTP task, which supervises over an entire interaction page rather than each interacted item individually, providing denser gradient signal and resolving the one-to-many ambiguity of point-wise training. On the prefilling side, an asymmetric linear Token Merger compresses multi-token Semantic IDs in the prompt while preserving full-resolution decoding, reducing input length by ~2X with negligible accuracy loss. To further align outputs with user satisfaction, we introduce GRPO-SR, a reinforcement learning method that pairs Group Relative Policy Optimization with NLL regularization for training stability, and employs Hybrid Rewards combining a dense reward model with a relevance gate to mitigate reward hacking. In month-long online A/B tests serving production traffic, GenRec achieves 9.5% improvement in click count and 8.7% in transaction count over the existing pipeline.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GenRec, a decoder-only generative retrieval framework for large-scale recommendation. It proposes Page-wise NTP to supervise entire interaction pages and resolve one-to-many ambiguity, an asymmetric linear Token Merger to compress multi-token semantic IDs in prompts by ~2X, and GRPO-SR (Group Relative Policy Optimization with NLL regularization and hybrid dense+relevance rewards) to align the policy with user satisfaction. The system is deployed on the JD App; the central empirical claim is that month-long online A/B tests on production traffic yield 9.5% higher click count and 8.7% higher transaction count versus the existing pipeline.
Significance. If the reported lifts are attributable to the three proposed components, the work would be significant for demonstrating a production-scale generative retrieval system that directly tackles pagination inconsistency, encoding cost, and preference alignment within a single architecture. The use of live A/B tests on real traffic supplies direct outcome evidence rather than relying solely on offline metrics, which is a clear strength.
major comments (1)
- [Online A/B tests description] The description of the online A/B tests (abstract and Experiments section) reports 9.5% and 8.7% lifts but supplies no traffic-split ratio, p-value or confidence interval, statement that the control arm was held fixed, or online ablation isolating Page-wise NTP, Token Merger, and GRPO-SR. Without these controls the observed deltas cannot be confidently attributed to the proposed methods rather than system drift or unmentioned changes.
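For reference, the kind of uncertainty estimate requested here can be computed from per-arm summary statistics. The sketch below uses the delta method for a ratio of two independent means and a normal approximation. It assumes the user is the randomization unit and that per-user means and variances are available. The input numbers are hypothetical placeholders chosen to give a 9.5% lift; they are not data from the paper.

```python
from math import sqrt
from statistics import NormalDist


def relative_lift_ci(mean_t, var_t, n_t, mean_c, var_c, n_c, level=0.95):
    """Delta-method CI and two-sided p-value for the relative lift mean_t / mean_c - 1."""
    lift = mean_t / mean_c - 1.0
    # Var(X/Y) ~= Var(X)/mu_Y^2 + mu_X^2 * Var(Y)/mu_Y^4 for independent arm means X, Y.
    se = sqrt(var_t / n_t / mean_c**2 + mean_t**2 * var_c / n_c / mean_c**4)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(lift) / se))
    return lift, (lift - z * se, lift + z * se), p_value


# Hypothetical per-user click statistics for a 50/50 split.
lift, (lo, hi), p = relative_lift_ci(
    mean_t=1.095, var_t=4.0, n_t=1_000_000,
    mean_c=1.000, var_c=4.0, n_c=1_000_000,
)
print(f"lift={lift:.3%}, 95% CI=({lo:.3%}, {hi:.3%}), p={p:.3g}")
```

An interval that excludes zero would rule out chance as the explanation, but it would not by itself rule out system drift or unmentioned changes. Those still require the fixed control arm and the ablations listed in the comment.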
minor comments (1)
- [Abstract] Abstract: 'addresses above challenges' should read 'addresses the above challenges'.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the reporting of our online A/B tests. We agree that additional experimental details are needed to strengthen attribution of the observed lifts and will incorporate them in the revision.
- Referee: [Online A/B tests description] The description of the online A/B tests (abstract and Experiments section) reports 9.5% and 8.7% lifts but supplies no traffic-split ratio, p-value or confidence interval, statement that the control arm was held fixed, or online ablation isolating Page-wise NTP, Token Merger, and GRPO-SR. Without these controls the observed deltas cannot be confidently attributed to the proposed methods rather than system drift or unmentioned changes.
  Authors: We acknowledge that the current manuscript omits several key details required for rigorous interpretation of the production A/B results. In the revised version we will expand the Experiments section (and update the abstract if space permits) to report: (1) the traffic-split ratio (50/50), (2) p-values and 95% confidence intervals for the 9.5% click-count and 8.7% transaction-count lifts, (3) an explicit statement that the control arm remained unchanged for the full month-long test window, and (4) any component-wise online ablation results we can obtain or have already run. While full isolation of each proposed component (Page-wise NTP, Token Merger, GRPO-SR) in live traffic is operationally costly, we will include the strongest available ablation evidence and note any limitations. These additions will allow readers to more confidently attribute the gains to the proposed methods.
  Revision: yes
Circularity Check
No circularity: claims rest on external A/B tests and novel components
The paper motivates three engineering challenges in scaling generative retrieval, then introduces Page-wise NTP (supervising entire pages), asymmetric Token Merger (compressing semantic IDs), and GRPO-SR (RL with hybrid rewards) inside a decoder-only model. These are presented as design choices, not derived predictions. Validation comes from month-long production A/B tests measuring click and transaction lifts against the existing pipeline. No equations, fitted parameters, or self-citations are shown to reduce the central claims to inputs by construction. The performance numbers are measured externally and do not collapse into self-referential definitions or renamings of known results. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: next-token prediction on item sequences can capture user preference signals at scale.
Forward citations
Cited by 1 Pith paper
- Conditional Memory Enhanced Item Representation for Generative Recommendation
  ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.