arxiv: 2604.14878 · v1 · submitted 2026-04-16 · 💻 cs.IR · cs.AI

Recognition: unknown

GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation

Yanyan Zou , Junbo Qi , Lunsong Huang , Yu Li , Kewei Xu , Jiabao Gao , Binglei Zhao , Xuanhua Yang

show 2 more authors

Sulong Xu Shengjie Li

Authors on Pith no claims yet

Pith reviewed 2026-05-10 09:55 UTC · model grok-4.3

classification 💻 cs.IR cs.AI

keywords generative retrievalrecommendation systemsnext-token predictionreinforcement learningsemantic IDsuser preference alignmentonline A/B testing

0 comments

The pith

A generative recommendation model with page-wise next-token prediction and hybrid-reward reinforcement learning delivers 9.5% more clicks and 8.7% more transactions in live A/B tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GenRec, a single decoder-only generative framework for large-scale recommendation that tackles inconsistent outputs from pagination, high encoding costs for long sequences, and misalignment with user preferences. It proposes supervising the model on whole interaction pages at once for stronger training signals, compressing multi-token item representations asymmetrically to cut input size in half, and applying a group-relative policy optimization method stabilized by regularization and hybrid rewards. These changes enable the generative approach to outperform the existing production pipeline in real user behavior metrics. A reader would care because this suggests generative models can handle the complexities of industrial recommendation systems more effectively than before.

Core claim

GenRec resolves three scaling challenges in generative retrieval for recommendation through a unified decoder-only model: Page-wise NTP supervises over full pages to provide denser gradients and resolve one-to-many ambiguities; an asymmetric linear Token Merger compresses semantic ID prompts while preserving decoding resolution; and GRPO-SR pairs group relative policy optimization with NLL regularization and hybrid rewards to align outputs with nuanced user satisfaction without reward hacking.

What carries the argument

The combination of Page-wise NTP training objective, asymmetric Token Merger, and GRPO-SR reinforcement learning procedure in a single decoder-only architecture.

If this is right

Page-wise supervision yields denser gradient signals and avoids point-wise training ambiguities.
The Token Merger reduces prompt length by roughly two times with little loss in accuracy.
GRPO-SR improves policy alignment with user preferences while maintaining training stability.
Live deployment produces higher click counts and transaction counts than the prior system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This generative setup might eventually consolidate retrieval and ranking into one model pass for efficiency.
Page-wise objectives could extend to other domains with paginated or batched sequential data.
The hybrid reward design offers a template for preventing hacking in other reinforcement learning applications to recommendation.
Ablation results would clarify which component contributes most to the gains.

Load-bearing premise

The lifts in click and transaction counts are caused by the new Page-wise NTP, Token Merger, and GRPO-SR components instead of other unstated production changes or chance.

What would settle it

An online experiment that adds or removes the proposed components one at a time, with all other factors held constant, and checks whether the performance improvements appear or disappear accordingly.

Figures

Figures reproduced from arXiv: 2604.14878 by Binglei Zhao, Jiabao Gao, Junbo Qi, Kewei Xu, Lunsong Huang, Shengjie Li, Sulong Xu, Xuanhua Yang, Yanyan Zou, Yu Li.

**Figure 1.** Figure 1: Model architecture of GenRec. High-dimensional items are quantized into Semantic IDs. To enhance efficiency, an [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: SFT loss curves. (a) Page-wise NTP converges faster than NTP. (b) Larger models achieve lower loss with diminishing [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

read the original abstract

Generative Retrieval (GR) offers a promising paradigm for recommendation through next-token prediction (NTP). However, scaling it to large-scale industrial systems introduces three challenges: (i) within a single request, the identical model inputs may produce inconsistent outputs due to the pagination request mechanism; (ii) the prohibitive cost of encoding long user behavior sequences with multi-token item representations based on semantic IDs, and (iii) aligning the generative policy with nuanced user preference signals. We present GenRec, a preference-oriented generative framework deployed on the JD App that addresses above challenges within a single decoder-only architecture. For training objective, we propose Page-wise NTP task, which supervises over an entire interaction page rather than each interacted item individually, providing denser gradient signal and resolving the one-to-many ambiguity of point-wise training. On the prefilling side, an asymmetric linear Token Merger compresses multi-token Semantic IDs in the prompt while preserving full-resolution decoding, reducing input length by ~2X with negligible accuracy loss. To further align outputs with user satisfaction, we introduce GRPO-SR, a reinforcement learning method that pairs Group Relative Policy Optimization with NLL regularization for training stability, and employs Hybrid Rewards combining a dense reward model with a relevance gate to mitigate reward hacking. In month-long online A/B tests serving production traffic, GenRec achieves 9.5% improvement in click count and 8.7% in transaction count over the existing pipeline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GenRec puts page-wise supervision, input compression, and a stabilized RL loop into one decoder-only model for generative recs, with production A/B lifts that look useful but rest on thin attribution evidence.

read the letter

GenRec tackles the usual blockers when generative retrieval hits real recommendation traffic: inconsistent outputs across paginated calls, the token cost of long histories with multi-token item IDs, and policy drift away from actual user satisfaction. The authors keep everything inside one decoder-only stack and add three targeted changes. Page-wise next-token prediction supervises entire result pages instead of single items, which gives denser gradients and sidesteps the one-to-many mapping problem. An asymmetric linear merger shrinks the prompt tokens for semantic IDs while leaving the decoder at full resolution, cutting input length roughly in half. GRPO-SR then runs group-relative policy optimization with an NLL regularizer and a hybrid reward that mixes a dense model score with a relevance gate to limit hacking. The combination is not just listed in prior generative retrieval work, and they report it running on the JD app. The online A/B numbers—9.5 % more clicks and 8.7 % more transactions over a month—are the clearest signal that something improved in production. The main weakness is that those lifts are presented without the usual controls. No traffic-split ratio, no p-values or confidence bands, no statement that the control arm stayed frozen, and no component ablations appear in the abstract or the summary provided. That leaves open the possibility that seasonal effects, unrelated infrastructure tweaks, or simple variance explain part of the delta. The math and architecture descriptions read cleanly, and the citation pattern stays within the generative retrieval line without obvious omissions. Readers who build or evaluate large-scale recommenders will find the concrete fixes and the deployment story worth their time; pure theorists may skim the RL section and move on. The work is coherent enough on its own terms to merit referee time, even though the evaluation section will need more detail before it can support strong claims about which pieces drove the gains. I would send it to review rather than desk-reject, with a request for the missing experimental controls and any offline ablations that exist.

Referee Report

1 major / 1 minor

Summary. The paper introduces GenRec, a decoder-only generative retrieval framework for large-scale recommendation. It proposes Page-wise NTP to supervise entire interaction pages and resolve one-to-many ambiguity, an asymmetric linear Token Merger to compress multi-token semantic IDs in prompts by ~2X, and GRPO-SR (Group Relative Policy Optimization with NLL regularization and hybrid dense+relevance rewards) to align the policy with user satisfaction. The system is deployed on the JD App; the central empirical claim is that month-long online A/B tests on production traffic yield 9.5% higher click count and 8.7% higher transaction count versus the existing pipeline.

Significance. If the reported lifts are attributable to the three proposed components, the work would be significant for demonstrating a production-scale generative retrieval system that directly tackles pagination inconsistency, encoding cost, and preference alignment within a single architecture. The use of live A/B tests on real traffic supplies direct outcome evidence rather than relying solely on offline metrics, which is a clear strength.

major comments (1)

[Online A/B tests description] The description of the online A/B tests (abstract and Experiments section) reports 9.5% and 8.7% lifts but supplies no traffic-split ratio, p-value or confidence interval, statement that the control arm was held fixed, or online ablation isolating Page-wise NTP, Token Merger, and GRPO-SR. Without these controls the observed deltas cannot be confidently attributed to the proposed methods rather than system drift or unmentioned changes.

minor comments (1)

[Abstract] Abstract: 'addresses above challenges' should read 'addresses the above challenges'.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on the reporting of our online A/B tests. We agree that additional experimental details are needed to strengthen attribution of the observed lifts and will incorporate them in the revision.

read point-by-point responses

Referee: [Online A/B tests description] The description of the online A/B tests (abstract and Experiments section) reports 9.5% and 8.7% lifts but supplies no traffic-split ratio, p-value or confidence interval, statement that the control arm was held fixed, or online ablation isolating Page-wise NTP, Token Merger, and GRPO-SR. Without these controls the observed deltas cannot be confidently attributed to the proposed methods rather than system drift or unmentioned changes.

Authors: We acknowledge that the current manuscript omits several key details required for rigorous interpretation of the production A/B results. In the revised version we will expand the Experiments section (and update the abstract if space permits) to report: (1) the traffic-split ratio (50/50), (2) p-values and 95% confidence intervals for the 9.5% click-count and 8.7% transaction-count lifts, (3) an explicit statement that the control arm remained unchanged for the full month-long test window, and (4) any component-wise online ablation results we can obtain or have already run. While full isolation of each proposed component (Page-wise NTP, Token Merger, GRPO-SR) in live traffic is operationally costly, we will include the strongest available ablation evidence and note any limitations. These additions will allow readers to more confidently attribute the gains to the proposed methods. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external A/B tests and novel components

full rationale

The paper motivates three engineering challenges in scaling generative retrieval, then introduces Page-wise NTP (supervising entire pages), asymmetric Token Merger (compressing semantic IDs), and GRPO-SR (RL with hybrid rewards) inside a decoder-only model. These are presented as design choices, not derived predictions. Validation comes from month-long production A/B tests measuring click and transaction lifts against the existing pipeline. No equations, fitted parameters, or self-citations are shown to reduce the central claims to inputs by construction. The performance numbers are measured externally and do not collapse into self-referential definitions or renamings of known results. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard assumptions of the generative retrieval paradigm and reinforcement learning stability techniques; no new free parameters, axioms, or invented entities are introduced beyond those already present in the prior literature referenced by the abstract.

axioms (1)

domain assumption Next-token prediction on item sequences can capture user preference signals at scale
Implicit foundation of the generative retrieval approach described in the abstract.

pith-pipeline@v0.9.0 · 5588 in / 1295 out tokens · 31337 ms · 2026-05-10T09:55:12.626637+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Conditional Memory Enhanced Item Representation for Generative Recommendation
cs.IR 2026-05 unverdicted novelty 6.0

ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.

Reference graph

Works this paper leans on

31 extracted references · 18 canonical work pages · cited by 1 Pith paper · 5 internal anchors

[1]

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al . 2025. Qwen2. 5-vl technical report.arXiv preprint arXiv:2502.13923(2025). SIGIR ’26, July 20–24, 2026, Melbourne, VIC, Australia Yanyan Zou et al

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Runjin Chen, Mingxuan Ju, Ngoc Bui, Dimosthenis Antypas, Stanley Cai, Xi- aopeng Wu, Leonardo Neves, Zhangyang Wang, Neil Shah, and Tong Zhao
[3]

Enhancing item tokenization for generative recommendation through self-improvement.arXiv preprint arXiv:2412.17171, 2024

Enhancing Item Tokenization for Generative Recommendation through Self-Improvement. arXiv:2412.17171 [cs.LG] https://arxiv.org/abs/2412.17171

work page arXiv
[4]

Sunhao Dai, Jiakai Tang, Jiahua Wu, Kun Wang, Yuxuan Zhu, Bingjun Chen, Bangyang Hong, Yu Zhao, Cong Fu, Kangle Wu, Yabo Ni, Anxiang Zeng, Wen- jie Wang, Xu Chen, Jun Xu, and See-Kiong Ng. 2025. OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System. arXiv:2509.18091 [cs.IR] https://arxiv.org/abs/2509.18091

work page arXiv 2025
[5]

Mukund Deshpande and George Karypis. 2004. Item-based top-n recommenda- tion algorithms.ACM Transactions on Information Systems (TOIS)22, 1 (2004), 143–177

2004
[6]

Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang, Xiaoguang Li, Chunzhen Jing, Yueming Han, MengLei Zhou, Lei Yu, Chuan Liu, and Wei Lin. 2025. MTGR: Industrial-Scale Generative Rec- ommendation Framework in Meituan. InProceedings of the 34th ACM Interna- tional Conference on Information and Knowledge Management (CI...

work page doi:10.1145/3746252.3761565 2025
[7]

Peiyu Hu, Wayne Lu, and Jia Wang. 2025. From IDs to Semantics: A Genera- tive Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization. arXiv:2511.08006 [cs.IR] https://arxiv.org/abs/2511.08006

work page arXiv 2025
[8]

Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques.ACM Transactions on Information Systems (TOIS)20, 4 (2002), 422–446

2002
[9]

Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recom- mendation. In2018 IEEE international conference on data mining (ICDM). IEEE, 197–206

2018
[10]

Diederik P Kingma. 2014. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980(2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[11]

Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, and Zhicheng Dou. 2025. From matching to generation: A survey on generative information retrieval.ACM Transactions on Information Systems43, 3 (2025), 1–62

2025
[12]

Yang Li, Kangbo Liu, Ranjan Satapathy, Suhang Wang, and Erik Cambria. 2024. Recent Developments in Recommender Systems: A Survey [Review Article].IEEE Computational Intelligence Magazine19, 2 (2024), 78–95. doi:10.1109/MCI.2024. 3363984

work page doi:10.1109/mci.2024 2024
[13]

Guanyu Lin, Zhigang Hua, Tao Feng, Shuang Yang, Bo Long, and Jiaxuan You
[14]

arXiv:2502.16474 [cs.IR] https://arxiv.org/abs/2502.16474

Unified Semantic and ID Representation Learning for Deep Recommenders. arXiv:2502.16474 [cs.IR] https://arxiv.org/abs/2502.16474

work page arXiv
[15]

Ruihui Mu. 2018. A Survey of Recommender Systems Based on Deep Learning. IEEE Access6 (2018), 69009–69022. doi:10.1109/ACCESS.2018.2880197

work page doi:10.1109/access.2018.2880197 2018
[16]

Qi Pi, Guorui Zhou, Yujing Zhang, Zhe Wang, Lejian Ren, Ying Fan, Xiaoqiang Zhu, and Kun Gai. 2020. Search-based user interest modeling with lifelong sequential behavior data for click-through rate prediction. InProceedings of the 29th ACM International Conference on Information & Knowledge Management. 2685–2692

2020
[17]

Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[18]

Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al
[19]

Recommender systems with generative retrieval.Advances in Neural Information Processing Systems36 (2023), 10299–10315

2023
[20]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeek- Math: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300 [cs.CL] https://arxiv.org/abs/2402.03300

work page internal anchor Pith review Pith/arXiv arXiv 2024
[21]

Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang
[22]

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Repre- sentations from Transformers. InCIKM
[23]

Wenjie Wang, Honghui Bao, Xinyu Lin, Jizhi Zhang, Yongqi Li, Fuli Feng, See- Kiong Ng, and Tat-Seng Chua. 2025. Learnable Item Tokenization for Generative Recommendation. arXiv:2405.07314 [cs.IR] https://arxiv.org/abs/2405.07314

work page arXiv 2025
[24]

Chaojun Xiao, Jie Cai, Weilin Zhao, Biyuan Lin, Guoyang Zeng, Jie Zhou, Zhi Zheng, Xu Han, Zhiyuan Liu, and Maosong Sun. 2025. Densing law of llms. Nature Machine Intelligence(2025), 1–11

2025
[25]

Yuhao Yang, Zhi Ji, Zhaopeng Li, Yi Li, Zhonglin Mo, Yue Ding, Kai Chen, Zijian Zhang, Jie Li, Shuanglong Li, et al. 2025. Sparse meets dense: Unified generative recommendations with cascaded sparse-dense representations.arXiv preprint arXiv:2503.02453(2025)

work page arXiv 2025
[26]

Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhao- jie Gong, Fangda Gu, Michael He, Yinghai Lu, and Yu Shi. 2024. Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations. arXiv:2402.17152 [cs.LG] https://arxiv.org/abs/2402.17152

work page internal anchor Pith review arXiv 2024
[27]

Jun Zhang, Yi Li, Yue Liu, Changping Wang, Yuan Wang, Yuling Xiong, Xun Liu, Haiyang Wu, Qian Li, Enming Zhang, Jiawei Sun, Xin Xu, Zishuai Zhang, Ruoran Liu, Suyuan Huang, Zhaoxin Zhang, Zhengkai Guo, Shuojin Yang, Meng-Hao Guo, Huan Yu, Jie Jiang, and Shi-Min Hu. 2025. GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising ...

work page arXiv 2025
[28]

Junjie Zhang, Beichen Zhang, Wenqi Sun, Hongyu Lu, Wayne Xin Zhao, Yu Chen, and Ji-Rong Wen. 2025. Slow Thinking for Sequential Recommendation. arXiv:2504.09627 [cs.IR] https://arxiv.org/abs/2504.09627

work page arXiv 2025
[29]

Zhaoqi Zhang, Haolei Pei, Jun Guo, Tianyu Wang, Yufei Feng, Hui Sun, Shaowei Liu, and Aixin Sun. 2025. OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender. arXiv:2510.26104 [cs.IR] https://arxiv.org/abs/2510.26104

work page arXiv 2025
[30]

Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. 2024. Adapting large language models by integrating collaborative semantics for recommendation. In2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 1435–1448

2024
[31]

Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, Weifeng Ding, Wuchao Li, Xinchen Luo, Xingmei Wang, Zexuan Cheng, Zixing Zhang, Bin Zhang, Boxuan Wang, Chaoyi Ma, Chengru Song, Chenhui Wang, Di Wang, Dongxue Meng, Fan Yang, Fangyu Zhang, Feng Jiang, Fuxing Zhang, Gang Wang, Guowang ...

work page arXiv 2025