GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation

2026 · cs.IR · arXiv 2604.14878

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Generative Retrieval (GR) offers a promising paradigm for recommendation through next-token prediction (NTP). However, scaling it to large-scale industrial systems introduces three challenges: (i) within a single request, identical model inputs may produce inconsistent outputs because of the pagination request mechanism; (ii) encoding long user behavior sequences with multi-token item representations based on semantic IDs is prohibitively expensive; and (iii) the generative policy must be aligned with nuanced user preference signals. We present GenRec, a preference-oriented generative framework deployed on the JD App that addresses these challenges within a single decoder-only architecture. For the training objective, we propose a Page-wise NTP task, which supervises over an entire interaction page rather than each interacted item individually, providing a denser gradient signal and resolving the one-to-many ambiguity of point-wise training. On the prefilling side, an asymmetric linear Token Merger compresses multi-token Semantic IDs in the prompt while preserving full-resolution decoding, reducing input length by ~2x with negligible accuracy loss. To further align outputs with user satisfaction, we introduce GRPO-SR, a reinforcement learning method that pairs Group Relative Policy Optimization with NLL regularization for training stability and employs Hybrid Rewards, which combine a dense reward model with a relevance gate to mitigate reward hacking. In month-long online A/B tests serving production traffic, GenRec achieves a 9.5% improvement in click count and an 8.7% improvement in transaction count over the existing pipeline.
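
The abstract does not spell out how the Hybrid Rewards or the group-relative signal are computed, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes the relevance gate is a binary check that zeroes the dense reward-model score for irrelevant candidates, and that advantages are the rewards standardized within one sampled group (the usual GRPO formulation, here with the population standard deviation). The names `hybrid_reward` and `group_relative_advantages` and all numbers are hypothetical; the NLL regularization term is omitted.

```python
from statistics import mean, pstdev


def hybrid_reward(rm_score: float, is_relevant: bool) -> float:
    # Assumed gate: an irrelevant candidate earns nothing, however high
    # the dense reward model scores it (guards against reward hacking).
    return rm_score if is_relevant else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    # Standardize rewards within one group of sampled candidates,
    # so each candidate is judged relative to its siblings.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical candidates sampled for the same request.
rm_scores = [0.9, 0.4, 0.7, 0.2]
relevant = [False, True, True, True]

rewards = [hybrid_reward(s, ok) for s, ok in zip(rm_scores, relevant)]
advantages = group_relative_advantages(rewards)
```

In this toy group, the candidate with the highest raw reward-model score fails the relevance gate and therefore ends up with the lowest advantage, which is the behavior a gate of this kind is meant to produce.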

representative citing papers

Conditional Memory Enhanced Item Representation for Generative Recommendation

cs.IR · 2026-05-12 · unverdicted · novelty 6.0

ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.

Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation

cs.LG · 2026-06-07 · unverdicted · novelty 5.0

AdaGRPO gates GRPO reinforcement learning with supervised NLL using per-sample binary clips based on policy difficulty and reward discriminability, raising HR@10 from 11.01% to 12.18% while keeping hallucination below 0.22% on large-scale e-commerce data and showing A/B gains.
