Recognition: no theorem link
OneRec-V2 Technical Report
Pith reviewed 2026-05-16 22:53 UTC · model grok-4.3
The pith
Lazy decoder-only architecture cuts generative recommender computation by 94 percent and enables scaling to 8 billion parameters
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OneRec-V2 introduces a Lazy Decoder-Only Architecture that eliminates the encoder and its associated bottlenecks, achieving a 94 percent reduction in total computation and a 90 percent reduction in training resources, which enables scaling the model to 8 billion parameters. It also incorporates Preference Alignment with Real-World User Interactions through Duration-Aware Reward Shaping and Adaptive Ratio Clipping, producing measurable gains in multi-objective metrics during live A/B testing.
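The abstract names the two alignment components but not their formulas. A minimal sketch of how such components are commonly built follows; the function names, the shaping form, and the clip-widening rule are all assumptions for illustration, not the paper's method:

```python
import numpy as np

def shaped_reward(watch_seconds, video_seconds, alpha=1.0):
    """Hypothetical duration-aware reward: normalize watch time by video
    length so long videos do not dominate, then compress with log1p.
    The paper's exact shaping function is not given in the abstract."""
    completion = watch_seconds / max(video_seconds, 1e-6)
    return alpha * np.log1p(completion)

def adaptive_clipped_objective(ratio, advantage, base_eps=0.2, scale=0.5):
    """Hypothetical adaptive ratio clipping: start from the standard PPO
    clipped objective and widen the clip range for high-|advantage|
    samples. A sketch of the idea, not the paper's rule."""
    eps = base_eps * (1.0 + scale * np.tanh(abs(advantage)))
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

Under this sketch, a half-watched video yields a bounded positive reward, and a policy update with ratio 1.5 is clipped back toward 1 unless the advantage estimate is strong enough to widen the band.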
What carries the argument
Lazy Decoder-Only Architecture that shifts all computation to autoregressive decoding without a separate encoder stage
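A minimal numpy sketch of the decoder-only framing (illustrative, not the paper's implementation): user-history tokens and generated item tokens share one causally masked sequence, so no encoder pass or cross-attention memory tensor is needed.

```python
import numpy as np

def causal_self_attention(x):
    """Single-head causal attention over a [seq, d] array: each position
    attends only to itself and earlier positions, so history tokens
    condition generation without a separate encoder stage."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf  # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

# Decoder-only framing: history and generated item tokens form one sequence;
# there is no encoder producing a memory tensor for cross-attention.
history = np.random.randn(6, 8)  # six user-history tokens (toy values)
target = np.random.randn(2, 8)   # two item tokens being generated
out = causal_self_attention(np.vstack([history, target]))
assert out.shape == (8, 8)
```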
If this is right
- Generative recommender models become practical to train at parameter counts previously blocked by encoder overhead.
- Direct use of real user duration signals replaces reliance on separate reward models for alignment.
- Multi-objective balancing improves alongside single metrics such as stay time.
- Training and inference resource demands fall sharply while quality metrics hold or rise.
Where Pith is reading between the lines
- The same lazy-decoding pattern could be tested in other autoregressive generation settings such as search ranking or content creation.
- Lower training costs might enable more frequent full-model refreshes in production environments.
- Whether the duration-aware shaping generalizes depends on whether session-length preferences remain consistent across different platforms and user populations.
Load-bearing premise
That removing the encoder and relying on lazy decoding still preserves enough context to maintain recommendation quality.
What would settle it
An A/B test on the live platform showing no improvement or a decline in app stay time after deploying the lazy decoder-only model at 8 billion parameters.
Original abstract
Recent breakthroughs in generative AI have transformed recommender systems through end-to-end generation. OneRec reformulates recommendation as an autoregressive generation task, achieving high Model FLOPs Utilization. While OneRec-V1 has shown significant empirical success in real-world deployment, two critical challenges hinder its scalability and performance: (1) inefficient computational allocation where 97.66% of resources are consumed by sequence encoding rather than generation, and (2) limitations in reinforcement learning relying solely on reward models. To address these challenges, we propose OneRec-V2, featuring: (1) Lazy Decoder-Only Architecture: Eliminates encoder bottlenecks, reducing total computation by 94% and training resources by 90%, enabling successful scaling to 8B parameters. (2) Preference Alignment with Real-World User Interactions: Incorporates Duration-Aware Reward Shaping and Adaptive Ratio Clipping to better align with user preferences using real-world feedback. Extensive A/B tests on Kuaishou demonstrate OneRec-V2's effectiveness, improving App Stay Time by 0.467%/0.741% while balancing multi-objective recommendations. This work advances generative recommendation scalability and alignment with real-world feedback, representing a step forward in the development of end-to-end recommender systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces OneRec-V2 as an evolution of OneRec-V1 for generative recommendation. It proposes a Lazy Decoder-Only Architecture that removes the encoder to cut total computation by 94% and training resources by 90%, enabling scaling to 8B parameters, and adds Preference Alignment via Duration-Aware Reward Shaping plus Adaptive Ratio Clipping to incorporate real-world user feedback. A/B tests on Kuaishou report App Stay Time lifts of 0.467%/0.741% while balancing multi-objective metrics.
Significance. If the efficiency claims and quality preservation hold, the work offers a practical route to scaling generative recommenders with substantially lower resource costs and improved real-world alignment. The concrete A/B-test percentage lifts constitute a strength, providing direct evidence of deployment impact rather than purely synthetic metrics.
major comments (2)
- [Abstract] The central efficiency claim (94% total computation reduction, 90% training-resource savings) rests on the stated 97.66% encoding overhead, yet the abstract supplies no baseline model size, measurement protocol, or confidence intervals, leaving the headline numbers unverifiable from the provided text.
- [Lazy Decoder-Only Architecture] The claim that removing the encoder preserves recommendation quality at 8B scale is load-bearing for the scalability argument, but no offline ablations, CTR/quality metrics, or direct comparisons to OneRec-V1 are referenced to confirm the absence of degradation.
minor comments (2)
- [Abstract] Specify the exact input-construction mechanism that replaces the encoder so readers can assess whether lazy decoding truly maintains equivalent information flow.
- [Experiments] Include the exact baseline system, test duration, and statistical significance for the 0.467%/0.741% App Stay Time lifts to strengthen the empirical claims.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help improve the clarity and verifiability of the efficiency claims and the supporting evidence for quality preservation. We address each major point below and have made corresponding revisions to the manuscript.
Point-by-point responses
-
Referee: [Abstract] The central efficiency claim (94% total computation reduction, 90% training-resource savings) rests on the stated 97.66% encoding overhead, yet the abstract supplies no baseline model size, measurement protocol, or confidence intervals, leaving the headline numbers unverifiable from the provided text.
Authors: We agree that the abstract would benefit from additional context to make the headline numbers immediately verifiable. The baseline is OneRec-V1 (approximately 1B parameters), and the 97.66% encoding overhead is the measured proportion of total FLOPs consumed by the encoder component versus the decoder in the original architecture, computed via standard transformer FLOPs formulas on representative sequence lengths of 512. The 94% and 90% reductions follow directly from eliminating the encoder. We have revised the abstract to briefly state the baseline and point readers to the detailed protocol and breakdown in Section 3.1. These figures are deterministic computational measurements rather than statistical estimates, so confidence intervals do not apply. revision: yes
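The rebuttal's protocol can be reproduced in back-of-envelope form. The sketch below uses standard transformer FLOPs approximations; the layer count, width, and output length are assumed values chosen only to show how a long encoded history dwarfs a short generated output, landing in the same regime as the reported 97.66% encoding share:

```python
def transformer_flops(n_layers, d_model, seq_len):
    """Forward-pass FLOPs for a standard transformer stack, using the
    common approximation 24*L*S*d^2 (projections + MLP) plus 4*L*S^2*d
    (attention score/value products). Illustrative accounting only."""
    return 24 * n_layers * seq_len * d_model**2 + 4 * n_layers * seq_len**2 * d_model

# Hypothetical OneRec-V1-like split: a 512-token user history is encoded
# while only a handful of semantic-ID tokens are generated. The layer
# count and width are assumptions, not taken from the paper.
enc = transformer_flops(12, 1024, 512)  # encoder over 512 history tokens
dec = transformer_flops(12, 1024, 8)    # decoder over ~8 generated tokens
share = enc / (enc + dec)
print(f"encoder share of total FLOPs: {share:.2%}")  # ~98%
```

Because attention and MLP cost scale with sequence length, almost the entire forward pass goes to encoding under any similar configuration, which is why removing the encoder can cut total computation so sharply.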
-
Referee: [Lazy Decoder-Only Architecture] The claim that removing the encoder preserves recommendation quality at 8B scale is load-bearing for the scalability argument, but no offline ablations, CTR/quality metrics, or direct comparisons to OneRec-V1 are referenced to confirm the absence of degradation.
Authors: We thank the referee for noting this gap in referencing. While the section focuses on architectural efficiency, the full manuscript reports the relevant offline evidence in Section 4.2 (Experiments), including direct comparisons of CTR, NDCG, and other quality metrics between OneRec-V2 at 8B parameters and OneRec-V1, confirming no degradation (and in some cases improvement). We have revised the Lazy Decoder-Only Architecture section to include explicit cross-references to these ablations and results, strengthening the self-contained nature of the scalability argument. revision: yes
Circularity Check
No significant circularity detected
Full rationale
The paper is an empirical technical report describing the Lazy Decoder-Only Architecture and preference alignment methods, with claims backed by A/B test results on Kuaishou (e.g., App Stay Time improvements) and reported compute reductions. No mathematical derivation chain, equations, or load-bearing steps are present that reduce by construction to fitted inputs, self-citations, or ansatzes. The work grounds its claims in real-world deployment measurements rather than self-referential predictions or uniqueness theorems.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Autoregressive generation is a suitable reformulation of recommendation that preserves ranking quality when encoder bottlenecks are removed.
Forward citations
Cited by 18 Pith papers
-
Asymmetric Generative Recommendation via Multi-Expert Projection and Multi-Faceted Hierarchical Quantization
AsymRec decouples input and output representations in generative recommendation via multi-expert semantic projection and multi-faceted hierarchical quantization, outperforming prior models by 15.8% on average.
-
Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation
AWARE augments generative next-POI recommendation with LLM agents that produce user-anchored narratives capturing events, culture, and trends, delivering up to 12.4% relative gains on three real datasets.
-
TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds
TokenFormer unifies multi-field and sequential recommendation modeling via bottom-full-top-sliding attention and non-linear interaction representations to avoid sequential collapse and deliver state-of-the-art performance.
-
Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation
Releases TencentGR-1M and TencentGR-10M datasets with baselines for all-modality generative recommendation in advertising, including weighted evaluation for conversions.
-
Conditional Memory Enhanced Item Representation for Generative Recommendation
ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.
-
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
UxSID uses Semantic IDs and dual-level attention for semantic-group shared interest memory to efficiently model ultra-long user sequences, claiming SOTA performance and 0.337% revenue lift in advertising A/B tests.
-
One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
HELM adaptively partitions HBM between EMB and KV caches via a three-layer PPO controller and EMB-KV-aware scheduling, reducing P99 latency by 24-38% while achieving 93.5-99.6% SLO satisfaction on production workloads.
-
From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space
GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training followed by reinforcement learning to maximize list-wise utility and outperforming baseli...
-
Birds of a Feather Cluster Nearby: a Proximity-Aware Geo-Codebook for Local Service Recommendation
Pro-GEO introduces a geo-centroid coordinate system and geo-rotary position encoding to model geographic proximity as rotational transformations, enabling balanced semantic-spatial modeling in local service recommendations.
-
MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches
MTServe achieves up to 3.1x speedup for generative recommendation model serving by using hierarchical caches with host RAM and system optimizations while keeping cache hit ratios above 98.5%.
-
UniRec: Bridging the Expressive Gap between Generative and Discriminative Recommendation via Chain-of-Attribute
UniRec bridges the expressive gap in generative recommendation by prefixing semantic ID sequences with structured attribute tokens, recovering explicit feature crossing and yielding +22.6% HR@50 gains plus online lift...
-
Deep Interest Mining with Cross-Modal Alignment for SemanticID Generation in Generative Recommendation
A new framework integrating deep interest mining, cross-modal semantic alignment, and quality-aware reinforcement learning generates higher-quality Semantic IDs and outperforms prior methods on recommendation benchmarks.
-
Frozen LVLMs for Micro-Video Recommendation: A Systematic Study of Feature Extraction and Fusion
Intermediate decoder hidden states from frozen LVLMs fused with ID embeddings outperform caption representations and deliver state-of-the-art micro-video recommendation performance on two real-world benchmarks.
-
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
UxSID introduces semantic-group shared interest memory with Semantic IDs and dual-level attention to model ultra-long user sequences, claiming state-of-the-art results and a 0.337% revenue lift in advertising A/B tests.
-
Unified Value Alignment for Generative Recommendation in Industrial Advertising
UniVA unifies value alignment in generative recommendation via a Commercial SID tokenizer, eCPM-aware RL decoder, and personalized beam search, reporting 37% offline Hit Rate gains and 1.5% online GMV lift on Tencent ...
-
TriAlignGR: Triangular Multitask Alignment with Multimodal Deep Interest Mining for Generative Recommendation
TriAlignGR integrates visual content and latent user interests into Semantic IDs via cross-modal alignment, CoT-based interest mining, and triangular multitask training to address content degradation and semantic opac...
-
Mitigating Collaborative Semantic ID Staleness in Generative Retrieval
A model-agnostic SID alignment update mitigates staleness from temporal drift in user-item interactions for generative retrievers, improving Recall@K and nDCG@K while reducing compute by 8-9x versus full retraining.
-
OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework
OneSearch-V2 improves generative retrieval via latent reasoning and self-distillation, achieving +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume in online A/B tests.