Towards Generalizable and Efficient Large-Scale Generative Recommenders

Ko-Jen Hsiao; Moumita Bhattacharya; Qiuling Xu

arxiv: 2605.23312 · v1 · pith:NOVBJYHCnew · submitted 2026-05-22 · 💻 cs.IR

Pith reviewed 2026-05-25 03:49 UTC · model grok-4.3

keywords: generative recommendation · model scaling · production deployment · cold-start handling · sequence modeling · recommendation efficiency · scaling laws

The pith

Scaling a generative recommender backbone from 2M to 1B parameters raises MRR over the smaller baseline in a one-week production-shadow evaluation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines scaling a generative recommendation model from 2M to 1B backbone parameters within a production title recommendation system. It observes that scaling gains depend on the task: some tasks approach an empirical ceiling within the observed scale range, while others keep improving. To handle real-world constraints, the work adds sampled softmax plus a projected decoding head for frequent retraining over trillions of tokens, multi-token prediction for serving-latency alignment, and semantic item towers with collaborative-embedding masking for new-item cold starts. A one-week production-shadow evaluation on 1M users finds the 1B model ahead of the 2M baseline on MRR for every reported task. The results frame model scale as one element within a larger production transfer problem that also covers task headroom, decoding cost, and item generalization.
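The following minimal NumPy sketch shows the sampled-softmax-plus-projected-head idea. The page does not give the paper's formulation, so the sketch assumes the common setup. The backbone hidden state is projected from width `d` to a smaller decoding width `r`. Negatives are drawn from a proposal distribution `q`, which is uniform here for simplicity. The usual logQ correction subtracts `log q` from every candidate logit. The function and argument names are hypothetical.

```python
import numpy as np

def sampled_softmax_loss(h, W_proj, item_emb, target, q, num_neg, rng):
    """Sampled-softmax cross-entropy for one position (illustrative sketch).

    h        : (d,) backbone hidden state
    W_proj   : (d, r) hypothetical projection to a smaller decoding width r
    item_emb : (V, r) item output embeddings in the projected space
    target   : index of the true next item
    q        : (V,) proposal distribution used to draw negatives
    """
    z = h @ W_proj                                    # project: d -> r
    negatives = rng.choice(len(q), size=num_neg, replace=True, p=q)
    cand = np.concatenate(([target], negatives))      # candidate 0 is the positive
    logits = item_emb[cand] @ z                       # 1 + num_neg dot products, not V
    logits = logits - np.log(q[cand])                 # logQ correction for sampling bias
    m = logits.max()                                  # stable log-softmax over candidates
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))
    return -log_probs[0]

# Usage on random data (sizes are arbitrary placeholders)
rng = np.random.default_rng(0)
V, d, r = 1000, 64, 16
h = rng.normal(size=d)
W_proj = rng.normal(size=(d, r)) / np.sqrt(d)
item_emb = rng.normal(size=(V, r))
q = np.full(V, 1.0 / V)
loss = sampled_softmax_loss(h, W_proj, item_emb, target=3, q=q, num_neg=32, rng=rng)
```

Per position, scoring costs roughly `d * r + r * (1 + num_neg)` multiply-adds. A full softmax over the catalog without projection costs `d * V`. Production variants usually also mask accidental hits, where a sampled negative equals the target. This sketch omits that step.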

Core claim

In a production-scale title recommendation setting, a generative recommender with a 1B-parameter backbone, diagnosed via offset scaling-law fits and equipped with multi-token prediction, sampled softmax with projected decoding, and semantic item towers using collaborative-embedding masking, achieves higher mean reciprocal rank than the 2M-parameter baseline across all tasks in a one-week production-shadow evaluation over 1M users.
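The core claim is stated in mean reciprocal rank (MRR). The minimal sketch below shows what the metric computes. It assumes each evaluation event has one held-out target item and a ranked list of recommended titles. The paper's exact protocol (rank cutoff, candidate set per task, per-user weighting) is not described on this page, and the function name is hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of each held-out target; a missing target contributes 0."""
    total = 0.0
    for ranking, target in zip(ranked_lists, targets):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)   # ranks are 1-based
    return total / len(targets)

rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
targets = ["a", "a", "d"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1/1 + 1/2 + 0) / 3 = 0.5
```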

What carries the argument

Offset scaling-law fits to diagnose task-dependent scaling, paired with multi-token prediction for serving-latency alignment, sampled softmax and projected decoding head for repeated-training efficiency, and semantic item towers with collaborative-embedding masking for cold-start adaptation.
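The page names offset scaling-law fits but not their functional form. The minimal sketch below assumes a saturating power law in backbone parameters N: metric(N) ≈ c − a·N^(−α). The offset c plays the role of the empirical ceiling, and c − metric(N_max) estimates the remaining headroom. For a fixed α the model is linear in (c, a), so the sketch grid-searches α and solves least squares for the other two parameters. The function name is hypothetical, and the data are synthetic, not the paper's results.

```python
import numpy as np

def fit_offset_power_law(n_params, metric, alphas=np.linspace(0.01, 1.0, 100)):
    """Fit metric(N) ~= c - a * N**(-alpha) and return (c, a, alpha).

    c is the offset (fitted ceiling). For each fixed alpha the model is
    linear in (c, a), so solve least squares and keep the best grid point.
    """
    n = np.asarray(n_params, dtype=float)
    y = np.asarray(metric, dtype=float)
    best = None
    for alpha in alphas:
        X = np.column_stack([np.ones_like(n), -n ** (-alpha)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((X @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], alpha)
    _, c, a, alpha = best
    return c, a, alpha

# Synthetic check: generate points from known parameters, then recover them.
n_params = np.array([2e6, 1e7, 5e7, 2e8, 1e9])
mrr = 0.40 - 2.0 * n_params ** (-0.3)
c, a, alpha = fit_offset_power_law(n_params, mrr)
headroom = c - mrr[-1]   # how far the largest model sits below the fitted ceiling
```

With real measurements, c also needs an uncertainty estimate, for example from refitting on resampled runs. A handful of model sizes can leave the ceiling poorly constrained.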

If this is right

Some tasks approach an empirical performance ceiling, so further scale adds little value for them.
The efficiency adaptations allow repeated training over trillions of behavior tokens at acceptable cost.
Semantic metadata enables scoring of newly launched titles before reliable collaborative embeddings exist.
Model scale must be weighed against task headroom, decoding cost, serving-latency alignment, and item generalization when deploying generative recommenders.
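The minimal sketch below illustrates the cold-start point in the list above through collaborative-embedding masking. It assumes an item's representation is the sum of a semantic tower over metadata and a collaborative ID embedding. During training, the ID embedding is randomly dropped with probability `mask_prob`, so the model also learns to score items from metadata alone. For newly launched titles the ID embedding is always dropped. The linear semantic tower and all names are hypothetical, since the paper's tower architecture and masking schedule are not described on this page.

```python
import numpy as np

def item_representation(meta_feats, item_ids, W_sem, collab_emb,
                        mask_prob=0.0, is_new=None, rng=None):
    """Semantic tower output plus a (possibly masked) collaborative ID embedding.

    meta_feats : (B, m) metadata features per item
    item_ids   : (B,) integer rows of collab_emb
    W_sem      : (m, r) hypothetical linear semantic tower
    collab_emb : (num_ids, r) collaborative ID embeddings
    mask_prob  : training-time probability of dropping each ID embedding
    is_new     : (B,) bools marking items whose ID embedding is not yet reliable
    """
    semantic = meta_feats @ W_sem
    keep = np.ones(len(item_ids))
    if mask_prob > 0.0:                               # random masking during training
        if rng is None:
            rng = np.random.default_rng()
        keep = (rng.random(len(item_ids)) >= mask_prob).astype(float)
    if is_new is not None:                            # cold start: metadata only
        keep = keep * (~np.asarray(is_new, dtype=bool)).astype(float)
    return semantic + keep[:, None] * collab_emb[item_ids]

# Serving-time usage: item 1 is a new title, so only its semantic part is used.
rng = np.random.default_rng(0)
meta = rng.normal(size=(2, 8))
W_sem = rng.normal(size=(8, 4))
collab = rng.normal(size=(10, 4))
ids = np.array([3, 7])
reps = item_representation(meta, ids, W_sem, collab, is_new=[False, True])
```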

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Further increases beyond 1B parameters would likely demand additional efficiency techniques to stay practical under production retraining loads.
The same combination of scale diagnostics and adaptation methods could be tested on other sequence modeling tasks such as session-based or long-term user journey prediction.
Saturation points may shift in domains with different item turnover rates or user behavior distributions.
Applying these adaptations could narrow the gap between pre-training improvements and realized downstream gains in other large-scale recommender systems.

Load-bearing premise

The production title recommendation setting, its task mix, and evaluation protocol are representative enough that the observed scaling behavior and technique benefits will transfer to other generative recommender deployments.

What would settle it

A similar large-scale production-shadow evaluation in which the 1B-backbone model fails to exceed the 2M-backbone model on MRR for the reported tasks would falsify the claimed benefit of this scaling approach.
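Settling the comparison this way needs an uncertainty estimate, not just two MRR values. The referee report below notes that no statistical tests are described. One common choice is a paired bootstrap over users, sketched here under the assumption that per-user reciprocal ranks are available for both models on the same users. The data are synthetic, and the function name is hypothetical.

```python
import numpy as np

def paired_bootstrap_mrr_diff(rr_large, rr_small, n_boot=2000, seed=0):
    """Return (observed MRR difference, 95% percentile CI), resampling users.

    Materializes an (n_boot, n_users) index array; for ~1M users, resample
    in chunks instead.
    """
    rr_large = np.asarray(rr_large, dtype=float)
    rr_small = np.asarray(rr_small, dtype=float)
    diff = rr_large - rr_small                   # paired: same users, both models
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)          # MRR difference per resample
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return diff.mean(), (lo, hi)

rng = np.random.default_rng(1)
rr_small = rng.uniform(0.0, 1.0, size=1000)                   # synthetic stand-in
rr_large = rr_small + 0.05 + rng.normal(0.0, 0.1, size=1000)  # synthetic stand-in
observed, (lo, hi) = paired_bootstrap_mrr_diff(rr_large, rr_small)
```

An interval that excludes zero is consistent with a real improvement. An interval straddling zero would not support the claim at that level.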

Figures

Figures reproduced from arXiv: 2605.23312 by Ko-Jen Hsiao, Moumita Bhattacharya, Qiuling Xu.

**Figure 2.** Estimated training FLOPs per training token for a 6-layer transformer with hidden …

**Figure 3.** Latency mismatch between next-token training and delayed cached serving. Title A is …

**Figure 4.** Relative MRR degradation as cached outputs become stale. Delays are simulated by …

**Figure 5.** MTP comparison across serving scenarios. Bars report relative MRR changes for different …

**Figure 6.** Shared semantic title metadata for encoder events and decoder title representations.

**Figure 7.** Weekly production-shadow MRR over 1M users. The 1B-backbone model is compared …
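Figure 2 estimates training FLOPs per training token for a 6-layer transformer. Its caption is truncated on this page, so the paper's hidden size and exact FLOP accounting are unknown. The sketch below uses three common transformer approximations: non-embedding parameters N ≈ 12·n_layer·d², forward FLOPs per token ≈ 2N + 2·n_layer·n_ctx·d, and training ≈ 3× forward. It adds an output-head term to show why a full softmax over a large catalog can dominate per-token cost, and how sampled softmax with a projected head shrinks that term. The function name and all sizes in the usage lines are hypothetical.

```python
def train_flops_per_token(n_layer, d_model, n_ctx, vocab,
                          decode_dim=None, num_sampled=None):
    """Rough training FLOPs per token (forward + backward ~= 3x forward).

    Backbone: N ~= 12 * n_layer * d_model**2 non-embedding parameters and
    forward FLOPs ~= 2N + 2 * n_layer * n_ctx * d_model (attention scores).
    Head: scores all `vocab` items (full softmax) or 1 + num_sampled
    candidates (sampled softmax), optionally after projecting to decode_dim.
    """
    n_backbone = 12 * n_layer * d_model ** 2
    fwd_backbone = 2 * n_backbone + 2 * n_layer * n_ctx * d_model
    r = decode_dim or d_model
    proj = 2 * d_model * r if decode_dim else 0
    n_scored = (1 + num_sampled) if num_sampled else vocab
    fwd_head = proj + 2 * r * n_scored
    return 3 * (fwd_backbone + fwd_head)

full = train_flops_per_token(n_layer=6, d_model=512, n_ctx=512, vocab=100_000)
sampled = train_flops_per_token(n_layer=6, d_model=512, n_ctx=512, vocab=100_000,
                                decode_dim=128, num_sampled=1023)
# full == 429_883_392 and sampled == 123_863_040 (about 3.5x fewer)
```

In this hypothetical configuration, the full-softmax head accounts for roughly 70% of per-token cost. That share is why decoding efficiency matters when retraining repeatedly over trillions of tokens.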

Original abstract

Generative recommendation models can model user behavior as sequences of events and provide a shared backbone for multiple recommendation tasks. In production, however, pre-training gains do not automatically translate into downstream application improvements: task headroom, repeated-training cost, serving latency, and item freshness all affect transfer. We describe our experience scaling a generative recommender from 2M to 1B backbone parameters, excluding embedding and decoding layers, in a production-scale title recommendation setting. Across multiple downstream tasks, we observe task-dependent scaling behavior: some tasks approach an empirical ceiling within the observed scale range, while others continue to benefit from additional capacity. This motivates using offset scaling-law fits as a diagnostic for where additional model scale may be more or less useful. We then study production constraints that arise when applying the model in practice. Frequent retraining over trillions of behavior tokens makes training and decoding efficiency important; cached serving can make the immediate next-token target stale; and newly launched titles may need to be scored from semantic metadata before collaborative ID embeddings are reliable. We address these issues with multi-token prediction for serving-latency alignment, sampled softmax and a projected decoding head for efficient repeated training, and semantic item towers with collaborative-embedding masking for cold-start adaptation. In a one-week production-shadow evaluation over 1M users, the 1B-backbone model achieves higher MRR than the 2M-backbone baseline across all reported tasks. Overall, the results support treating model scale as one component of a production transfer problem, alongside task headroom, decoding cost, serving-latency alignment, and item generalization.
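The abstract pairs stale cached serving with multi-token prediction, and one plausible reading of that pairing is as follows. Suppose a cached recommendation computed after event t is shown only after d further events. The immediate next-token target t+1 is then stale, and later events are what the served list needs to anticipate. The minimal sketch below builds multi-token prediction targets and a loss. It assumes k heads, each predicting the item j+1 steps ahead. The paper's head design, horizon k, and loss weighting are not given on this page, and all names are hypothetical.

```python
import numpy as np

PAD = -1  # marks (position, offset) pairs with no future event

def mtp_targets(seq, k):
    """targets[t, j] = seq[t + 1 + j] for offsets j = 0..k-1, or PAD past the end."""
    seq = np.asarray(seq)
    T = len(seq)
    targets = np.full((T, k), PAD, dtype=seq.dtype)
    for j in range(k):
        n = max(T - 1 - j, 0)                  # positions that have an event j+1 ahead
        targets[:n, j] = seq[1 + j:1 + j + n]
    return targets

def mtp_loss(logits, targets):
    """Mean cross-entropy over all heads and valid positions.

    logits: (T, k, V) scores from k prediction heads; targets: (T, k).
    """
    m = logits.max(axis=-1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))
    valid = targets != PAD
    safe = np.where(valid, targets, 0)         # any in-range index; masked out below
    picked = np.take_along_axis(log_probs, safe[..., None], axis=-1)[..., 0]
    return -(picked * valid).sum() / valid.sum()

seq = [5, 2, 7, 1]
targets = mtp_targets(seq, k=2)   # [[2, 7], [7, 1], [1, -1], [-1, -1]]
loss = mtp_loss(np.zeros((4, 2, 8)), targets)  # uniform scores over 8 items -> log(8)
```

At serving time, the head matching the expected cache delay could be used, or the heads' scores could be combined. Figure 5 compares MTP across serving scenarios, but the selection rule is not described on this page.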

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical account of scaling a generative recommender to 1B parameters in one production title-recommendation system, with task-dependent gains and some engineering fixes, but the single-deployment results keep the broader claims narrow.

This paper reports that scaling the backbone from 2M to 1B parameters improves MRR over the 2M baseline in a one-week shadow test on 1M users. The gains vary by task: some approach an empirical ceiling while others keep improving, which the authors propose to diagnose with offset scaling-law fits. They also describe three production adaptations: multi-token prediction to match serving latency, sampled softmax plus a projected head for cheaper repeated training, and semantic towers with collaborative masking for cold-start items. These are concrete responses to real constraints such as trillions of training tokens, stale next-token targets, and new titles without reliable ID embeddings. The production data is the strongest part; few papers show results from an actual large-scale shadow evaluation. The main limitation is that everything comes from a single title-recommendation deployment with its own user population, item-freshness pattern, and task mix. No cross-system replication or controlled variation in those factors is described, so it is hard to know whether the scaling behavior or the fixes would transfer. The abstract also omits baseline details, statistical tests, and data-split information, which makes the MRR comparison harder to evaluate on its own terms. Practitioners tuning generative recommenders at similar scale would find the techniques and the task-headroom reminder useful; most other readers will see it as a solid case study rather than a general result. I would send it to peer review: the production evidence is worth referee attention, even if the authors need to add more on experimental controls and scope.

Referee Report

2 major / 1 minor

Summary. The manuscript describes scaling a generative recommender from a 2M-parameter to a 1B-parameter backbone (excluding embedding and decoding layers) in a production title-recommendation setting. It reports task-dependent scaling behavior diagnosed via offset scaling-law fits and introduces multi-token prediction for serving-latency alignment, sampled softmax plus a projected decoding head for repeated-training efficiency, and semantic item towers with collaborative-embedding masking for cold-start adaptation. A one-week production-shadow evaluation over 1M users finds the 1B model attaining higher MRR than the 2M baseline across all reported tasks.

Significance. If the empirical patterns hold, the work supplies concrete production-oriented guidance on when additional scale is likely to be useful versus when tasks have reached empirical ceilings, together with targeted mitigations for retraining cost, latency, and item freshness. The diagnostic framing of scale as one component alongside task headroom and generalization constraints is a useful contribution for practitioners working on generative recommenders.

major comments (2)

[Abstract] Abstract (final paragraph) and evaluation description: the central claim that the 1B-backbone model achieves higher MRR than the 2M baseline rests on a single one-week shadow evaluation over 1M users, yet no information is supplied on the precise baseline configuration, number of tasks, statistical tests, data splits, or controls for proprietary-environment confounds. This detail gap is load-bearing for assessing whether the reported gains support the scaling and technique conclusions.
[Abstract] Abstract and discussion of generalizability: the title and framing emphasize movement toward generalizable methods, but all quantitative results derive from one production title-recommendation deployment with its specific task mix, user population, and item-freshness dynamics. No cross-deployment replication, controlled variation of task headroom, or sensitivity analysis to different user-behavior distributions is reported, so the observed task-dependent scaling and technique benefits may not transfer.

minor comments (1)

The parenthetical clarification that parameter counts exclude embedding and decoding layers is helpful but should be repeated at first use in the main text for readers who encounter only the body.

Simulated Author's Rebuttal

2 responses · 2 unresolved

We thank the referee for highlighting the evaluation transparency and generalizability concerns. We respond point-by-point below.

Referee: [Abstract] Abstract (final paragraph) and evaluation description: the central claim that the 1B-backbone model achieves higher MRR than the 2M baseline rests on a single one-week shadow evaluation over 1M users, yet no information is supplied on the precise baseline configuration, number of tasks, statistical tests, data splits, or controls for proprietary-environment confounds. This detail gap is load-bearing for assessing whether the reported gains support the scaling and technique conclusions.

Authors: We agree the manuscript supplies only high-level evaluation information. Exact baseline configurations, data splits, and statistical tests cannot be disclosed because they are proprietary to the production system. The reported result is a standard one-week shadow test on 1M users showing MRR improvement across reported tasks. We will revise to state the number of tasks evaluated and add an explicit limitations sentence on the single-environment setting. revision: partial
Referee: [Abstract] Abstract and discussion of generalizability: the title and framing emphasize movement toward generalizable methods, but all quantitative results derive from one production title-recommendation deployment with its specific task mix, user population, and item-freshness dynamics. No cross-deployment replication, controlled variation of task headroom, or sensitivity analysis to different user-behavior distributions is reported, so the observed task-dependent scaling and technique benefits may not transfer.

Authors: The quantitative results are indeed from a single deployment. The title and framing present techniques (multi-token prediction, sampled softmax, semantic towers with masking) that target recurring production constraints rather than claiming universal empirical transfer. Task-dependent scaling is positioned as a diagnostic practitioners can apply elsewhere. We will revise the discussion to strengthen the caveats on generalizability. revision: partial

standing simulated objections not resolved

Disclosure of precise baseline configurations, data splits, and statistical tests due to proprietary production constraints.
Performing cross-deployment replication or controlled sensitivity analysis across additional production environments.

Circularity Check

0 steps flagged

No circularity: empirical scaling results are direct observations

The paper presents an empirical report on scaling a generative recommender from 2M to 1B parameters in one production title-recommendation deployment, with direct MRR comparisons in a one-week shadow evaluation over 1M users. No equations, parameter fits presented as independent predictions, self-definitional constructs, or load-bearing self-citations are described that would reduce any central claim to its inputs by construction. The offset scaling-law fits are applied diagnostically to observed task-dependent behavior rather than generating forced outputs, and the overall argument treats scale as one factor among others based on reported production constraints and results. The derivation chain is self-contained as an experience report without tautological reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; full manuscript required for ledger construction.

pith-pipeline@v0.9.0 · 5824 in / 1215 out tokens · 30185 ms · 2026-05-25T03:49:03.034483+00:00 · methodology
