TubiFM: Unified Item, Carousel, and Search Ranking for Streaming Discovery

Alexandre Salle; Chenglei Niu; Michael Tamir; Qiang Chen; Saurabh Agrawal; Shervin Shahryari; Suchismit Mahapatra; Suvash Sedhain; Xiaoxiao Chen; Yaqi Wang

arxiv: 2605.23702 · v1 · pith:6RMY4RLAnew · submitted 2026-05-22 · 💻 cs.IR


Pith reviewed 2026-05-25 03:02 UTC · model grok-4.3

classification 💻 cs.IR

keywords unified ranking · user story · next-token prediction · streaming recommendation · search ranking · carousel ranking · Llama model · personalized discovery


The pith

A single Llama-based model trained on serialized user histories ranks items, carousels, and search results as next-token prediction without task-specific architectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that item ranking, carousel ranking, and search ranking draw from overlapping user signals yet are usually handled by separate models. It shows these tasks can instead be expressed uniformly by turning a viewer's full cross-surface history into one token sequence called a user story. Interleaving pretrained language tokens with domain-specific event tokens lets a single prompted language model perform all three rankings. The resulting TubiFM model beats specialist baselines offline; in live tests it raises search total viewing time by 3.9 percent and carousel total viewing time by 0.30 percent, and it cuts p99 ranking latency from 500 ms to 200 ms. If the approach holds, production discovery systems could replace multiple dedicated models with one and still match or exceed current accuracy.

Core claim

TubiFM is one instantiation of this approach: a Llama 3.2 1B-based model trained on user stories and prompted to rank items, carousels, or search results without task-specific architectures. In offline evaluation, this single model outperforms specialist baselines across item, carousel, and search ranking. In online A/B tests, TubiFM significantly improves search total viewing time (TVT) by +3.9% and carousel TVT by +0.30%. Item ranking is statistically neutral on TVT (+0.14%), but matches a mature production stack; across all three tasks, TubiFM serves on L40S GPUs and reduces p99 ranking latency from 500ms to 200ms.

What carries the argument

The user story, a serialized token sequence that converts cross-surface history (attributes, sessions, watch events with surface and carousel context, and search events) into a single sequence for prompted next-token prediction.
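A minimal sketch of what such a serialization could look like follows. The abstract does not publish the actual grammar, so every token name here (`<user>`, `<session>`, `<watch>`, `<surface:...>`, `<carousel:...>`, `<item:...>`, `<search>`) and the function `serialize_user_story` are hypothetical and chosen only to illustrate interleaving plain-language tokens with event tokens.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class WatchEvent:
    item_id: str
    surface: str   # where the watch started, e.g. "home" or "search"
    carousel: str  # carousel the item was shown in


@dataclass
class SearchEvent:
    query: str


Event = Union[WatchEvent, SearchEvent]


def serialize_user_story(attributes: Dict[str, str],
                         sessions: List[List[Event]]) -> List[str]:
    """Flatten attributes and sessions into one token sequence.

    Hypothetical grammar: special event tokens are interleaved with
    ordinary language tokens (attribute values and query words).
    """
    tokens: List[str] = ["<user>"]
    for key in sorted(attributes):
        tokens += [f"<attr:{key}>", *attributes[key].split()]
    for session in sessions:
        tokens.append("<session>")
        for event in session:
            if isinstance(event, WatchEvent):
                tokens += [
                    "<watch>",
                    f"<surface:{event.surface}>",
                    f"<carousel:{event.carousel}>",
                    f"<item:{event.item_id}>",
                ]
            else:
                # Query words stay as plain language tokens.
                tokens += ["<search>", *event.query.split()]
        tokens.append("</session>")
    return tokens


story = serialize_user_story(
    {"device": "tv"},
    [[SearchEvent("space documentary"),
      WatchEvent("item_42", "search", "results")]],
)
print(" ".join(story))
```

Keeping query words as ordinary tokens is what would let a pretrained language model reuse its vocabulary, while the bracketed tokens carry the domain-specific structure.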

If this is right

One model suffices for item, carousel, and search ranking.
Search total viewing time rises 3.9 percent and carousel total viewing time rises 0.30 percent in live traffic.
p99 ranking latency drops from 500 ms to 200 ms while serving on L40S GPUs.
Item ranking stays statistically neutral on total viewing time yet matches an existing production stack.
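The latency figure can be made concrete with a little arithmetic. The sketch below is illustrative and not taken from the paper: it computes a nearest-rank p99 over a list of latencies and the relative change between two values. The function names `p99_nearest_rank` and `relative_change` are hypothetical, and nearest-rank is only one of several percentile definitions.

```python
from typing import List


def p99_nearest_rank(latencies_ms: List[float]) -> float:
    """Return the 99th percentile using the nearest-rank definition."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Integer form of ceil(0.99 * n), avoiding floating-point rounding.
    rank = (99 * n + 99) // 100
    return ordered[rank - 1]


def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`."""
    return (after - before) / before


samples = [float(ms) for ms in range(1, 101)]  # synthetic: 1 ms .. 100 ms
print(p99_nearest_rank(samples))
print(relative_change(500.0, 200.0))
```

Applied to the reported numbers, `relative_change(500.0, 200.0)` is -0.6, so the move from 500 ms to 200 ms is a 60 percent reduction in p99 latency.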

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same user-story format could be extended to additional surfaces such as home-page rows or notifications.
Because the model uses a shared grammar, adding a new ranking surface may require only new prompt tokens rather than a new model.
The latency reduction could free compute budget for deeper context windows in the same serving fleet.

Load-bearing premise

Interleaving pretrained language tokens with domain-specific event tokens lets heterogeneous recommendation and search tasks be expressed as prompted next-token prediction over a shared grammar without task-specific architectures.
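One way to read "ranking as next-token prediction" is to score each candidate by the probability the model assigns to its token at the next position. The sketch below assumes this reading; the abstract does not say how TubiFM turns model outputs into a ranking. The logits are a hand-written stand-in for a real model's output, and `rank_candidates` and the `<item:...>` token names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rank_candidates(next_token_logits: Dict[str, float],
                    candidates: List[str]) -> List[Tuple[str, float]]:
    """Rank candidates by softmax probability restricted to the candidate set."""
    logits = [next_token_logits[c] for c in candidates]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    scored = [(c, e / total) for c, e in zip(candidates, exps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in for the logits a model would emit after reading a prompt.
logits = {"<item:a>": 1.0, "<item:b>": 3.0, "<item:c>": 2.0, "the": 5.0}
ranking = rank_candidates(logits, ["<item:a>", "<item:b>", "<item:c>"])
for token, prob in ranking:
    print(f"{token}\t{prob:.3f}")
```

Restricting the softmax to the candidate set means non-candidate tokens such as `the` cannot absorb probability mass, whatever their raw logit.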

What would settle it

An offline evaluation in which the single TubiFM model fails to outperform the three specialist baselines on at least one of the item, carousel, or search ranking tasks.

Figures

Figures reproduced from arXiv: 2605.23702 by Alexandre Salle, Chenglei Niu, Michael Tamir, Qiang Chen, Saurabh Agrawal, Shervin Shahryari, Suchismit Mahapatra, Suvash Sedhain, Xiaoxiao Chen, Yaqi Wang.

**Figure 1.** Prompted formulation: changing the prompt switches the ranking task while keeping a single shared model. In the […]
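The idea in the caption, one model with the task selected by the prompt, can be sketched as a function that appends a task-specific suffix to the same user story. The suffix tokens and the `build_prompt` helper are assumptions made for illustration; the actual prompts are not given in the abstract.

```python
from typing import List, Optional


def build_prompt(story: List[str], task: str,
                 carousel: Optional[str] = None,
                 query: Optional[str] = None) -> List[str]:
    """Append a hypothetical task suffix; the next token is what gets ranked."""
    if task == "carousel":
        # Next token would be a carousel token.
        suffix = ["<watch>", "<surface:home>"]
    elif task == "item":
        if carousel is None:
            raise ValueError("item ranking needs the carousel being filled")
        # Next token would be an item token within that carousel.
        suffix = ["<watch>", "<surface:home>", f"<carousel:{carousel}>"]
    elif task == "search":
        if not query:
            raise ValueError("search ranking needs a query")
        # Next token would be an item token answering the query.
        suffix = ["<search>", *query.split(), "<result>"]
    else:
        raise ValueError(f"unknown task: {task}")
    return [*story, *suffix]


story = ["<user>", "<session>", "<watch>", "<item:item_42>", "</session>"]
item_prompt = build_prompt(story, "item", carousel="comedy")
search_prompt = build_prompt(story, "search", query="space documentary")
print(item_prompt[-3:])
print(search_prompt[-4:])
```

Under this sketch, supporting another surface would mean adding one more suffix branch while the story and the model stay unchanged.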

read the original abstract

Personalized discovery systems often train separate models for item ranking, carousel ranking, and search, even though these tasks expose complementary signals from the same viewer journey: watches shape carousel and item ranking, search queries reveal intent even when they do not lead to a catalog match, and watch history helps interpret search as rewatching, continuation, or new discovery. We introduce the user story, a serialized representation that turns a user's cross-surface history - attributes, sessions, watch events with surface and carousel context, and search events - into a single token sequence. By interleaving pretrained language tokens with domain-specific event tokens, user stories let heterogeneous recommendation and search tasks be expressed as prompted next-token prediction over a shared grammar. TubiFM is one instantiation of this approach: a Llama 3.2 1B-based model trained on user stories and prompted to rank items, carousels, or search results without task-specific architectures. In offline evaluation, this single model outperforms specialist baselines across item, carousel, and search ranking. In online A/B tests, TubiFM significantly improves search total viewing time (TVT) by $+3.9\%$ and carousel TVT by $+0.30\%$. Item ranking is statistically neutral on TVT ($+0.14\%$), but matches a mature production stack; across all three tasks, TubiFM serves on L40S GPUs and reduces p99 ranking latency from 500ms to 200ms. These results show that shared user stories can improve discovery while simplifying ranking systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

TubiFM shows a single small Llama model prompted on serialized user stories can replace three separate ranking systems, with clear latency wins and a solid search lift but smaller or neutral gains elsewhere.


The core result is that one Llama 3.2 1B model, trained on these user stories, handles item ranking, carousel ranking, and search ranking through prompting alone. It beats the specialist baselines offline and delivers +3.9% search TVT and +0.30% carousel TVT online while dropping p99 latency from 500 ms to 200 ms on L40S GPUs. Item ranking stays neutral on TVT but matches the existing stack. That combination of unification and efficiency is the practical takeaway for anyone running multiple ranking surfaces on the same user data.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces the 'user story' as a serialized token sequence that captures a user's cross-surface history (attributes, sessions, watch events with context, and search events) by interleaving pretrained language tokens with domain-specific event tokens. This representation allows heterogeneous ranking tasks to be expressed as prompted next-token prediction. TubiFM, a Llama 3.2 1B instantiation, is trained on these sequences and prompted for item, carousel, or search ranking without task-specific architectures. The paper reports that the single model outperforms specialist baselines in offline evaluation across the three tasks; in online A/B tests it improves search TVT by +3.9% and carousel TVT by +0.30% (item ranking neutral at +0.14%), while reducing p99 latency from 500 ms to 200 ms on L40S GPUs.

Significance. If the empirical claims hold, the work is significant because it shows that complementary signals from item ranking, carousel ranking, and search can be unified in a single prompted language model, yielding measurable engagement lifts and substantial latency reduction while simplifying the production ranking stack. The user-story serialization and shared grammar are concrete technical contributions that enable the unification without per-task heads or architectures.

major comments (1)

[Abstract and §4] Abstract and §4 (offline and online evaluation sections): the central claim that TubiFM 'outperforms specialist baselines across item, carousel, and search ranking' and delivers the stated TVT lifts rests on reported numbers, yet the manuscript supplies no information on the identity or training details of the specialist baselines, the statistical tests used, the train/test splits, or potential confounds such as position bias or data leakage. These omissions make it impossible to verify whether the numbers support the unification claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We agree that the manuscript requires additional methodological details to allow verification of the reported results and the unification claim. We will revise the paper accordingly.


Referee: [Abstract and §4] Abstract and §4 (offline and online evaluation sections): the central claim that TubiFM 'outperforms specialist baselines across item, carousel, and search ranking' and delivers the stated TVT lifts rests on reported numbers, yet the manuscript supplies no information on the identity or training details of the specialist baselines, the statistical tests used, the train/test splits, or potential confounds such as position bias or data leakage. These omissions make it impossible to verify whether the numbers support the unification claim.

Authors: We acknowledge that the current version omits key experimental details. In the revised manuscript we will expand §4 with: (i) explicit descriptions of each specialist baseline (architecture, feature sets, loss functions, and training data); (ii) the statistical tests performed (including test statistic, degrees of freedom, and p-value thresholds); (iii) precise train/test split methodology, including temporal cutoffs and leakage-prevention steps; and (iv) explicit discussion of position-bias handling and any leakage audits performed. These additions will directly address the referee’s concerns and enable independent assessment of the unification benefits.

revision: yes
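For readers unfamiliar with the temporal cutoffs mentioned in point (iii), a minimal sketch of a time-based split follows. It is a generic illustration, not the authors' procedure, which the manuscript does not describe; the event fields and the `temporal_split` helper are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Event = Dict[str, object]


def temporal_split(events: List[Event],
                   cutoff: datetime) -> Tuple[List[Event], List[Event]]:
    """Train on events strictly before the cutoff, test on the rest."""
    train = [e for e in events if e["timestamp"] < cutoff]
    test = [e for e in events if e["timestamp"] >= cutoff]
    return train, test


# Synthetic events for illustration only.
events = [
    {"user": "u1", "item": "a", "timestamp": datetime(2026, 1, 5)},
    {"user": "u1", "item": "b", "timestamp": datetime(2026, 2, 1)},
    {"user": "u2", "item": "c", "timestamp": datetime(2026, 2, 10)},
]
train, test = temporal_split(events, datetime(2026, 2, 1))

# Leakage check: no training event may be as late as any test event.
latest_train = max(e["timestamp"] for e in train)
earliest_test = min(e["timestamp"] for e in test)
print(latest_train < earliest_test)
```

Splitting by time rather than at random keeps future interactions out of training, which is the basic leakage guard a sequence model over user histories would need.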

Circularity Check

0 steps flagged

No significant circularity


The paper presents an empirical system description and evaluation results with no equations, derivations, or parameter-fitting steps that could reduce to self-definition or fitted inputs. Claims rest on reported offline comparisons against specialist baselines and online A/B test lifts (TVT improvements), which are external to any internal construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The core modeling choice (user stories as interleaved token sequences for prompted next-token prediction) is presented as an architectural decision rather than a derived result, and the reported performance numbers are not shown to be tautological with the training procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Based on the abstract alone, the central claim rests on the effectiveness of the user-story representation and the premise that a shared next-token-prediction grammar suffices for three distinct ranking tasks.

invented entities (1)

user story · no independent evidence
purpose: serialized token sequence that unifies cross-surface user history for language-model prompting
Introduced in the abstract as the key new representation enabling the unified model.

pith-pipeline@v0.9.0 · 5846 in / 1095 out tokens · 20235 ms · 2026-05-25T03:02:25.837493+00:00 · methodology


