pith. sign in

arxiv: 2606.12198 · v1 · pith:TS6D5DAJnew · submitted 2026-06-10 · 💻 cs.IR

LLM-Based User Personas for Recommendations at Scale

Pith reviewed 2026-06-27 08:05 UTC · model grok-4.3

classification 💻 cs.IR
keywords large language modelsuser personasrecommendation systemsreal-time inferenceexploitation-exploration tradeoffvideo recommendationsknowledge distillationA/B testing
0
0 comments X

The pith

A framework generates real-time LLM-based natural-language user personas to improve video recommendations at scale by balancing exploitation and exploration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows how to create natural-language descriptions of user interests using LLMs in real time for large-scale video recommendations. The personas summarize what users already like while adding new topics to encourage exploration. To handle the scale, the system uses knowledge distillation, asynchronous inference, and optimized inputs. Tests including live A/B experiments indicate better viewer value. Readers should care because it makes advanced language model capabilities practical for everyday recommendation systems without heavy offline computation.

Core claim

The paper establishes that real-time generation of LLM-based user interest personas, which combine summaries of existing interests with novel topics, can be achieved at billion-user scale through a cost-efficient architecture leveraging knowledge distillation, asynchronous inference, and semantically clustered video representations, resulting in significant improvements in viewer value as measured by offline evaluations, user studies, and live A/B tests.

What carries the argument

The real-time persona generation framework that addresses the exploitation-exploration trade-off directly during serving using LLMs.

If this is right

  • User profiles become more semantically rich and interpretable without relying on structured IDs.
  • Recommendations can adapt dynamically to new interests at serving time rather than offline.
  • The balance between known and novel content is handled within the persona itself.
  • Computational costs of LLM inference are mitigated for production environments serving billions of users.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to other platforms by replacing video clusters with domain-specific item representations.
  • Natural language personas could enable user-facing explanations or controls over their recommendation profiles.
  • Combining this with traditional ranking models might reduce reliance on separate diversity mechanisms.
  • Long-term effects on user engagement could be measured in extended A/B tests beyond immediate viewer value.

Load-bearing premise

That integrating these generated natural-language personas into the recommendation model produces measurable gains in viewer value at production scale.

What would settle it

An A/B test at scale where the LLM persona method shows no improvement or a decrease in key viewer value metrics compared to the existing system.

Figures

Figures reproduced from arXiv: 2606.12198 by Ben Most, Ed H. Chi, Fabio Soldo, Gregory Hinkson, Haokai Lu, Haoting Wang, Jenny Huang, Konstantina Christakopoulou, Lichan Hong, Minmin Chen, Nihar Bhupalam, Rein Zhang, Yifat Amir, Yixin Kelly Cui, Yu Xia, Zelong Zhao, Zheyun Feng.

Figure 1
Figure 1. Figure 1: Training Data Collection with Multi-step Reasoning [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Asynchronous online inference diagram Asynchronous LLM Inference The asynchronous online infer￾ence process (illustrated in [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Offline evaluation: User representation To illustrate the impact of input structure, [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The proposed method drives viewer value. Y-axis [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
read the original abstract

Large Language Models (LLMs) offer unprecedented potential for enhancing recommendation systems through their world knowledge and reasoning capabilities. However, existing approaches often rely on structured IDs or offline processing, limiting semantic richness, real-time adaptability, and user-facing interpretability. In this paper, we introduce a novel framework that enables real-time generation of LLM-based user interest personas for a large-scale commercial video recommendation platform. Our method generates natural-language user interest personas that address the exploitation-exploration trade-off by combining the summarization of existing interests with novel topics, directly during serving. To overcome the computational challenges of online LLM inference at a billion-user scale, we design a cost-efficient architecture leveraging knowledge distillation, asynchronous inference, and input optimization via semantically clustered video representations. Extensive offline evaluations, user studies, and live A/B tests demonstrate significant improvements in viewer value. This work bridges the gap between high-level semantic understanding and industrial-scale recommendation, paving the way for more dynamic, explainable, and satisfying personalized experiences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper introduces a framework for real-time generation of natural-language user interest personas using LLMs for a large-scale video recommendation platform. The personas combine summarization of existing interests with novel topics to balance exploitation and exploration. To enable this at billion-user scale, the approach employs knowledge distillation, asynchronous inference, and semantically clustered video representations. The framework is evaluated through offline evaluations, user studies, and live A/B tests, which demonstrate significant improvements in viewer value.

Significance. If the empirical results hold, this work provides a practical bridge between LLM semantic capabilities and industrial-scale recommendation systems. It offers a cost-efficient architecture for real-time persona generation and integration, potentially improving personalization, interpretability, and user satisfaction. The multi-faceted evaluation (offline, user study, A/B) strengthens the claims of practical impact.

minor comments (1)
  1. [Abstract] Abstract: the claim of 'significant improvements in viewer value' is not accompanied by any quantitative metrics, baselines, or statistical details, even though the full manuscript supplies them; adding one sentence with key results would improve standalone readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and recommendation of minor revision. The assessment that the work provides a practical bridge between LLM capabilities and large-scale recommendations, supported by multi-faceted evaluation, is appreciated. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical framework for real-time LLM-generated natural-language user personas in a large-scale video recommendation system. It relies on architectural choices (distillation, async inference, clustered representations) and demonstrates gains via offline evaluations, user studies, and live A/B tests. No equations, derivations, parameter fittings, or self-referential definitions appear in the described method; the central claims rest on external empirical measurements rather than reducing to inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked that collapse the argument.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the framework implicitly assumes LLMs can produce useful personas without detailing any fitted values or background assumptions.

pith-pipeline@v0.9.1-grok · 5756 in / 1053 out tokens · 20392 ms · 2026-06-27T08:05:50.585172+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 10 canonical work pages

  1. [1]

    Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation. InProceedings of the 17th ACM Conference on Recommender Systems (RecSys ’23). ACM, 1007–1014. https://doi. org/10.1145/3604915.3608857

  2. [2]

    Moumita Bhattacharya, Vito Ostuni, and Sudarshan Lamkhede. 2024. Joint Mod- eling of Search and Recommendations Via an Unified Contextual Recommender (UniCoRn). arXiv:2408.10394 [cs.IR] https://arxiv.org/abs/2408.10394

  3. [3]

    Chi, and Minmin Chen

    Konstantina Christakopoulou, Alberto Lalama, Cj Adams, Iris Qu, Yifat Amir, Samer Chucri, Pierce Vollucci, Fabio Soldo, Dina Bseiso, Sarah Scodel, Lucas Dixon, Ed H. Chi, and Minmin Chen. 2023. Large Language Models for User Interest Journeys. arXiv:2305.15498 [cs.CL] https://arxiv.org/abs/2305.15498

  4. [4]

    Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. InProceedings of the 10th ACM Conference on Recommender Systems(Boston, Massachusetts, USA)(RecSys ’16). Association for Computing Machinery, New York, NY, USA, 191–198. https://doi.org/10.1145/ 2959100.2959190

  5. [5]

    Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Mur- phy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: a web-scale approach to probabilistic knowledge fusion. InProceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, New York, USA)(KDD ’14). Association fo...

  6. [6]

    Francesco Fabbri, Gustavo Penha, Edoardo D’Amico, Alice Wang, Marco De Nadai, Jackie Doremus, Paul Gigioli, Andreas Damianou, Oskar Stål, and Mounia Lalmas

  7. [7]

    InProceedings of the Nineteenth ACM Conference on Recommender Systems (RecSys ’25)

    Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge. InProceedings of the Nineteenth ACM Conference on Recommender Systems (RecSys ’25). ACM, 1181–1186. https://doi.org/10.1145/3705328.3759305

  8. [8]

    Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. 2023. Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). arXiv:2203.13366 [cs.IR] https://arxiv.org/abs/ 2203.13366

  9. [9]

    Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. InInternational Conference on Machine Learning. https: //arxiv.org/abs/1908.10396

  10. [10]

    Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk

  11. [11]

    arXiv:1511.06939 [cs.LG] https://arxiv.org/abs/1511.06939

    Session-based Recommendations with Recurrent Neural Networks. arXiv:1511.06939 [cs.LG] https://arxiv.org/abs/1511.06939

  12. [12]

    Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. InProceedings of the 22nd ACM International Conference on Information & Knowledge Management(San Francisco, California, USA)(CIKM ’13). Association for Computing Machinery, New York, NY, USA, ...

  13. [13]

    Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recom- mendation. arXiv:1808.09781 [cs.IR] https://arxiv.org/abs/1808.09781

  14. [14]

    Aditee Kumthekar, Li Wei, Andrea Bettale, Mahesh Sathiamoorthy, Zrinka Puljiz, and Aditya Mahajan. 2025. Never Miss an Episode: How LLMs are Powering Serial Content Discovery on YouTube. https://doi.org/10.1145/3705328.3748104

  15. [15]

    Yiqun Liu, Kaushik Rangadurai, Yunzhong He, Siddarth Malreddy, Xunlong Gui, Xiaoyi Liu, and Fedor Borisyuk. 2021. Que2Search: Fast and Accurate Query and Document Understanding for Search at Facebook. InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining(Virtual Event, Singapore)(KDD ’21). Association for Computing Machinery...

  16. [16]

    Boyuan Long, Yueqi Wang, Hiloni Mehta, Mick Zomnir, Omkar Pathak, Chang- ping Meng, Ruolin Jia, Yajun Peng, Dapeng Hong, Xia Wu, Mingyan Gao, Onkar Dalal, and Ningren Han. 2025. LLM-Powered Nuanced Video Attribute Annota- tion for Enhanced Recommendations. https://doi.org/10.1145/3705328.3748103

  17. [18]

    Changping Meng, Hongyi Ling, Jianling Wang, Yifan Liu, Shuzhou Zhang, Dapeng Hong, Mingyan Gao, Onkar Dalal, Ed Chi, Lichan Hong, Haokai Lu, and Ningren Han. 2025. Balancing Fine-tuning and RAG: A Hybrid Strat- egy for Dynamic LLM Recommendation Updates. InProceedings of the Nine- teenth ACM Conference on Recommender Systems (RecSys ’25). ACM, 919–922. ht...

  18. [19]

    Anand Rajagopalan, Fabio Vitale, Danny Vainstein, Gui Citovsky, Cecilia M Procopiuc, and Claudio Gentile. 2021. Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees. InProceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (...

  19. [20]

    Thibault Sellam, Dipanjan Das, and Ankur P. Parikh. 2020. BLEURT: Learning Robust Metrics for Text Generation. arXiv:2004.04696 [cs.CL] https://arxiv.org/ abs/2004.04696

  20. [21]

    Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential Recommendation with Bidirectional En- coder Representations from Transformer. arXiv:1904.06690 [cs.IR] https: //arxiv.org/abs/1904.06690

  21. [22]

    Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, et al. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.arXiv preprint arXiv:2403.05530(2024)

  22. [23]

    Gomez, Lukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. Attention Is All You Need. arXiv:1706.03762 [cs.CL] https://arxiv.org/abs/1706.03762

  23. [24]

    Chi, Lichan Hong, and Haokai Lu

    Haoting Wang, Jianling Wang, Hao Li, Fangjun Yi, Mengyu Fu, Youwei Zhang, Yifan Liu, Liang Liu, Minmin Chen, Ed H. Chi, Lichan Hong, and Haokai Lu. 2025. Serendipitous Recommendation with Multimodal LLM. (2025). arXiv:2506.08283 [cs.IR] https://arxiv.org/abs/2506.08283

  24. [25]

    Chi, Lichan Hong, Ningren Han, and Haokai Lu

    Jianling Wang, Yifan Liu, Yinghao Sun, Xuejian Ma, Yueqi Wang, He Ma, Zhengyang Su, Minmin Chen, Mingyan Gao, Onkar Dalal, Ed H. Chi, Lichan Hong, Ningren Han, and Haokai Lu. 2025. User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems. arXiv:2504.05522 [cs.IR] https://arxiv.org/abs/2504.05522

  25. [26]

    Jianling Wang, Haokai Lu, Yifan Liu, He Ma, Yueqi Wang, Yang Gu, Shuzhou Zhang, Ningren Han, Shuchao Bi, Lexi Baugher, et al. 2024. Llms for user interest exploration in large-scale recommendation systems. InRecSys

  26. [27]

    Jianling Wang, Haokai Lu, Yifan Liu, He Ma, Yueqi Wang, Yang Gu, Shuzhou Zhang, Ningren Han, Shuchao Bi, Lexi Baugher, Ed Chi, and Minmin Chen. 2024. LLMs for User Interest Exploration in Large-scale Recommendation Systems. arXiv:2405.16363 [cs.IR] https://arxiv.org/abs/2405.16363

  27. [28]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL] https: //arxiv.org/abs/2201.11903

  28. [29]

    Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-Based Recommendation with Graph Neural Networks.Proceedings of the AAAI Conference on Artificial Intelligence33, 01 (July 2019), 346–353. https: //doi.org/10.1609/aaai.v33i01.3301346

  29. [30]

    Fan Yang, Zheng Chen, Ziyan Jiang, Eunah Cho, Xiaojiang Huang, and Yan- bin Lu. 2023. PALR: Personalization Aware LLMs for Recommendation. arXiv:2305.07622 [cs.IR] https://arxiv.org/abs/2305.07622

  30. [31]

    Bruce Croft

    Hamed Zamani and W. Bruce Croft. 2018. Joint Modeling and Optimization of Search and Recommendation. arXiv:1807.05631 [cs.IR] https://arxiv.org/abs/ 1807.05631

  31. [32]

    Haiyuan Zhao, Lei Zhang, Jun Xu, Guohao Cai, Zhenhua Dong, and Ji-Rong Wen

  32. [33]

    Quiet Companionability Seeker

    Uncovering User Interest from Biased and Noised Watch Time in Video Recommendation. InProceedings of the 17th ACM Conference on Recommender Systems (RecSys ’23). ACM, 528–539. https://doi.org/10.1145/3604915.3608797