arXiv: 2508.18142 · v3 · submitted 2025-08-25 · 💻 cs.HC · cs.CY · cs.IR

Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation

Pith reviewed 2026-05-18 21:24 UTC · model grok-4.3

classification 💻 cs.HC · cs.CY · cs.IR
keywords user simulation · recommender systems · large language models · preference alignment · user feedback · data distillation · fine-tuning

The pith

A two-phase framework generates rationales from user feedback and distills informative samples to fine-tune LLMs as preference-aligned simulators for recommender systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method to convert extensive user feedback from recommender systems into high-quality training data for simulators. Large language models first create explanatory rationales that spell out the reasoning behind each feedback instance and reduce its ambiguity. Uncertainty estimation combined with behavior sampling then selects the clearest and most useful samples. Lightweight LLMs are fine-tuned on the resulting dataset together with the rationales. The approach aims to produce simulators that match human preferences more closely and supply clearer reasoning traces during recommender interactions.

Core claim

The framework constructs high-quality simulation data in two phases: LLMs generate decision-making processes as explanatory rationales on simulation samples to reduce ambiguity, after which data distillation based on uncertainty estimation and behavior sampling filters the most informative and denoised samples. Fine-tuning lightweight LLMs on this dataset together with the corresponding decision-making processes significantly boosts alignment with human preferences and the in-domain reasoning capabilities of the simulators, yielding more insightful and interpretable signals for recommender system interaction.
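The distillation phase can be illustrated with a minimal sketch. It assumes that uncertainty is measured as the Shannon entropy of behaviors sampled repeatedly from two LLMs, labeled A and B as in the Figure 4 caption, and that scenes are ranked by the entropy gap between the two. The scoring rule, the function names (`empirical_entropy`, `entropy_gap`, `distill`), and the toy data are illustrative assumptions, not the paper's Equation 3 or its released code.

```python
import math
from collections import Counter


def empirical_entropy(behaviors):
    """Shannon entropy (in nats) of the empirical distribution of sampled behaviors."""
    counts = Counter(behaviors)
    n = len(behaviors)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_gap(samples_a, samples_b):
    """ASSUMED score: entropy of model A's samples minus entropy of model B's."""
    return empirical_entropy(samples_a) - empirical_entropy(samples_b)


def distill(scenes, keep):
    """Rank scenes by the assumed score (largest gap first) and keep the top `keep`."""
    ranked = sorted(
        scenes,
        key=lambda s: entropy_gap(s["a"], s["b"]),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical scenes: each holds N = 4 behaviors sampled from model A and model B.
scenes = [
    {"id": "s1", "a": ["click", "skip", "click", "skip"], "b": ["click"] * 4},
    {"id": "s2", "a": ["skip"] * 4, "b": ["skip"] * 4},
]
selected = distill(scenes, keep=1)
```

In this toy data the scene where model A's samples disagree while model B's agree ranks first. Whether the paper favors such scenes, and how it thresholds them, is not stated in this summary.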

What carries the argument

The data construction framework that uses LLM-generated explanatory rationales followed by uncertainty-based distillation to turn raw user feedback into high-quality training data for user simulators.

If this is right

  • Fine-tuned simulators exhibit significantly improved alignment with human preferences.
  • The simulators gain stronger in-domain reasoning capabilities.
  • They deliver more insightful and interpretable signals for recommender system interactions.
  • The framework efficiently manages ambiguity, noise, and volume in user feedback data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The produced rationales could be surfaced directly to end users to explain why certain items are recommended.
  • Similar rationale-plus-filtering pipelines might improve user simulation in other interactive systems such as conversational agents.
  • Live deployment experiments could measure whether the better-aligned simulators lead to higher user satisfaction in actual recommender platforms.

Load-bearing premise

LLM-generated rationales and uncertainty-based filtering can reduce ambiguity and noise in user feedback without introducing new biases or losing key preference information.

What would settle it

A held-out test set of real user interactions where the fine-tuned simulators are asked to predict choices and rationales; higher agreement with actual human selections and rationales than baselines trained without the rationale or filtering steps would support the claim.
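A minimal sketch of the comparison described above, assuming each simulator outputs one predicted choice per held-out interaction and that agreement is plain accuracy against the human choice. The function name `agreement`, the item identifiers, and the predictions are hypothetical; they are not results from the paper.

```python
def agreement(predicted, actual):
    """Fraction of held-out interactions where the simulator picks the human's choice."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one held-out interaction is required")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical held-out human choices and two simulators' predictions.
human = ["item_3", "item_1", "item_2", "item_1"]
full_pipeline = ["item_3", "item_1", "item_2", "item_4"]   # rationales + filtering
ablated = ["item_3", "item_2", "item_2", "item_4"]         # trained without them

# A positive gain would support the claim; a real test would also need
# a significance check over many more interactions.
gain = agreement(full_pipeline, human) - agreement(ablated, human)
```

Comparing rationales would need a separate text-similarity or human-rating measure, which this sketch leaves out.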

Figures

Figures reproduced from arXiv: 2508.18142 by Dongxia Wang, Huang Chen, Huizhong Guo, Jie Zhang, Tianjun Wei, Yingpeng Du, Zhu Sun.

Figure 1: Example of the constructed user simulation scene. These templates systematically convert the available user attributes and interacted item features within each dataset into a standardized plain text format, for example, "Age: 35-44, Occupation: Customer Service, User Interaction History: Crimson Tide (1995) Rating: 4/5 ..."

Figure 2: Accuracy of user simulation on representative domains. Stronger models perform better. Experimental results consistently show that LLMs with more advanced foundational capabilities achieve higher alignment with real user behaviors. Since these LLMs have not been fine-tuned on domain-specific user feedback in RSs, their more human-like behaviors likely stem from their robust context modeling and reasoning …

Figure 3: Results of uncertain decomposition on user simulation scenes. Decision-process Generation. Following this idea, we aim to decompose the uncertainty lied in user behavior simulation. When a human user provides feedback, there is always an underlying decision-making process. We believe that a complete decision-making process can play a "clarifying" role in user simulation. We adopt the widely researched c…

Figure 4: Illustration of our proposed USERMIRRORER framework. 1. Simulation Scene Construction: Randomly sampling from raw user feedback to construct a batch of user simulation scenes. 2. Decision Process Generation: Using LLM A and B with different capabilities to generate N decision processes with predicted behaviors for each sample. 3. Uncertainty-based Scene Distillation: Calculate ∆EU (X,(A, B)) via Equation 3…

Figure 5: Effect of training dataset size on the performance of user simulator. [Plot residue from an adjacent figure: panels (a) Content and (b) Number; axes Accuracy and Num. of Factors; legend entries Stimulus, Knowledge, and Evaluation Factors.]

Figure 8: An overview of the three factor categories: knowledge, stimulus, and evaluation factors.
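The plain-text scene format quoted with Figure 1 can be sketched as a small rendering function. This is a hedged illustration only: the function name `render_scene`, the input layout, and the "; " separator between history items are assumptions, since the excerpt shows a single history item and truncates after it.

```python
def render_scene(user_attributes, history):
    """Render user attributes and rated items as one plain-text line.

    user_attributes: dict of attribute name -> value, in display order.
    history: list of (item title, rating out of 5) pairs.
    """
    attrs = ", ".join(f"{name}: {value}" for name, value in user_attributes.items())
    # ASSUMPTION: history items are joined with "; "; the source does not show this.
    items = "; ".join(f"{title} Rating: {rating}/5" for title, rating in history)
    return f"{attrs}, User Interaction History: {items}"


# Values taken from the example string quoted with Figure 1.
scene = render_scene(
    {"Age": "35-44", "Occupation": "Customer Service"},
    [("Crimson Tide (1995)", 4)],
)
```

With these inputs the result matches the quoted example up to its trailing ellipsis.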
Original abstract

User simulation is increasingly vital to develop and evaluate recommender systems (RSs). While Large Language Models (LLMs) offer promising avenues to simulate user behavior, they often struggle with the absence of specific task alignment required for RSs and the efficiency demands of large-scale simulation. A vast yet underutilized resource for enhancing this alignment is the extensive user feedback inherent in RSs, but leveraging it is challenging due to its ambiguity, noise and massive volume, which hinders efficient preference alignment. To overcome these hurdles, we introduce a novel data construction framework that leverages user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data. Our framework unfolds in two key phases: (1) using LLMs to generate decision-making processes as explanatory rationales on simulation samples, thereby reducing ambiguity; and (2) data distillation based on uncertainty estimation and behavior sampling to efficiently filter the most informative, denoised samples. Accordingly, we fine-tune lightweight LLMs, as user simulators, using such high-quality dataset with corresponding decision-making processes. Extensive experiments confirm that our framework significantly boosts the alignment with human preferences and the in-domain reasoning capabilities of the fine-tuned LLMs, providing more insightful and interpretable signals for RS interaction. We believe our work, together with publicly available developed framework, high-quality mixed-domain dataset, and fine-tuned LLM checkpoints, will advance the RS community and offer valuable insights for broader human-centric AI research. Our code is available at https://github.com/Joinn99/UserMirrorer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a two-phase data construction framework for building preference-aligned user simulators in recommender systems. Phase 1 prompts LLMs to generate explanatory rationales for simulation samples drawn from user feedback to reduce ambiguity and noise. Phase 2 applies uncertainty estimation and behavior sampling to distill informative samples. Lightweight LLMs are then fine-tuned on the resulting dataset (with rationales) to serve as simulators. The abstract states that extensive experiments confirm significant improvements in human preference alignment and in-domain reasoning capabilities, with public release of code, dataset, and checkpoints.

Significance. If the generated rationales faithfully recover latent user preferences rather than LLM priors, the framework could offer a practical method for leveraging large-scale, noisy RS feedback to create more aligned and interpretable simulators. The public artifacts strengthen potential impact for the RS and human-centric AI communities.

major comments (2)
  1. [Abstract and §3] (framework description): the central claim that LLM-generated rationales reduce ambiguity and improve alignment rests on the unverified assumption that these rationales surface actual user decision factors. No human validation, inter-annotator agreement, or rationale-only vs. feedback-only ablation is described, leaving open the risk that rationales inject model priors instead of recovering user preferences.
  2. [§4] (experiments): the assertion of 'significant boosts' in alignment and reasoning is presented without reported quantitative results, specific metrics, baseline comparisons, or statistical tests in the provided abstract and summary. This makes independent verification of the load-bearing experimental support impossible from the manuscript details given.
minor comments (2)
  1. [§3.1] Clarify the exact prompting strategy and temperature settings used for rationale generation in Phase 1 to allow reproducibility.
  2. [§3.2] The uncertainty estimation method in Phase 2 should specify the exact formulation (e.g., entropy over what distribution) and any thresholds applied.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our work. We address each major point below and describe the changes we will make in the revised manuscript.

  1. Referee: [Abstract and §3] (framework description): the central claim that LLM-generated rationales reduce ambiguity and improve alignment rests on the unverified assumption that these rationales surface actual user decision factors. No human validation, inter-annotator agreement, or rationale-only vs. feedback-only ablation is described, leaving open the risk that rationales inject model priors instead of recovering user preferences.

    Authors: We acknowledge the validity of this concern. The framework description in §3 motivates rationale generation as a means to reduce ambiguity in user feedback, but we agree that the claim would be strengthened by direct evidence that the rationales recover user preferences rather than LLM priors. In the revision we will add a human evaluation study in which multiple annotators rate the fidelity of generated rationales to the original feedback, report inter-annotator agreement, and include an ablation that compares simulator performance when trained on rationale-augmented data versus raw feedback only. revision: yes

  2. Referee: [§4] (experiments): the assertion of 'significant boosts' in alignment and reasoning is presented without reported quantitative results, specific metrics, baseline comparisons, or statistical tests in the provided abstract and summary. This makes independent verification of the load-bearing experimental support impossible from the manuscript details given.

    Authors: We apologize that the excerpt supplied to the referee did not surface the quantitative details already present in §4. The full experimental section reports concrete metrics for preference alignment and reasoning quality, direct comparisons against several baselines, and statistical significance testing. To improve accessibility we will revise the abstract to include the key numerical results and add explicit pointers from the abstract to the corresponding tables and statistical analyses in §4. revision: yes
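The inter-annotator agreement promised in the first response could be reported in several ways. The sketch below computes Cohen's kappa, assuming exactly two annotators who each assign one categorical label per rationale. The labels, the function name `cohens_kappa`, and the ratings are hypothetical, and a study with more than two annotators would need a different statistic such as Fleiss' kappa.

```python
from collections import Counter


def cohens_kappa(ratings_1, ratings_2):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("both annotators must rate the same non-empty item set")
    n = len(ratings_1)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    c1, c2 = Counter(ratings_1), Counter(ratings_2)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical fidelity labels for four generated rationales.
annotator_1 = ["faithful", "faithful", "unfaithful", "unfaithful"]
annotator_2 = ["faithful", "unfaithful", "unfaithful", "unfaithful"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates the ratings.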

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper describes a two-phase empirical framework that ingests external user feedback from RSs, prompts LLMs to produce explanatory rationales, applies uncertainty-based filtering, and fine-tunes lightweight LLMs as simulators. All load-bearing steps rely on observable external data and standard LLM capabilities rather than self-definitional loops, fitted parameters renamed as predictions, or self-citation chains that substitute for independent justification. Claims of improved human-preference alignment are presented as experimental outcomes, not as mathematical identities derived from the inputs themselves. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, no explicit free parameters, domain axioms, or invented entities are identified; the work builds on standard LLM fine-tuning and data processing techniques.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Our framework unfolds in two key phases: (1) using LLMs to generate decision-making processes as explanatory rationales on simulation samples... (2) data distillation based on uncertainty estimation and behavior sampling

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Through Their Eyes: Fixation-aligned Tuning for Personalized User Emulation

    cs.MM 2026-04 unverdicted novelty 6.0

    Personalized soft prompts steer VLM attention to match user-specific gaze patterns, yielding better attention alignment and click prediction in recommendation simulations.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · cited by 1 Pith paper · 8 internal anchors

  1. [1]

    Adomavicius and A

    G. Adomavicius and A. Tuzhilin. 2005. Toward the next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering 17, 6 (June 2005), 734–749. https://doi.org/10.1109/TKDE.2005.99

  2. [2]

    Ellis, Brian Whitman, and Paul Lamere

    Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011)

  3. [3]

    Shijie Chen, Bernal Jimenez Gutierrez, and Yu Su. 2025. Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers. In The Thirteenth International Conference on Learning Representations

  4. [4]

    Lanzendörfer, Flint Xiaofeng Fan, and Roger Wattenhofer

    Nathan Corecco, Giorgio Piatti, Luca A. Lanzendörfer, Flint Xiaofeng Fan, and Roger Wattenhofer. 2024. SUBER: An RL Environment with Simulated Human Behavior for Recommender Systems. InProceedings of the 27th European Conference on Artificial Intelligence (ECAI 2024)

  5. [5]

    DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai D...

  6. [6]

    Alex Deng, Jiannan Lu, and Jonthan Litz. 2017. Trustworthy Analysis of Online A/B Tests: Pitfalls, Challenges and Solutions. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM ’17). Association for Computing Machinery, New York, NY , USA, 641–649. https://doi.org/10.1145/3018661.3018677

  7. [7]

    Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou

  8. [8]

    OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment

    OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment. https://doi.org/10.48550/arXiv.2502.18965 arXiv:2502.18965 [cs]

  9. [9]

    Mukund Deshpande and George Karypis. 2004. Item-Based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst. 22, 1 (Jan. 2004), 143–177. https://doi.org/10.1145/963770.963776

  10. [10]

    Yingpeng Du, Zhu Sun, Ziyan Wang, Haoyan Chua, Jie Zhang, and Yew-Soon Ong. 2025. Active Large Language Model-Based Knowledge Distillation for Session-Based Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence 39, 11 (Apr. 2025), 11607–11615. https://doi.org/10. 1609/aaai.v39i11.33263

  11. [11]

    Yingpeng Du, Ziyan Wang, Zhu Sun, Yining Ma, Hongzhi Liu, and Jie Zhang. 2024. Disentangled Multi-interest Representation Learning for Sequential Recommendation. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). Association for Computing Machinery, New York, NY , USA, 677–688. https://doi.org/10.1145/363752...

  12. [12]

    Yingpeng Du, Tianjun Wei, Zhu Sun, and Jie Zhang. 2025. Reinforcement Speculative Decoding for Fast Ranking. arXiv:2505.20316 [cs.AI] https://arxiv.org/abs/2505.20316

  13. [13]

    Engel, R.D

    J.F. Engel, R.D. Blackwell, and D.T. Kollat. 1978. Consumer Behavior. Dryden Press

  14. [14]

    Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of The 33rd International Conference on Machine Learning. PMLR, 1050–1059

  15. [15]

    Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, and Yong Li. 2023. S3: Social-network Simulation System with Large Language Model-Empowered Agents. https://doi.org/10.48550/arXiv.2307.14984 arXiv:2307.14984

  16. [16]

    Chongming Gao, Shijun Li, Wenqiang Lei, Jiawei Chen, Biao Li, Peng Jiang, Xiangnan He, Jiaxin Mao, and Tat-Seng Chua. 2022. KuaiRec: A fully-observed dataset and insights for evaluating recommender systems. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management . 540–550

  17. [17]

    Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, and Simon Dollé. 2018. Offline A/B Testing for Recommender Systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM ’18). Association for Computing Machinery, New York, NY , USA, 198–206. https://doi.org/10.1145/3159652.3159687

  18. [18]

    F Maxwell Harper and Joseph A Konstan. 2015. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5, 4 (2015), 1–19

  19. [19]

    Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, YongDong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20). Association for Computing Machinery, New York, NY , USA, 639–648. h...

  20. [20]

    Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182

  21. [21]

    Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, and Yang Zhang. 2024. Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling. In Proceedings of the 41st International Conference on Machine Learning (ICML’24, Vol. 235). JMLR.org, Vienna, Austria, 19023–19042

  22. [22]

    Eugene Ie, Chih-wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, and Craig Boutilier. 2019. RecSim: A Configurable Simulation Platform for Recommender Systems. https: //doi.org/10.48550/arXiv.1909.04847 arXiv:1909.04847 [cs, stat]

  23. [23]

    Yiqiao Jin, Qinlin Zhao, Yiyang Wang, Hao Chen, Kaijie Zhu, Yijia Xiao, and Jindong Wang. 2024. AgentReview: Exploring Peer Review Dynamics with LLM Agents. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, Miami, Fl...

  24. [24]

    Daniel Kahneman. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, NY , US. 499 pages

  25. [25]

    Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). 197–206. https://doi.org/10.1109/ICDM.2018. 00035

  26. [26]

    Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D

    Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V . Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, and Hannaneh Hajishirz...

  27. [27]

    Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (Singapore, Singapore) (CIKM ’17). Association for Computing Machinery, New York, NY , USA, 1419–1428. https://doi.org/10.1145/3132847.3132926 12

  28. [28]

    Ming Li, Yong Zhang, Shwai He, Zhitao Li, Hongyu Zhao, Jianzong Wang, Ning Cheng, and Tianyi Zhou

  29. [29]

    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.)

    Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 14255–14273. https://doi.org/10.18653/v1/2024.acl-long.769

  30. [30]

    Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, and Meishan Zhang. 2023. Towards General Text Embeddings with Multi-stage Contrastive Learning. https://doi.org/10.48550/arXiv. 2308.03281 arXiv:2308.03281 [cs]

  31. [31]

    Dawen Liang, Rahul G Krishnan, Matthew D Hoffman, and Tony Jebara. 2018. Variational autoencoders for collaborative filtering. In Proceedings of the 2018 world wide web conference. 689–698

  32. [32]

    Xufang Luo, Zheng Liu, Shitao Xiao, Xing Xie, and Dongsheng Li. 2022. MINDSim: User Simulator for News Recommenders. In Proceedings of the ACM Web Conference 2022 (WWW ’22). Association for Computing Machinery, New York, NY , USA, 2067–2077. https://doi.org/10.1145/3485447. 3512080

  33. [33]

    Kelong Mao, Jieming Zhu, Jinpeng Wang, Quanyu Dai, Zhenhua Dong, Xi Xiao, and Xiuqiang He

  34. [34]

    In Proceedings of the 30th ACM international conference on information & knowledge management

    SimpleX: A simple and strong baseline for collaborative filtering. In Proceedings of the 30th ACM international conference on information & knowledge management. 1243–1252

  35. [35]

    Muhammad Hasan Maqbool, Umar Farooq, Adib Mosharrof, AB Siddique, and Hassan Foroosh. 2023. MobileRec: A large scale dataset for mobile apps recommendation. InProceedings of the 46th international ACM SIGIR conference on research and development in information retrieval. 3007–3016

  36. [36]

    Jianmo Ni, Jiacheng Li, and Julian McAuley. 2019. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP- IJCNLP). 188–197

  37. [37]

    Mark O’Neill, Elham Vaziripour, Justin Wu, and Daniel Zappala. 2016. Condensing steam: Distilling the diversity of gamer behavior. In Proceedings of the 2016 internet measurement conference. 81–95

  38. [38]

    Fernando Benjamin Perez Maurera, Maurizio Ferrari Dacrema, Pablo Castells, and Paolo Cremonesi

  39. [39]

    ACM Trans

    Impression-Aware Recommender Systems. ACM Trans. Recomm. Syst. (Jan. 2025). https: //doi.org/10.1145/3712292

  40. [40]

    Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Ti...

  41. [41]

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. 2024. Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. https: //doi.org/10.48550/arXiv.2305.18290 arXiv:2305.18290 [cs]

  42. [42]

    Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncer- tainty in Artificial Intelligence (UAI ’09). AUAI Press, Arlington, Virginia, USA, 452–461

  43. [43]

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y . K. Li, Y . Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathemat- ical Reasoning in Open Language Models. https://doi.org/10.48550/arXiv.2402.03300 arXiv:2402.03300 [cs]

  44. [44]

    Yifei Shen, Yongji Wu, Yao Zhang, Caihua Shan, Jun Zhang, B Khaled Letaief, and Dongsheng Li. 2021. How powerful is graph convolution for recommendation?. In Proceedings of the 30th ACM international conference on information & knowledge management. 1619–1629

  45. [45]

    Elizaveta Stavinova, Alexander Grigorievskiy, Anna V olodkevich, Petr Chunaev, Klavdiya Bochenina, and Dmitry Bugaychenko. 2022. Synthetic Data-Based Simulators for Recommender Systems: A Survey. https://doi.org/10.48550/arXiv.2206.11338 arXiv:2206.11338 [cs] 13

  46. [46]

    Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for C...

  47. [47]

    Zhu Sun, Di Yu, Hui Fang, Jie Yang, Xinghua Qu, Jie Zhang, and Cong Geng. 2020. Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison. In Proceedings of the 14th ACM Conference on Recommender Systems (Virtual Event, Brazil) (RecSys ’20). Association for Computing Machinery, New York, NY , USA, 23–32. htt...

  48. [48]

    Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, ...

  49. [49]

    Mengting Wan and Julian McAuley. 2018. Item recommendation on monotonic behavior chains. In Proceedings of the 12th ACM Conference on Recommender Systems (Vancouver, British Columbia, Canada) (RecSys ’18). Association for Computing Machinery, New York, NY , USA, 86–94. https: //doi.org/10.1145/3240323.3240369

  50. [50]

    Lei Wang, Jingsen Zhang, Hao Yang, Zhi-Yuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Hao Sun, Ruihua Song, Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, and Ji-Rong Wen. 2025. User Behavior Simulation with Large Language Model-based Agents. ACM Trans. Inf. Syst.43, 2 (Jan. 2025), 55:1–55:37. https://doi.org/10.1145/3708985

  51. [51]

    Wenjie Wang, Yiyan Xu, Fuli Feng, Xinyu Lin, Xiangnan He, and Tat-Seng Chua. 2023. Diffusion Recommender Model. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23). Association for Computing Machinery, New York, NY , USA, 832–841. https://doi.org/10.1145/3539618.3591663

  52. [52]

    Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval. 165–174

  53. [53]

    Yancheng Wang, Ziyan Jiang, Zheng Chen, Fan Yang, Yingxue Zhou, Eunah Cho, Xing Fan, Yanbin Lu, Xiaojiang Huang, and Yingzhen Yang. 2024. RecMind: Large Language Model Powered Agent For 14 Recommendation. In Findings of the Association for Computational Linguistics: NAACL 2024 , Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computati...

  54. [54]

    Yifan Wang, Weizhi Ma, Min Zhang, Yiqun Liu, and Shaoping Ma. 2023. A Survey on the Fairness of Recommender Systems. ACM Trans. Inf. Syst.41, 3 (Feb. 2023), 52:1–52:43. https://doi.org/10. 1145/3547333

  55. [55]

    Zhenduo Wang, Zhichao Xu, Vivek Srikumar, and Qingyao Ai. 2024. An In-depth Investigation of User Response Simulation for Conversational Search. In Proceedings of the ACM Web Conference 2024 (WWW ’24). Association for Computing Machinery, New York, NY , USA, 1407–1418. https://doi.org/10. 1145/3589334.3645447

  56. [56]

    Tianjun Wei, Tommy W. S. Chow, and Jianghong Ma. 2024. FPSR+: Toward Robust, Efficient, and Scalable Collaborative Filtering With Partition-Aware Item Similarity Modeling. IEEE Transactions on Knowledge and Data Engineering 36, 12 (Dec. 2024), 8283–8296. https://doi.org/10.1109/TKDE. 2024.3418080

  57. [57]

    Tianjun Wei, Tommy W. S. Chow, and Jianghong Ma. 2024. FPSR+: Toward Robust, Efficient, and Scalable Collaborative Filtering With Partition-Aware Item Similarity Modeling. IEEE Transactions on Knowledge and Data Engineering 36, 12 (2024), 8283–8296. https://doi.org/10.1109/TKDE.2024.3418080

  58. [58]

    Tianjun Wei, Jianghong Ma, and Tommy W. S. Chow. 2023. Fine-tuning Partition-aware Item Similarities for Efficient and Scalable Recommendation. In Proceedings of the ACM Web Conference 2023 (Austin, TX, USA) (WWW ’23). Association for Computing Machinery, New York, NY, USA, 823–832. https://doi.org/10.1145/3543507.3583240

  59. [59]

    Wei Wei, Quoc Le, Andrew Dai, and Jia Li. 2018. AirDialogue: An Environment for Goal-Oriented Dialogue Research. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). Association for Computational Linguistics, Brussels, Belgium, 3844–3854. https...

  60. [60]

    Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, and Ming Zhou. 2020. MIND: A Large-scale Dataset for News Recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.)....

  61. [61]

    Jiancan Wu, Xiang Wang, Fuli Feng, Xiangnan He, Liang Chen, Jianxun Lian, and Xing Xie. 2021. Self-Supervised Graph Learning for Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21). Association for Computing Machinery, New York, NY, USA, 726–735. https://doi.org/1...

  62. [62]

    Xiwang Yang, Harald Steck, Yang Guo, and Yong Liu. 2012. On Top-k Recommendation Using Social Networks. In Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys ’12). Association for Computing Machinery, New York, NY, USA, 67–74. https://doi.org/10.1145/2365952.2365969

  63. [63]

    An Zhang, Yuxin Chen, Leheng Sheng, Xiang Wang, and Tat-Seng Chua. 2024. On Generative Agents in Recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). Association for Computing Machinery, New York, NY, USA, 1807–1817. https://doi.org/10.1145/3626772.3657844

  64. [64]

    Erhan Zhang, Xingzhu Wang, Peiyuan Gong, Yankai Lin, and Jiaxin Mao. 2024. USimAgent: Large Language Models for Simulating Search Users. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). Association for Computing Machinery, New York, NY, USA, 2687–2692. https://doi.org/10.1145/...

  65. [65]

    Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, and Ji-Rong Wen. 2024. AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems. In Proceedings of the ACM Web Conference 2024 (WWW ’24). Association for Computing Machinery, New York, NY, USA, 3679–3689. https://doi.org/10.1145/35...

  66. [66]

    Zijian Zhang, Shuchang Liu, Ziru Liu, Rui Zhong, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Qidong Liu, and Peng Jiang. 2025. LLM-Powered User Simulator for Recommender System. In Proceedings of the Thirty-Four International Joint Conference on Artificial Intelligence (AAAI ’25). https://doi.org/10.48550/arXiv.2412.16984 arXiv:2412.16984 [cs]

  67. [67]

    Kesen Zhao, Shuchang Liu, Qingpeng Cai, Xiangyu Zhao, Ziru Liu, Dong Zheng, Peng Jiang, and Kun Gai. 2023. KuaiSim: A Comprehensive Simulator for Recommender Systems. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS ’23). Curran Associates Inc., Red Hook, NY, USA, 44880–44897.

  68. [68]

    Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2021. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. In CIKM. ACM, ...

  69. [69]

    Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2021. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. In Proceedings...

  70. [70]

    Behavior: [G]

    Unspecified settings follow the defaults of the torchtune and verl frameworks. We consider two training setups:

    • Single-Stage SFT: For models without the decision-making process, we apply supervised fine-tuning only, treating the task as single-token classification.

    • Two-Stage Fine-Tuning: For models incorporating decision-making, we perform a standard ...
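As a rough illustration of what "single-token classification" can mean in the Single-Stage SFT setup, the sketch below computes the negative log-likelihood of one answer token from the logits at the answer position, then picks a behavior among candidate label tokens. It is a minimal sketch under stated assumptions, not the authors' training code: the function names, the toy four-token vocabulary, and the choice of label token ids are all hypothetical, and the real setup runs inside torchtune rather than plain Python.

```python
import math

def answer_token_loss(logits, target_id):
    """Negative log-likelihood of the single answer token.

    logits: raw scores over a (toy) vocabulary at the answer position.
    target_id: index of the token encoding the observed user behavior.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_id]

def predict_behavior(logits, label_ids):
    """Pick the highest-scoring token among the candidate label tokens."""
    return max(label_ids, key=lambda i: logits[i])

# Toy vocabulary of four tokens; ids 1 and 3 are hypothetical label tokens.
toy_logits = [0.1, 2.0, -1.0, 0.5]
loss = answer_token_loss(toy_logits, target_id=1)
choice = predict_behavior(toy_logits, label_ids=[1, 3])
```

Under this reading, the loss is taken at exactly one position per sample, so the fine-tuned model behaves like a classifier whose classes are the behavior label tokens.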