Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation

Dongxia Wang; Huang Chen; Huizhong Guo; Jie Zhang; Tianjun Wei; Yingpeng Du; Zhu Sun

arXiv: 2508.18142 · v3 · submitted 2025-08-25 · cs.HC · cs.CY · cs.IR

Tianjun Wei, Huizhong Guo, Yingpeng Du, Zhu Sun, Huang Chen, Dongxia Wang, Jie Zhang

Pith reviewed 2026-05-18 21:24 UTC · model grok-4.3

keywords: user simulation · recommender systems · large language models · preference alignment · user feedback · data distillation · fine-tuning

The pith

A two-phase framework generates rationales from user feedback and distills informative samples to fine-tune LLMs as preference-aligned simulators for recommender systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method to convert extensive user feedback from recommender systems into high-quality training data for simulators. Large language models first create explanatory rationales that spell out the reasoning behind each feedback instance and reduce its ambiguity. Uncertainty estimation combined with behavior sampling then selects the clearest and most useful samples. Lightweight LLMs are fine-tuned on the resulting dataset together with the rationales. The approach aims to produce simulators that match human preferences more closely and supply clearer reasoning traces during recommender interactions.

Core claim

The framework constructs high-quality simulation data in two phases: LLMs generate decision-making processes as explanatory rationales on simulation samples to reduce ambiguity, after which data distillation based on uncertainty estimation and behavior sampling filters the most informative and denoised samples. Fine-tuning lightweight LLMs on this dataset together with the corresponding decision-making processes significantly boosts alignment with human preferences and the in-domain reasoning capabilities of the simulators, yielding more insightful and interpretable signals for recommender system interaction.
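The two phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the scene fields (`weak_preds`, `strong_preds`, `strong_rationales`, `label`), the entropy-gap score, and the `keep_fraction` parameter are all hypothetical, and the score is only a stand-in for whatever uncertainty measure the paper defines.

```python
import math
from collections import Counter


def behavior_entropy(predictions):
    """Shannon entropy (in nats) of the empirical distribution of predicted behaviors."""
    counts = Counter(predictions)
    total = len(predictions)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def distill(scenes, keep_fraction=0.5):
    """Rank scenes by a hypothetical entropy-gap score and keep the top fraction.

    Each scene is assumed to hold N sampled predictions from a weaker and a
    stronger LLM, the stronger model's rationales, and the logged user behavior.
    """
    scored = []
    for scene in scenes:
        # Stand-in score: how much more uncertain the weak model is than the strong one.
        gap = behavior_entropy(scene["weak_preds"]) - behavior_entropy(scene["strong_preds"])
        # Assumed form of behavior sampling: keep rationales that reproduce the logged choice.
        rationales = [
            r
            for r, p in zip(scene["strong_rationales"], scene["strong_preds"])
            if p == scene["label"]
        ]
        if rationales:
            scored.append((gap, scene["id"], rationales[0]))
    scored.sort(key=lambda item: item[0], reverse=True)
    keep = max(1, int(len(scored) * keep_fraction))
    return [(scene_id, rationale) for _, scene_id, rationale in scored[:keep]]
```

The returned pairs of scene identifier and rationale would then form the fine-tuning set. Scenes for which no sampled rationale matches the logged behavior are dropped in this sketch, which is one possible reading of "denoised" rather than a documented rule.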

What carries the argument

The data construction framework that uses LLM-generated explanatory rationales followed by uncertainty-based distillation to turn raw user feedback into high-quality training data for user simulators.

If this is right

Fine-tuned simulators exhibit significantly improved alignment with human preferences.
The simulators gain stronger in-domain reasoning capabilities.
They deliver more insightful and interpretable signals for recommender system interactions.
The framework efficiently manages ambiguity, noise, and volume in user feedback data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The produced rationales could be surfaced directly to end users to explain why certain items are recommended.
Similar rationale-plus-filtering pipelines might improve user simulation in other interactive systems such as conversational agents.
Live deployment experiments could measure whether the better-aligned simulators lead to higher user satisfaction in actual recommender platforms.

Load-bearing premise

LLM-generated rationales and uncertainty-based filtering can reduce ambiguity and noise in user feedback without introducing new biases or losing key preference information.

What would settle it

A held-out test set of real user interactions where the fine-tuned simulators are asked to predict choices and rationales; higher agreement with actual human selections and rationales than baselines trained without the rationale or filtering steps would support the claim.
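A hedged sketch of the choice-prediction half of that check is given below. The function names and the toy labels in the usage note are hypothetical, and the sketch covers only agreement on selected items, not agreement on rationales, which would need a separate human or model-based judgment.

```python
def accuracy(predicted, actual):
    """Fraction of held-out interactions where the simulated choice equals the real one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def discordant_counts(tuned, baseline, actual):
    """Count held-out cases that only one of the two simulators predicts correctly."""
    tuned_only = sum(t == a and b != a for t, b, a in zip(tuned, baseline, actual))
    baseline_only = sum(b == a and t != a for t, b, a in zip(tuned, baseline, actual))
    return tuned_only, baseline_only
```

For example, with real choices `["A", "B", "C", "A"]`, a fine-tuned simulator predicting `["A", "B", "C", "B"]`, and a baseline predicting `["A", "C", "C", "B"]`, the accuracies are 0.75 and 0.5, and the discordant counts are `(1, 0)`. The discordant counts are the inputs a paired test such as McNemar's would use to judge whether the gap is more than chance.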

Figures

Figures reproduced from arXiv: 2508.18142 by Dongxia Wang, Huang Chen, Huizhong Guo, Jie Zhang, Tianjun Wei, Yingpeng Du, Zhu Sun.

**Figure 1.** Example of the constructed user simulation scene. These templates systematically convert the available user attributes and interacted item features within each dataset into a standardized plain text format, for example, "Age: 35-44, Occupation: Customer Service, User Interaction History: Crimson Tide (1995) Rating: 4/5 ..."

**Figure 2.** Accuracy of user simulation on representative domains. Stronger models perform better. Experimental results consistently show that LLMs with more advanced foundational capabilities achieve higher alignment with real user behaviors. Since these LLMs have not been finetuned on domain-specific user feedback in RSs, their more human-like behaviors likely stem from their robust context modeling and reasoning …

**Figure 3.** Results of uncertainty decomposition on user simulation scenes. Decision-process Generation. Following this idea, we aim to decompose the uncertainty lying in user behavior simulation. When a human user provides feedback, there is always an underlying decision-making process. We believe that a complete decision-making process can play a "clarifying" role in user simulation. We adopt the widely researched c…

**Figure 4.** Illustration of our proposed USERMIRRORER framework. 1. Simulation Scene Construction: Randomly sampling from raw user feedback to construct a batch of user simulation scenes. 2. Decision Process Generation: Using LLM A and B with different capabilities to generate N decision processes with predicted behaviors for each sample. 3. Uncertainty-based Scene Distillation: Calculate ∆EU(X, (A, B)) via Equation 3…

**Figure 5.** Effect of training dataset size on the performance of user simulator. [Plot text recovered from the extraction: panels (a) Content and (b) Number; axes Accuracy and Num. of Factors; legend entries Stimulus, Knowledge, Logical, Intuitive.]

**Figure 8.** An overview of the three factor categories: knowledge, stimulus, and evaluation factors.
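The scene template mentioned in the Figure 1 caption can be illustrated with a short Python sketch. It only reproduces the plain-text shape quoted in that caption; the function name, the argument layout, and the "; " separator between history items are assumptions, and the paper's actual templates may differ per dataset.

```python
def render_scene(attributes, history):
    """Render user attributes and rated items as one plain-text line.

    attributes: mapping of attribute name to value, e.g. {"Age": "35-44"}.
    history: list of (item title, rating out of 5) pairs.
    """
    parts = [f"{name}: {value}" for name, value in attributes.items()]
    items = "; ".join(f"{title} Rating: {rating}/5" for title, rating in history)
    parts.append(f"User Interaction History: {items}")
    return ", ".join(parts)
```

Calling `render_scene({"Age": "35-44", "Occupation": "Customer Service"}, [("Crimson Tide (1995)", 4)])` yields a line in the same shape as the example quoted in the caption.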

Original abstract

User simulation is increasingly vital to develop and evaluate recommender systems (RSs). While Large Language Models (LLMs) offer promising avenues to simulate user behavior, they often struggle with the absence of specific task alignment required for RSs and the efficiency demands of large-scale simulation. A vast yet underutilized resource for enhancing this alignment is the extensive user feedback inherent in RSs, but leveraging it is challenging due to its ambiguity, noise and massive volume, which hinders efficient preference alignment. To overcome these hurdles, we introduce a novel data construction framework that leverages user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data. Our framework unfolds in two key phases: (1) using LLMs to generate decision-making processes as explanatory rationales on simulation samples, thereby reducing ambiguity; and (2) data distillation based on uncertainty estimation and behavior sampling to efficiently filter the most informative, denoised samples. Accordingly, we fine-tune lightweight LLMs, as user simulators, using such high-quality dataset with corresponding decision-making processes. Extensive experiments confirm that our framework significantly boosts the alignment with human preferences and the in-domain reasoning capabilities of the fine-tuned LLMs, providing more insightful and interpretable signals for RS interaction. We believe our work, together with publicly available developed framework, high-quality mixed-domain dataset, and fine-tuned LLM checkpoints, will advance the RS community and offer valuable insights for broader human-centric AI research. Our code is available at https://github.com/Joinn99/UserMirrorer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a workable two-phase method to turn noisy user feedback into training data for preference-aligned simulators, but leaves the faithfulness of LLM rationales unverified.

The main takeaway is a concrete framework that first prompts an LLM to produce decision-making rationales from user feedback samples in recommender systems, then applies uncertainty estimation and behavior sampling to distill a cleaner training set for fine-tuning lightweight LLMs as simulators. This targets the common problems of ambiguity and volume in real RS feedback data. They report that the resulting simulators show better alignment with human preferences and stronger in-domain reasoning than baselines.

Releasing the code, a mixed-domain dataset, and the fine-tuned checkpoints is a practical plus that lets others test or extend the work directly. The combination of rationale generation followed by uncertainty filtering is presented as a distinct way to build these simulators, and it makes sense as a way to leverage existing feedback without starting from scratch. The experiments are described as confirming significant gains, which would matter for anyone running large-scale RS evaluations.

On the downside, the central assumption that LLM-generated rationales faithfully surface user preferences rather than model priors still needs stronger checks. The abstract does not detail human validation, inter-rater agreement, or ablations that isolate the rationales' contribution, so it remains possible the gains come partly from LLM biases instead of recovered user signals. If the full paper includes those controls, the claim holds up better; otherwise it is a soft spot worth probing.

This work is aimed at the recommender systems community and researchers building user simulators or human-centric AI tools. Readers who need ready-to-use simulators or want to experiment with feedback-driven distillation will find the artifacts useful. It deserves peer review because the problem is relevant, the method is implementable, and the releases support verification.

Referee Report

2 major / 2 minor

Summary. The paper introduces a two-phase data construction framework for building preference-aligned user simulators in recommender systems. Phase 1 prompts LLMs to generate explanatory rationales for simulation samples drawn from user feedback to reduce ambiguity and noise. Phase 2 applies uncertainty estimation and behavior sampling to distill informative samples. Lightweight LLMs are then fine-tuned on the resulting dataset (with rationales) to serve as simulators. The abstract states that extensive experiments confirm significant improvements in human preference alignment and in-domain reasoning capabilities, with public release of code, dataset, and checkpoints.

Significance. If the generated rationales faithfully recover latent user preferences rather than LLM priors, the framework could offer a practical method for leveraging large-scale, noisy RS feedback to create more aligned and interpretable simulators. The public artifacts strengthen potential impact for the RS and human-centric AI communities.

major comments (2)

[Abstract and §3] Abstract and §3 (framework description): the central claim that LLM-generated rationales reduce ambiguity and improve alignment rests on the unverified assumption that these rationales surface actual user decision factors. No human validation, inter-annotator agreement, or rationale-only vs. feedback-only ablation is described, leaving open the risk that rationales inject model priors instead of recovering user preferences.
[§4] §4 (experiments): the assertion of 'significant boosts' in alignment and reasoning is presented without reported quantitative results, specific metrics, baseline comparisons, or statistical tests in the provided abstract and summary. This makes independent verification of the load-bearing experimental support impossible from the manuscript details given.

minor comments (2)

[§3.1] Clarify the exact prompting strategy and temperature settings used for rationale generation in Phase 1 to allow reproducibility.
[§3.2] The uncertainty estimation method in Phase 2 should specify the exact formulation (e.g., entropy over what distribution) and any thresholds applied.
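To make the second minor comment concrete, here is one standard entropy-based formulation that a manuscript could state explicitly. It is a generic decomposition over an ensemble of predictive distributions, offered as an assumption for illustration and not as the paper's formula; whether the disagreement term is read as epistemic or aleatoric depends on what the ensemble members vary (model parameters, prompts, or input clarifications).

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a probability vector, treating 0 * log 0 as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def decompose(distributions):
    """Split the entropy of the averaged prediction into two parts.

    distributions: list of probability vectors over the same candidate behaviors,
    one vector per ensemble member.
    Returns (total, expected, disagreement), where total = expected + disagreement.
    """
    n = len(distributions)
    k = len(distributions[0])
    mean = [sum(d[i] for d in distributions) / n for i in range(k)]
    total = entropy(mean)                                   # entropy of the averaged prediction
    expected = sum(entropy(d) for d in distributions) / n   # average per-member entropy
    return total, expected, total - expected
```

Two members that are each certain but disagree, `[[1.0, 0.0], [0.0, 1.0]]`, give a total of ln 2 that is all disagreement, while two identical uniform members give the same total with zero disagreement. Stating which of these quantities is thresholded, and at what value, is the kind of detail the comment asks for.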

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our work. We address each major point below and describe the changes we will make in the revised manuscript.

Referee: [Abstract and §3] Abstract and §3 (framework description): the central claim that LLM-generated rationales reduce ambiguity and improve alignment rests on the unverified assumption that these rationales surface actual user decision factors. No human validation, inter-annotator agreement, or rationale-only vs. feedback-only ablation is described, leaving open the risk that rationales inject model priors instead of recovering user preferences.

Authors: We acknowledge the validity of this concern. The framework description in §3 motivates rationale generation as a means to reduce ambiguity in user feedback, but we agree that the claim would be strengthened by direct evidence that the rationales recover user preferences rather than LLM priors. In the revision we will add a human evaluation study in which multiple annotators rate the fidelity of generated rationales to the original feedback, report inter-annotator agreement, and include an ablation that compares simulator performance when trained on rationale-augmented data versus raw feedback only.
Revision: yes

Referee: [§4] §4 (experiments): the assertion of 'significant boosts' in alignment and reasoning is presented without reported quantitative results, specific metrics, baseline comparisons, or statistical tests in the provided abstract and summary. This makes independent verification of the load-bearing experimental support impossible from the manuscript details given.

Authors: We apologize that the excerpt supplied to the referee did not surface the quantitative details already present in §4. The full experimental section reports concrete metrics for preference alignment and reasoning quality, direct comparisons against several baselines, and statistical significance testing. To improve accessibility we will revise the abstract to include the key numerical results and add explicit pointers from the abstract to the corresponding tables and statistical analyses in §4.
Revision: yes
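The inter-annotator agreement promised in the first response could be reported with a chance-corrected statistic. The sketch below computes Cohen's kappa for two annotators as an illustration only: the rebuttal does not name a statistic, the label values in the usage note are hypothetical, and a study with more than two annotators would need a multi-rater measure such as Fleiss' kappa instead.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; treated as full agreement here.
        return 1.0
    return (observed - expected) / (1 - expected)
```

With annotator A rating four rationales as `["faithful", "faithful", "unfaithful", "unfaithful"]` and annotator B as `["faithful", "unfaithful", "unfaithful", "unfaithful"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.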

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper describes a two-phase empirical framework that ingests external user feedback from RSs, prompts LLMs to produce explanatory rationales, applies uncertainty-based filtering, and fine-tunes lightweight LLMs as simulators. All load-bearing steps rely on observable external data and standard LLM capabilities rather than self-definitional loops, fitted parameters renamed as predictions, or self-citation chains that substitute for independent justification. Claims of improved human-preference alignment are presented as experimental outcomes, not as mathematical identities derived from the inputs themselves. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, no explicit free parameters, domain axioms, or invented entities are identified; the work builds on standard LLM fine-tuning and data processing techniques.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Our framework unfolds in two key phases: (1) using LLMs to generate decision-making processes as explanatory rationales on simulation samples... (2) data distillation based on uncertainty estimation and behavior sampling

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Through Their Eyes: Fixation-aligned Tuning for Personalized User Emulation
cs.MM 2026-04 unverdicted novelty 6.0

Personalized soft prompts steer VLM attention to match user-specific gaze patterns, yielding better attention alignment and click prediction in recommendation simulations.


work page doi:10.18653/v1/2024.findings-naacl.271 2024
[54]

Yifan Wang, Weizhi Ma, Min Zhang, Yiqun Liu, and Shaoping Ma. 2023. A Survey on the Fairness of Recommender Systems. ACM Trans. Inf. Syst.41, 3 (Feb. 2023), 52:1–52:43. https://doi.org/10. 1145/3547333

work page 2023
[55]

Zhenduo Wang, Zhichao Xu, Vivek Srikumar, and Qingyao Ai. 2024. An In-depth Investigation of User Response Simulation for Conversational Search. In Proceedings of the ACM Web Conference 2024 (WWW ’24). Association for Computing Machinery, New York, NY , USA, 1407–1418. https://doi.org/10. 1145/3589334.3645447

work page arXiv 2024
[56]

Tianjun Wei, Tommy W. S. Chow, and Jianghong Ma. 2024. FPSR+: Toward Robust, Efficient, and Scalable Collaborative Filtering With Partition-Aware Item Similarity Modeling. IEEE Transactions on Knowledge and Data Engineering 36, 12 (Dec. 2024), 8283–8296. https://doi.org/10.1109/TKDE. 2024.3418080

work page doi:10.1109/tkde 2024
[57]

Tianjun Wei, Tommy W. S. Chow, and Jianghong Ma. 2024. FPSR+: Toward Robust, Efficient, and Scalable Collaborative Filtering With Partition-Aware Item Similarity Modeling. IEEE Transactions on Knowledge and Data Engineering 36, 12 (2024), 8283–8296. https://doi.org/10.1109/TKDE.2024.3418080

work page doi:10.1109/tkde.2024.3418080 2024
[58]

Tianjun Wei, Jianghong Ma, and Tommy W. S. Chow. 2023. Fine-tuning Partition-aware Item Similarities for Efficient and Scalable Recommendation. In Proceedings of the ACM Web Conference 2023(Austin, TX, USA) (WWW ’23). Association for Computing Machinery, New York, NY , USA, 823–832. https: //doi.org/10.1145/3543507.3583240

work page doi:10.1145/3543507.3583240 2023
[59]

Wei Wei, Quoc Le, Andrew Dai, and Jia Li. 2018. AirDialogue: An Environment for Goal-Oriented Dialogue Research. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). Association for Computational Linguistics, Brussels, Belgium, 3844–3854. https...

work page doi:10.18653/v1/ 2018
[60]

Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, and Ming Zhou. 2020. MIND: A Large-scale Dataset for News Recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.)....

work page doi:10.18653/v1/2020.acl-main.331 2020
[61]

Jiancan Wu, Xiang Wang, Fuli Feng, Xiangnan He, Liang Chen, Jianxun Lian, and Xing Xie. 2021. Self- Supervised Graph Learning for Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21). Association for Computing Machinery, New York, NY , USA, 726–735. https://doi.org/1...

work page doi:10.1145/3404835.3462862 2021
[62]

Xiwang Yang, Harald Steck, Yang Guo, and Yong Liu. 2012. On Top-k Recommendation Using Social Net- works. In Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys ’12). Association for Computing Machinery, New York, NY , USA, 67–74. https://doi.org/10.1145/2365952.2365969

work page doi:10.1145/2365952.2365969 2012
[63]

An Zhang, Yuxin Chen, Leheng Sheng, Xiang Wang, and Tat-Seng Chua. 2024. On Generative Agents in Recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). Association for Computing Machinery, New York, NY , USA, 1807–1817. https://doi.org/10.1145/3626772.3657844

work page doi:10.1145/3626772.3657844 2024
[64]

Erhan Zhang, Xingzhu Wang, Peiyuan Gong, Yankai Lin, and Jiaxin Mao. 2024. USimAgent: Large Language Models for Simulating Search Users. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). Association for Computing Machinery, New York, NY , USA, 2687–2692. https://doi.org/10.1145/...

work page doi:10.1145/3626772.3657963 2024
[65]

Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, and Ji- Rong Wen. 2024. AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems. In Proceedings of the ACM Web Conference 2024 (WWW ’24) . Association for Computing Machinery, New York, NY , USA, 3679–3689. https://doi.org/10.1145/35...

work page doi:10.1145/3589334.3645537 2024
[66]

Zijian Zhang, Shuchang Liu, Ziru Liu, Rui Zhong, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Qidong Liu, and Peng Jiang. 2025. LLM-Powered User Simulator for Recommender System. In Proceedings of the Thirty-Four International Joint Conference on Artificial Intelligence (AAAI ’25). https://doi.org/ 10.48550/arXiv.2412.16984 arXiv:2412.16984 [cs] 15

work page doi:10.48550/arxiv.2412.16984 2025
[67]

Kesen Zhao, Shuchang Liu, Qingpeng Cai, Xiangyu Zhao, Ziru Liu, Dong Zheng, Peng Jiang, and Kun Gai. 2023. KuaiSim: A Comprehensive Simulator for Recommender Systems. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS ’23). Curran Associates Inc., Red Hook, NY , USA, 44880–44897

work page 2023
[68]

Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2021. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. In CIKM. ACM, ...

work page 2021
[69]

Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2021. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. In Proceedings...

work page doi:10.1145/3459637.3482016 2021

Unspecified settings follow the defaults of the torchtune and verl frameworks. We consider two training setups:
• Single-Stage SFT: For models without the decision-making process, we apply supervised fine-tuning only, treating the task as single-token classification.
• Two-Stage Fine-Tuning: For models incorporating decision-making, we perform a standard ...

