User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation
Pith reviewed 2026-05-23 05:52 UTC · model grok-4.3
The pith
Realistic user simulators are indispensable catalysts for advancing artificial general intelligence by overcoming data and evaluation bottlenecks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
User simulation creates intelligent agents that mimic human user actions to model behavior, generate synthetic data for training, and evaluate interactive AI systems in a controlled and reproducible manner. In the generative AI era, the paper establishes the theoretical connection to AGI, arguing that realistic simulators are indispensable catalysts for overcoming critical data and evaluation bottlenecks and optimizing personalization, while demonstrating how controlled simulation can proactively ensure fair representation and system safety.
What carries the argument
An intelligent agent that mimics how human users interact with AI systems. It enables synthetic data generation and controlled evaluation, which the paper casts as a bridge to AGI progress.
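To make the mechanism concrete, here is a minimal sketch of a simulated user interacting with a system in a seeded, reproducible loop. It is an illustration only, not the paper's method: `SimulatedUser`, `toy_system`, `run_session`, and `CATALOG` are hypothetical names, and the rule-based user stands in for the generative simulators the paper surveys.

```python
import random
from dataclasses import dataclass

# Hypothetical item catalog for the toy system (assumption, not from the paper).
CATALOG = ["news", "sports", "music", "films", "recipes"]


@dataclass
class SimulatedUser:
    """Hypothetical rule-based user: wants one item, quits after `patience` turns."""
    target: str
    patience: int

    def respond(self, suggestion: str) -> str:
        # Accept only the item the user is looking for.
        return "accept" if suggestion == self.target else "reject"


def toy_system(catalog, rejected):
    """Stand-in interactive system: suggest the first item not yet rejected."""
    for item in catalog:
        if item not in rejected:
            return item
    return None


def run_session(catalog, seed):
    """Run one user-system session; the seed makes the session reproducible."""
    rng = random.Random(seed)
    user = SimulatedUser(target=rng.choice(catalog),
                         patience=rng.randint(1, len(catalog)))
    log, rejected = [], set()
    for turn in range(user.patience):
        suggestion = toy_system(catalog, rejected)
        if suggestion is None:
            break
        reply = user.respond(suggestion)
        log.append((turn, suggestion, reply))  # interaction record (synthetic data)
        if reply == "accept":
            return {"success": True, "log": log}
        rejected.add(suggestion)
    return {"success": False, "log": log}


session = run_session(CATALOG, seed=7)
```

The `log` is the kind of record that could serve as synthetic interaction data, and `success` is a simple evaluation signal. Re-running with the same seed yields the same session, which is what makes the evaluation controlled and repeatable.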
If this is right
- Synthetic data from user simulators can train AI models without depending on limited or private real-user interactions.
- Controlled evaluations using simulators allow testing of AI systems for safety and fairness prior to real-world deployment.
- Personalization in AI systems improves through iterative simulations of diverse user behaviors.
- A self-sustaining ecosystem between academia and industry accelerates development of user simulation technologies.
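The fairness and safety bullet above can be illustrated with a small sketch: run the same system against simulated users from different groups and compare success rates before deployment. Everything here is an assumption made for illustration (the groups, the queries, the `toy_system` rule, and the `success_by_group` helper); the paper does not prescribe this procedure.

```python
from collections import defaultdict


def toy_system(query: str) -> bool:
    """Stand-in system under test: it only handles queries of at most 3 words."""
    return len(query.split()) <= 3


# Hypothetical simulated users, grouped by how they phrase requests.
SIMULATED_USERS = [
    {"group": "terse", "query": "cheap flights"},
    {"group": "terse", "query": "weather tomorrow"},
    {"group": "verbose", "query": "could you please find cheap flights"},
    {"group": "verbose", "query": "weather tomorrow please"},
]


def success_by_group(users, system):
    """Return the fraction of successful interactions for each user group."""
    totals, wins = defaultdict(int), defaultdict(int)
    for user in users:
        totals[user["group"]] += 1
        wins[user["group"]] += int(system(user["query"]))
    return {group: wins[group] / totals[group] for group in totals}


rates = success_by_group(SIMULATED_USERS, toy_system)
gap = max(rates.values()) - min(rates.values())  # disparity between groups
```

In this toy setup the system serves terse users perfectly but fails half of the verbose ones, so `gap` is 0.5. A gap like this, surfaced in simulation, is the sort of signal that could prompt a fix before real users are affected.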
Where Pith is reading between the lines
- Progress in building realistic user simulators could function as a practical benchmark for measuring advancement toward AGI.
- Incorporating psychological models of human behavior might enhance the fidelity of simulators beyond current generative techniques.
- Widespread adoption could decrease reliance on large-scale human-subject experiments for system evaluation in HCI and related fields.
Load-bearing premise
That controlled simulation serves not merely as a risk vector for bias, but as a powerful, proactive tool to ensure fair representation and system safety.
What would settle it
An empirical demonstration that user simulators consistently introduce biases that cannot be mitigated or fail to improve personalization and evaluation outcomes would undermine the claim that they are indispensable for AGI advancement.
Original abstract
User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Generative AI. It involves creating an intelligent agent that mimics the actions of a human user interacting with an AI system, enabling researchers to model and analyze user behaviour, generate synthetic data for training, and evaluate interactive AI systems in a controlled and reproducible manner. Because of its broad scope, research on this topic currently remains scattered across artificial intelligence, human-computer interaction, information science, computational social science, and psychology. To address this fragmented landscape of current research, this article presents a foundational synthesis. We highlight the paradigm shift from traditional predictive models to modern generative approaches, and explicitly frame critical ethical considerations -- demonstrating how controlled simulation serves not merely as a risk vector for bias, but as a powerful, proactive tool to ensure fair representation and system safety. Furthermore, we establish the theoretical connection between user simulation and the pursuit of Artificial General Intelligence, arguing that realistic simulators are indispensable catalysts for overcoming critical data and evaluation bottlenecks and optimizing personalization. Ultimately, we propose a practical, self-sustaining innovation ecosystem bridging academia and industry to advance this increasingly important technology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper offers a foundational synthesis of user simulation research across AI, HCI, information science, computational social science, and psychology in the generative AI era. It covers creating intelligent agents that mimic human users in order to model behavior, generate synthetic data, and evaluate interactive systems. Its key elements are: highlighting a paradigm shift from traditional predictive models to modern generative approaches; framing ethical considerations (positioning controlled simulation as a proactive tool for fair representation and safety rather than solely a bias risk); establishing a theoretical connection to AGI (arguing that realistic simulators are indispensable catalysts for overcoming data and evaluation bottlenecks and for optimizing personalization); and proposing a practical, self-sustaining innovation ecosystem bridging academia and industry.
Significance. If the synthesis accurately unifies the fragmented literature and the proposed ecosystem is viable, the work could provide a useful reference point for researchers and practitioners working on user modeling and evaluation in GenAI systems. The explicit treatment of ethics and the AGI framing, if substantiated, might help orient the field toward proactive uses of simulation; however, the significance is limited by the high-level nature of the claims and absence of new empirical or formal results.
major comments (2)
- [Abstract / AGI connection section] Abstract and the section establishing the AGI connection: The manuscript states that it 'establishes the theoretical connection' between user simulation and AGI pursuit, with simulators as 'indispensable catalysts' for overcoming data and evaluation bottlenecks. No definitions of the connection, formal mapping, reduction steps, or intermediate derivations are supplied; the claim rests on narrative assertion. This is load-bearing for the paper's central framing and positioning.
- [Abstract / ethics section] Abstract and ethics discussion: The claim that controlled simulation serves 'not merely as a risk vector for bias, but as a powerful, proactive tool to ensure fair representation and system safety' is presented without concrete mechanisms, case studies, or evidence showing how simulation achieves proactive fairness outcomes beyond risk mitigation. This framing is central to the ethical contribution.
minor comments (2)
- [Abstract / Introduction] The abstract and introduction use broad interdisciplinary scope claims without citing specific prior surveys or taxonomies that the synthesis builds upon or improves.
- [Introduction / User Modeling section] Terminology such as 'intelligent agent that mimics the actions of a human user' could be clarified with reference to existing agent architectures or simulation frameworks to avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our survey manuscript. We address each major comment below and will make revisions to strengthen the presentation of the AGI connection and ethical framing.
- Referee: [Abstract / AGI connection section] Abstract and the section establishing the AGI connection: The manuscript states that it 'establishes the theoretical connection' between user simulation and AGI pursuit, with simulators as 'indispensable catalysts' for overcoming data and evaluation bottlenecks. No definitions of the connection, formal mapping, reduction steps, or intermediate derivations are supplied; the claim rests on narrative assertion. This is load-bearing for the paper's central framing and positioning.
  Authors: We agree that the connection is presented conceptually without formal mappings or derivations. As this is a survey synthesizing literature rather than a theoretical paper, the framing draws on existing arguments about data and evaluation challenges in AGI development. In the revised version, we will expand the relevant section with explicit definitions of key concepts, a structured conceptual mapping supported by additional citations, and concrete examples of simulator use in addressing bottlenecks, while clarifying that this constitutes a framing argument rather than a formal reduction.
  Revision: yes
- Referee: [Abstract / ethics section] Abstract and ethics discussion: The claim that controlled simulation serves 'not merely as a risk vector for bias, but as a powerful, proactive tool to ensure fair representation and system safety' is presented without concrete mechanisms, case studies, or evidence showing how simulation achieves proactive fairness outcomes beyond risk mitigation. This framing is central to the ethical contribution.
  Authors: We acknowledge that the abstract and ethics discussion would be strengthened by more explicit mechanisms and examples. The full manuscript references literature on simulation for bias mitigation, but we agree additional detail is needed to substantiate the proactive framing. We will revise the abstract and expand the ethics section to include specific case studies (e.g., generating diverse synthetic user profiles for improved representation in training) and mechanisms drawn from cited works, demonstrating how simulation can proactively support fairness beyond risk reduction.
  Revision: yes
Circularity Check
No circularity: conceptual synthesis with no derivations or load-bearing reductions
The paper is an interdisciplinary overview and synthesis without equations, formal predictions, fitted parameters, or derivation chains. The claim to 'establish the theoretical connection' between user simulation and AGI is presented as high-level argumentation in the abstract and introduction, not as a reduction from prior steps or self-citations that collapse by construction. No instances match any of the enumerated circularity patterns (self-definitional, fitted input called prediction, self-citation load-bearing, etc.). This is the expected outcome for a non-mathematical survey paper whose central contribution is framing and synthesis rather than a derived result.
Forward citations
Cited by 1 Pith paper
- Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation
  PGHS fuses policy-guided LLM reasoning and ML fitting to simulate group user behavior with 8.8% error on Meituan data from 101 merchants and 26k trajectories, beating pure reasoning and fitting baselines by 45.8% and 40.9%.