User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation

ChengXiang Zhai; Krisztian Balog

arXiv: 2501.04410 · v2 · submitted 2025-01-08 · cs.AI · cs.HC · cs.IR · cs.LG


Pith reviewed 2026-05-23 05:52 UTC · model grok-4.3

classification: cs.AI, cs.HC, cs.IR, cs.LG

keywords: user simulation, generative AI, synthetic data generation, system evaluation, artificial general intelligence, ethical AI, personalization


The pith

Realistic user simulators are indispensable catalysts for advancing artificial general intelligence by overcoming data and evaluation bottlenecks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper synthesizes scattered research on user simulation, which creates intelligent agents that mimic human users interacting with AI systems for behavior modeling, synthetic data generation, and controlled evaluation. It describes a paradigm shift from predictive to generative approaches and reframes ethical considerations to position simulation as a tool for fairness and safety rather than solely a source of bias. The central argument links user simulation to the pursuit of AGI, claiming realistic simulators are required to address data shortages, enable reproducible testing, and improve personalization. The work concludes by outlining a self-sustaining innovation ecosystem connecting academia and industry.

Core claim

User simulation creates intelligent agents that mimic human user actions to model behavior, generate synthetic data for training, and evaluate interactive AI systems in a controlled and reproducible manner. In the generative AI era, the paper establishes the theoretical connection to AGI, arguing that realistic simulators are indispensable catalysts for overcoming critical data and evaluation bottlenecks and optimizing personalization, while demonstrating how controlled simulation can proactively ensure fair representation and system safety.
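To make the core claim concrete, here is a minimal sketch of a simulated user that scans a ranked list and produces a synthetic interaction log. It is not taken from the paper: the cascade-style scanning behaviour, the `SimulatedUser` class, the probabilities, and the document IDs are all illustrative assumptions. The fixed seed is what makes the run controlled and reproducible.

```python
import random
from dataclasses import dataclass


@dataclass
class SimulatedUser:
    """Toy cascade-style user (hypothetical): scans a ranked list top-down."""
    click_prob_relevant: float = 0.8  # assumed chance of clicking a relevant item
    click_prob_other: float = 0.1     # assumed chance of clicking any other item
    stop_after_click: float = 0.7     # assumed chance of leaving after a click

    def interact(self, ranking, relevant, rng):
        """Return one session as a list of (rank, item, clicked) events."""
        events = []
        for rank, item in enumerate(ranking, start=1):
            p = self.click_prob_relevant if item in relevant else self.click_prob_other
            clicked = rng.random() < p
            events.append((rank, item, clicked))
            # After a click the user may be satisfied and end the session.
            if clicked and rng.random() < self.stop_after_click:
                break
        return events


def generate_log(n_sessions, seed):
    """Generate a synthetic interaction log; the same seed gives the same log."""
    rng = random.Random(seed)
    user = SimulatedUser()
    ranking = ["d1", "d2", "d3", "d4", "d5"]  # hypothetical system output
    relevant = {"d2", "d4"}                   # hypothetical ground truth
    return [user.interact(ranking, relevant, rng) for _ in range(n_sessions)]


log_a = generate_log(100, seed=7)
log_b = generate_log(100, seed=7)
print(log_a == log_b)  # reproducibility check on two runs with the same seed
```

The resulting log could serve as training data or as input to an evaluation metric. How realistic it is depends entirely on how well the assumed behaviour matches real users.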

What carries the argument

The intelligent agent mimicking human user interactions with AI systems, which enables synthetic data generation and controlled evaluation as a bridge to AGI progress.

If this is right

Synthetic data from user simulators can train AI models without depending on limited or private real-user interactions.
Controlled evaluations using simulators allow testing of AI systems for safety and fairness prior to real-world deployment.
Personalization in AI systems improves through iterative simulations of diverse user behaviors.
A self-sustaining ecosystem between academia and industry accelerates development of user simulation technologies.
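The controlled-evaluation point can be sketched as a paired comparison: two systems are tested against exactly the same simulated user population, so any difference in outcome comes from the systems rather than from sampling noise. Everything below is an illustrative assumption and not the paper's method, including the patience-based user model, the 0.9 recognition probability, and the names `make_population` and `success_rate`.

```python
import random


def simulate_session(ranking, relevant, patience, rng):
    """Return the rank where the user finds a relevant item, or None if they give up."""
    for rank, item in enumerate(ranking[:patience], start=1):
        # Assumed: a relevant item is recognised with probability 0.9.
        if item in relevant and rng.random() < 0.9:
            return rank
    return None


def make_population(n_users, max_patience, seed):
    """Fix each simulated user's patience and private random seed up front."""
    rng = random.Random(seed)
    return [(rng.randint(1, max_patience), rng.getrandbits(32)) for _ in range(n_users)]


def success_rate(ranking, relevant, population):
    """Fraction of simulated users who find a relevant item in the ranking."""
    hits = 0
    for patience, user_seed in population:
        user_rng = random.Random(user_seed)  # same user, same randomness, any system
        if simulate_session(ranking, relevant, patience, user_rng) is not None:
            hits += 1
    return hits / len(population)


relevant = {"d3"}                       # hypothetical ground truth
system_a = ["d1", "d2", "d3", "d4"]     # relevant item at rank 3
system_b = ["d3", "d1", "d2", "d4"]     # relevant item at rank 1

population = make_population(1000, max_patience=4, seed=42)
rate_a = success_rate(system_a, relevant, population)
rate_b = success_rate(system_b, relevant, population)
print(rate_a, rate_b)
```

Because each simulated user carries their own seed, re-running the comparison yields identical numbers, and a new system can later be scored against the same population.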

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Progress in building realistic user simulators could function as a practical benchmark for measuring advancement toward AGI.
Incorporating psychological models of human behavior might enhance the fidelity of simulators beyond current generative techniques.
Widespread adoption could decrease reliance on large-scale human-subject experiments for system evaluation in HCI and related fields.

Load-bearing premise

That controlled simulation serves not merely as a risk vector for bias, but as a powerful, proactive tool to ensure fair representation and system safety.

What would settle it

An empirical demonstration that user simulators consistently introduce biases that cannot be mitigated or fail to improve personalization and evaluation outcomes would undermine the claim that they are indispensable for AGI advancement.

Figures

Figures reproduced from arXiv: 2501.04410 by ChengXiang Zhai, Krisztian Balog.

**Figure 2.** Illustration of evaluation methodologies and how user simulation complements them (adapted from [5]).

**Figure 3.** An innovation ecosystem, where academic researchers develop open-source user simulators, which industry partners ...

Original abstract

User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Generative AI. It involves creating an intelligent agent that mimics the actions of a human user interacting with an AI system, enabling researchers to model and analyze user behaviour, generate synthetic data for training, and evaluate interactive AI systems in a controlled and reproducible manner. Because of its broad scope, research on this topic currently remains scattered across artificial intelligence, human-computer interaction, information science, computational social science, and psychology. To address this fragmented landscape of current research, this article presents a foundational synthesis. We highlight the paradigm shift from traditional predictive models to modern generative approaches, and explicitly frame critical ethical considerations -- demonstrating how controlled simulation serves not merely as a risk vector for bias, but as a powerful, proactive tool to ensure fair representation and system safety. Furthermore, we establish the theoretical connection between user simulation and the pursuit of Artificial General Intelligence, arguing that realistic simulators are indispensable catalysts for overcoming critical data and evaluation bottlenecks and optimizing personalization. Ultimately, we propose a practical, self-sustaining innovation ecosystem bridging academia and industry to advance this increasingly important technology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A synthesis paper that organizes scattered user simulation work for GenAI but asserts rather than derives its AGI connection.


This paper is a literature synthesis and position piece on user simulation for generative AI. It pulls together research from AI, HCI, information science, and psychology, notes the move from predictive to generative models, and proposes an academia-industry ecosystem for the area. That framing is the main contribution: it organizes a fragmented space in one place, which can save time for readers entering the topic. The ethics section reframes simulation as a tool for fairness and safety rather than just a bias risk, which is a reasonable angle to highlight. The AGI link is stated as a theoretical connection in which simulators overcome data and evaluation bottlenecks, but the abstract gives no definitions, mappings, or steps that would support it as a derived result rather than an interpretive claim. No new methods, data, or formal results appear, and the paper stays at a conceptual level throughout. Readers who want an overview of existing threads in user modeling and synthetic data will get some value from the organization; those looking for technical advances, proofs, or testable predictions will not. It could merit peer review in a survey or position track because the topic is timely and the synthesis is coherent on its own terms, though the central AGI assertion would need more backing to hold up under review.

Referee Report

2 major / 2 minor

Summary. The paper offers a foundational synthesis of user simulation research across AI, HCI, information science, computational social science, and psychology in the generative AI era. It covers the creation of intelligent agents that mimic human users in order to model behavior, generate synthetic data, and evaluate interactive systems. Its key elements are: highlighting a paradigm shift from traditional predictive models to modern generative approaches; framing ethical considerations, with controlled simulation positioned as a proactive tool for fair representation and safety rather than solely a bias risk; establishing a theoretical connection to AGI, arguing that realistic simulators are indispensable catalysts for overcoming data and evaluation bottlenecks and for optimizing personalization; and proposing a practical, self-sustaining innovation ecosystem bridging academia and industry.

Significance. If the synthesis accurately unifies the fragmented literature and the proposed ecosystem is viable, the work could provide a useful reference point for researchers and practitioners working on user modeling and evaluation in GenAI systems. The explicit treatment of ethics and the AGI framing, if substantiated, might help orient the field toward proactive uses of simulation; however, the significance is limited by the high-level nature of the claims and absence of new empirical or formal results.

major comments (2)

[Abstract / AGI connection section] Abstract and the section establishing the AGI connection: The manuscript states that it 'establishes the theoretical connection' between user simulation and AGI pursuit, with simulators as 'indispensable catalysts' for overcoming data and evaluation bottlenecks. No definitions of the connection, formal mapping, reduction steps, or intermediate derivations are supplied; the claim rests on narrative assertion. This is load-bearing for the paper's central framing and positioning.
[Abstract / ethics section] Abstract and ethics discussion: The claim that controlled simulation serves 'not merely as a risk vector for bias, but as a powerful, proactive tool to ensure fair representation and system safety' is presented without concrete mechanisms, case studies, or evidence showing how simulation achieves proactive fairness outcomes beyond risk mitigation. This framing is central to the ethical contribution.

minor comments (2)

[Abstract / Introduction] The abstract and introduction use broad interdisciplinary scope claims without citing specific prior surveys or taxonomies that the synthesis builds upon or improves.
[Introduction / User Modeling section] Terminology such as 'intelligent agent that mimics the actions of a human user' could be clarified with reference to existing agent architectures or simulation frameworks to avoid ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our survey manuscript. We address each major comment below and will make revisions to strengthen the presentation of the AGI connection and ethical framing.


Referee: [Abstract / AGI connection section] Abstract and the section establishing the AGI connection: The manuscript states that it 'establishes the theoretical connection' between user simulation and AGI pursuit, with simulators as 'indispensable catalysts' for overcoming data and evaluation bottlenecks. No definitions of the connection, formal mapping, reduction steps, or intermediate derivations are supplied; the claim rests on narrative assertion. This is load-bearing for the paper's central framing and positioning.

Authors: We agree that the connection is presented conceptually without formal mappings or derivations. As this is a survey synthesizing literature rather than a theoretical paper, the framing draws on existing arguments about data and evaluation challenges in AGI development. In the revised version, we will expand the relevant section with explicit definitions of key concepts, a structured conceptual mapping supported by additional citations, and concrete examples of simulator use in addressing bottlenecks, while clarifying that this constitutes a framing argument rather than a formal reduction. revision: yes

Referee: [Abstract / ethics section] Abstract and ethics discussion: The claim that controlled simulation serves 'not merely as a risk vector for bias, but as a powerful, proactive tool to ensure fair representation and system safety' is presented without concrete mechanisms, case studies, or evidence showing how simulation achieves proactive fairness outcomes beyond risk mitigation. This framing is central to the ethical contribution.

Authors: We acknowledge that the abstract and ethics discussion would be strengthened by more explicit mechanisms and examples. The full manuscript references literature on simulation for bias mitigation, but we agree additional detail is needed to substantiate the proactive framing. We will revise the abstract and expand the ethics section to include specific case studies (e.g., generating diverse synthetic user profiles for improved representation in training) and mechanisms drawn from cited works, demonstrating how simulation can proactively support fairness beyond risk reduction. revision: yes
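The example the authors mention, generating diverse synthetic user profiles for improved representation, could look roughly like the sketch below. It is only one possible reading, under stated assumptions: the attribute names and values, the `patience` trait, and the `stratified_profiles` helper are hypothetical. The sketch gives every combination of attribute values the same number of profiles, so that no group is under-represented by construction.

```python
import itertools
import random


def stratified_profiles(attributes, per_group, seed):
    """Create `per_group` synthetic profiles for every combination of attribute values."""
    rng = random.Random(seed)
    names = list(attributes)
    profiles = []
    for combo in itertools.product(*(attributes[n] for n in names)):
        for _ in range(per_group):
            profile = dict(zip(names, combo))
            # Hypothetical behavioural trait, sampled independently of the group.
            profile["patience"] = rng.randint(1, 10)
            profiles.append(profile)
    rng.shuffle(profiles)  # avoid ordering effects in downstream use
    return profiles


# Illustrative attribute space (assumed, not from the paper).
attributes = {
    "age_band": ["18-29", "30-49", "50+"],
    "expertise": ["novice", "expert"],
    "language": ["en", "es"],
}

profiles = stratified_profiles(attributes, per_group=5, seed=0)
print(len(profiles))  # 3 * 2 * 2 groups, 5 profiles each
```

Equal counts per group are a design choice for this sketch. Matching a target population distribution, or deliberately oversampling rare groups, would be equally valid variations.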

Circularity Check

0 steps flagged

No circularity: conceptual synthesis with no derivations or load-bearing reductions


The paper is an interdisciplinary overview and synthesis without equations, formal predictions, fitted parameters, or derivation chains. The claim to 'establish the theoretical connection' between user simulation and AGI is presented as high-level argumentation in the abstract and introduction, not as a reduction from prior steps or self-citations that collapse by construction. No instances match any of the enumerated circularity patterns (self-definitional, fitted input called prediction, self-citation load-bearing, etc.). This is the expected outcome for a non-mathematical survey paper whose central contribution is framing and synthesis rather than a derived result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a high-level survey paper; the abstract introduces no new free parameters, mathematical axioms, or invented entities.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation
cs.AI · 2026-04 · unverdicted · novelty 6.0

PGHS fuses policy-guided LLM reasoning and ML fitting to simulate group user behavior with 8.8% error on Meituan data from 101 merchants and 26k trajectories, beating pure reasoning and fitting baselines by 45.8% and 40.9%.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. 2023. Out of One, Many: Using Language Models to Simulate Human Samples. Political Analysis 31, 3 (2023), 337–351.

[2] Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K... 2022. Constitutional AI: Harmlessness from AI Feedback. arXiv.

[3] Krisztian Balog, David Maxwell, Paul Thomas, and Shuo Zhang. 2022. Report on the 1st simulation for information retrieval workshop (Sim4IR 2021) at SIGIR

[4] SIGIR Forum 55, 2, Article 10 (Mar. 2022), 16 pages.

[5] Krisztian Balog and ChengXiang Zhai. 2024. Tutorial on User Simulation for Evaluating Information Access Systems on the Web. In Companion Proceedings of the ACM on Web Conference 2024 (WWW ’24). 1254–1257.

[6] Krisztian Balog and ChengXiang Zhai. 2024. User Simulation for Evaluating Information Access Systems. Foundations and Trends in Information Retrieval 18, 1-2 (2024), 1–261.

[7] Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time Drives Interaction: Simulating Sessions in Diverse Searching Environments. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’12). 105–114.

[8] Nolwenn Bernard and Krisztian Balog. 2024. Identifying Breakdowns in Conversational Recommender Systems using User Simulation. In Proceedings of the 6th ACM Conference on Conversational User Interfaces (CUI ’24). Article 26, 10 pages.

[9] Cheng-Han Chiang and Hung-yi Lee. 2023. Can Large Language Models Be an Alternative to Human Evaluations? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL ’23). 15607–15631.

[10] John Chung, Ece Kamar, and Saleema Amershi. 2023. Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL ’23). 575–593.

[11] Zhuyun Dai, Vincent Y Zhao, Ji Ma, Yi Luan, Jianmo Ni, Jing Lu, Anton Bakalov, Kelvin Guu, Keith Hall, and Ming-Wei Chang. 2023. Promptagator: Few-shot Dense Retrieval From 8 Examples. In The Eleventh International Conference on Learning Representations (ICLR ’23).

[12] Daniel Kahneman. 2013. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.

[13] Yang Deng, An Zhang, Yankai Lin, Xu Chen, Ji-Rong Wen, and Tat-Seng Chua

[14] Large Language Model Powered Agents in the Web. In Companion Proceedings of the ACM Web Conference 2024 (WWW ’24). 1242–1245.

[15] Wai-Tat Fu and Peter Pirolli. 2007. SNIF-ACT: A Cognitive Model of User Navigation on the World Wide Web. Human-Computer Interaction 22, 4 (2007), 355–412.

[16] Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, and Yong Li. 2024. Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives. Humanities and Social Sciences Communications 11, Article 1259 (2024).

[17] Artur d’Avila Garcez and Luis C Lamb. 2023. Neurosymbolic AI: The 3rd wave. Artificial Intelligence Review 56, 11 (2023), 12387–12406.

[18] Karim Hamade, Reid McIlroy-Young, Siddhartha Sen, Jon Kleinberg, and Ashton Anderson. 2024. Designing Skill-Compatible AI: Methodologies and Frameworks in Chess. In The Twelfth International Conference on Learning Representations (ICLR ’24).

[19] Naieme Hazrati and Francesco Ricci. 2024. Choice Models and Recommender Systems Effects on users’ Choices. User Modeling and User-Adapted Interaction 34 (2024), 109–145.

[20] Diane Kelly. 2009. Methods for Evaluating Interactive Information Retrieval Systems with Users. Foundations and Trends in Information Retrieval 3, 1–2 (2009), 1–224.

[21] Sungdong Kim, Minsuk Chang, and Sang-Woo Lee. 2021. NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-Based Simulation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (ACL ’21). 3704–3717.

[22] R. Kohavi, D. Tang, and Y. Xu. 2020. Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.

[23] Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2024. Training language models to follow instructions with human fee...

[24] Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative Agents: Interactive Simulacra of Human Behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23). Article 2.

[25] Behnam Rahdari, Peter Brusilovsky, and Branislav Kveton. 2024. Towards Simulation-Based Evaluation of Recommender Systems with Carousel Interfaces. ACM Transactions on Recommender Systems 2, 1 (2024).

[26] Philipp Schaer, Christin Katharina Kreutz, Krisztian Balog, Timo Breuer, and Norbert Fuhr. 2024. SIGIR 2024 Workshop on Simulations for Information Access (Sim4IA 2024). In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). 3058–3061.

[27] Martin G. Skjæveland, Krisztian Balog, Nolwenn Bernard, Weronika Łajewska, and Trond Linjordet. 2024. An ecosystem for personal knowledge graphs: A survey and research roadmap. AI Open 5 (2024), 55–69.

[28] Heydar Soudani, Roxana Petcu, Evangelos Kanoulas, and Faegheh Hasibi

[29] A Survey on Recent Advances in Conversational Data Generation. arXiv:2405.13003 [cs.CL]

[30] Paul Thomas, Seth Spielman, Nick Craswell, and Bhaskar Mitra. 2024. Large Language Models can Accurately Predict Searcher Preferences. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). 1930–1940.

[31] Adelinde M. Uhrmacher and Danny Weyns. 2009. Multi-Agent Systems: Simulation and Applications. CRC Press, Inc., USA.

[32] Pertti Vakkari. 2016. Searching as Learning: A Systematization based on Literature. Journal of Information Science 42, 1 (2016), 7–18.

[33] Ellen M Voorhees. 2000. Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness. Information Processing & Management 36, 5 (2000), 697–716.

[34] Baicun Wang, Huiying Zhou, Xingyu Li, Geng Yang, Pai Zheng, Ci Song, Yixiu Yuan, Thorsten Wuest, Huayong Yang, and Lihui Wang. 2024. Human Digital Twin in the context of Industry 5.0. Robotics and Computer-Integrated Manufacturing 85, C (Feb. 2024).

[35] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Jirong Wen. 2024. A Survey on Large Language Model based Autonomous Agents. Frontiers of Computer Science 18 (2024).

[36] Ryen W. White and Jeff Huang. 2010. Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’10). 587–594.

[37] Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. 2023. How Bad is Top-𝐾 Recommendation under Competing Content Creators? In Proceedings of the 40th International Conference on Machine Learning (ICML ’23). 39674–39701.

[38] Yinan Zhang, Xueqing Liu, and ChengXiang Zhai. 2017. Information Retrieval Evaluation as Search Simulation: A General Formal Framework for IR Evaluation. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR ’17). 193–200.
