arxiv: 2605.09321 · v2 · submitted 2026-05-10 · 💻 cs.IR

OpenIIR: An Open Simulation Platform for Information Retrieval Research

Pith reviewed 2026-05-15 05:50 UTC · model grok-4.3

classification 💻 cs.IR
keywords: OpenIIR, information retrieval simulation, LLM personas, multi-agent systems, reproducible research, open platform, IR experiments

The pith

OpenIIR provides an open platform for running parameterised simulations of information retrieval using LLM-driven personas across multiple scenario types.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

OpenIIR is designed to let researchers run hundreds of LLM-driven personas as reproducible experiments in multi-agent IR settings. It supports four kinds of study: deliberative panels, social platforms, curated recommender feeds, and co-evolution between content producers and credibility detectors. The platform provides a shared core and a type interface through which a new scenario can be added as a plug-in of 200 to 400 lines of code. Four scenario types are released with reference runs, and six modular extensions are sketched against open questions in IR research. Because every run emits structured output data, configurations that differ in settings such as retrieval policy or intervention timing can be compared directly.

Core claim

The paper introduces OpenIIR as a platform that runs hundreds of LLM-driven personas as parameterised, reproducible IR research experiments. Researchers configure agents across four kinds of multi-agent study under many priors, rounds, and constraints. Every run produces structured outputs such as argument graphs, exposure logs, fitness traces, and transcripts. A new study requires only a 200-400 line plug-in over the shared core of agent runtime, world-model store, retrieval primitives, claim extractor, and persona ontology. The main contributions include this shared core, the type interface for pluggable scenarios, four released types with reference runs, and sketches of six modular extens

What carries the argument

The shared core with its agent runtime, world-model store, retrieval primitives, claim extractor, and persona ontology, along with the type interface for defining pluggable scenarios.
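The paper's actual type interface is not reproduced on this page, so the following is only a minimal sketch of how a pluggable scenario over a shared core might look. Every name in it (`Core`, `ScenarioType`, `ToyPanel`, `run`, and the three hook methods) is a hypothetical stand-in rather than OpenIIR's API, and the "personas" are placeholders that pick a speaker at random instead of calling an LLM.

```python
import random
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Core:
    """Hypothetical stand-in for the shared core: a seeded RNG plus a transcript store."""
    seed: int
    transcript: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # One RNG per run so that a fixed seed gives a repeatable run.
        self.rng = random.Random(self.seed)


class ScenarioType(Protocol):
    """Hypothetical type interface: the hooks a plug-in would have to supply."""
    name: str

    def setup(self, core: Core, n_personas: int) -> None: ...
    def step(self, core: Core, round_index: int) -> None: ...
    def outputs(self, core: Core) -> dict: ...


class ToyPanel:
    """Toy scenario: one randomly chosen persona 'speaks' per round (no LLM involved)."""
    name = "toy-panel"

    def setup(self, core: Core, n_personas: int) -> None:
        self.personas = [f"persona-{i}" for i in range(n_personas)]

    def step(self, core: Core, round_index: int) -> None:
        speaker = core.rng.choice(self.personas)
        core.transcript.append(f"round {round_index}: {speaker} speaks")

    def outputs(self, core: Core) -> dict:
        # Structured output that a downstream evaluator could consume.
        return {"type": self.name, "transcript": list(core.transcript)}


def run(scenario: ScenarioType, seed: int, n_personas: int, rounds: int) -> dict:
    """Drive any scenario that satisfies the interface through a fixed number of rounds."""
    core = Core(seed=seed)
    scenario.setup(core, n_personas)
    for r in range(rounds):
        scenario.step(core, r)
    return scenario.outputs(core)


if __name__ == "__main__":
    print(run(ToyPanel(), seed=7, n_personas=3, rounds=4))
```

The point of the sketch is the division of labour: the driver and the core know nothing about panels or feeds, so a new study type only has to implement the three hooks.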

Load-bearing premise

The behaviors of LLM-driven personas closely mirror those of actual human users in information retrieval tasks.

What would settle it

A side-by-side comparison between simulation results and equivalent human participant experiments showing significant divergence in metrics such as exposure logs or argument formation.
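One way such a comparison could be quantified, offered as an illustrative sketch rather than a method from the paper, is to turn each exposure log into a distribution over items and measure the Jensen-Shannon divergence between the simulated and the human distribution. The log format (a flat list of item identifiers) and the function names are assumptions made for this example.

```python
import math
from collections import Counter


def distribution(exposures: list[str]) -> dict[str, float]:
    """Convert a flat exposure log (assumed format: item ids) into relative frequencies."""
    if not exposures:
        raise ValueError("exposure log is empty")
    counts = Counter(exposures)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    keys = set(p) | set(q)
    # Mixture distribution; it is positive wherever p or q is positive.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict[str, float], b: dict[str, float]) -> float:
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Made-up logs, for illustration only.
    simulated = distribution(["doc1", "doc1", "doc2", "doc3"])
    human = distribution(["doc1", "doc2", "doc2", "doc4"])
    print(js_divergence(simulated, human))
```

A value near 0 would indicate that simulated and human exposure patterns agree on this one metric; what counts as "significant divergence" would still need a threshold or a statistical test chosen by the study designer.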

Figures

Figures reproduced from arXiv: 2605.09321 by Saber Zerhoudi.

Figure 1: Three-layer architecture. The core is type-agnostic.
Figure 2: Panel argument graph: claim nodes connected by typed edges (…
Figure 3: Per-panelist chat: converse with a single panelist
Figure 4: Panel deliberation report: synthesis of positions, …
Original abstract

OpenIIR runs hundreds of LLM-driven personas as parameterised, reproducible IR research experiments. Researchers configure agents across four kinds of multi-agent study (deliberative panels, social platforms, curated recommender feeds, and evolutionary co-evolution between content producers and credibility detectors) under many priors, rounds, and constraints. Persona budgets, retrieval policies, ranker choices, intervention timings, and mutation rates are declared up front, and the same study can be re-run under different settings to compare outcomes side by side. Every run produces structured outputs (argument graphs, exposure logs, fitness traces, transcripts) that a downstream evaluator can consume directly, and a new study is a 200-400 line plug-in over a shared core (agent runtime, world-model store, retrieval primitives, claim extractor, persona ontology). The contributions are: (i) the shared core; (ii) a type interface for pluggable scenarios; (iii) four released types with reference runs (Panel, Social-Media, Curated-Feed, Multi-Generational); and (iv) six modular extensions sketched against open IR research questions.
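To make "declared up front" and "re-run under different settings" concrete, here is a small sketch of an immutable study configuration and a helper that expands it into a grid of variants. The field names mirror the parameters listed in the abstract, but the class, the helper, and the example values (such as `"bm25"` and `"dense"`) are assumptions for illustration and not OpenIIR's configuration schema.

```python
from dataclasses import asdict, dataclass, replace
from itertools import product


@dataclass(frozen=True)
class StudyConfig:
    """Hypothetical up-front declaration of one study's parameters."""
    scenario: str
    persona_budget: int
    retrieval_policy: str
    ranker: str
    intervention_round: int
    mutation_rate: float
    seed: int


def variants(base: StudyConfig, **axes: list) -> list[StudyConfig]:
    """Return one config per combination of the given axis values.

    Fields that are not named in `axes` (including the seed) are copied
    unchanged from `base`, so variants differ only in the studied settings.
    """
    names = list(axes)
    return [
        replace(base, **dict(zip(names, combo)))
        for combo in product(*axes.values())
    ]


if __name__ == "__main__":
    base = StudyConfig(
        scenario="Curated-Feed",
        persona_budget=200,
        retrieval_policy="top-k",   # example value, not from the paper
        ranker="bm25",              # example value, not from the paper
        intervention_round=2,
        mutation_rate=0.05,
        seed=42,
    )
    for cfg in variants(base, ranker=["bm25", "dense"], intervention_round=[2, 5]):
        print(asdict(cfg))
```

Freezing the dataclass means a configuration cannot be mutated mid-run, which keeps the record of what was declared identical to what was executed.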

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces OpenIIR, an open simulation platform for IR research that executes hundreds of LLM-driven personas in configurable, reproducible multi-agent experiments. It supplies a shared core (agent runtime, world-model store, retrieval primitives), a type interface for pluggable scenarios, four released scenario types (Panel, Social-Media, Curated-Feed, Multi-Generational) with reference runs that output argument graphs, exposure logs, fitness traces and transcripts, and six sketched modular extensions targeting open IR questions. All parameters (persona budgets, retrieval policies, mutation rates) are declared upfront to support side-by-side comparisons.

Significance. If the LLM personas can be shown to produce outputs that correlate with human IR behaviors, the platform would provide a reusable, open infrastructure for controlled experiments on social influence, recommendation dynamics, belief updating and content evolution, lowering the barrier to reproducible multi-agent IR studies and enabling direct comparison of interventions without large human-subject trials.

major comments (2)
  1. [Abstract] Abstract and contributions (iii): the central claim that the platform enables researchers to address open IR questions depends on LLM-driven personas serving as sufficiently accurate proxies for human behavior, yet the manuscript provides no empirical validation, calibration against human data (e.g., relevance judgments or social influence traces), or sensitivity analysis demonstrating external fidelity; reference runs are stated to show only internal reproducibility.
  2. [Contributions list] Contributions (iv): the six modular extensions are described only as sketches against open IR questions; without concrete interface definitions, example implementations, or sample outputs, it is impossible to assess whether they actually operationalize the claimed research questions.
minor comments (1)
  1. [Abstract] The 200-400 line plug-in claim is useful for estimating effort but would benefit from a brief pseudocode outline of the type interface or a minimal scenario skeleton to illustrate the extension mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comments point by point below, clarifying the scope of the work and indicating planned revisions where appropriate.

  1. Referee: [Abstract] Abstract and contributions (iii): the central claim that the platform enables researchers to address open IR questions depends on LLM-driven personas serving as sufficiently accurate proxies for human behavior, yet the manuscript provides no empirical validation, calibration against human data (e.g., relevance judgments or social influence traces), or sensitivity analysis demonstrating external fidelity; reference runs are stated to show only internal reproducibility.

    Authors: We agree that the platform's ability to address open IR questions ultimately depends on the degree to which LLM personas can serve as proxies for human behavior. The manuscript does not claim external validity or present the personas as calibrated models; it positions OpenIIR as shared infrastructure that makes controlled, reproducible multi-agent experiments feasible. The reference runs establish only internal reproducibility, as noted. To address the concern, we will revise the abstract and contributions section to explicitly state that the platform facilitates rather than validates such studies, and we will add a limitations section discussing the lack of human calibration data together with an outline of planned future empirical work. revision: partial

  2. Referee: [Contributions list] Contributions (iv): the six modular extensions are described only as sketches against open IR questions; without concrete interface definitions, example implementations, or sample outputs, it is impossible to assess whether they actually operationalize the claimed research questions.

    Authors: The six modular extensions are presented as conceptual illustrations of the platform's extensibility rather than as fully specified or implemented modules. We acknowledge that additional detail would allow better assessment of how they map to the research questions. In the revision we will expand the description of two extensions (social influence and content evolution) with concrete interface definitions, pseudocode, and example configuration and output formats, while retaining the others as higher-level sketches with clearer linkages to the targeted IR questions. revision: yes
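The internal reproducibility that the reference runs are said to establish can be checked mechanically. The sketch below fingerprints a run's structured output with a hash of its canonical JSON form and compares two runs of the same configuration. `toy_run` is a hypothetical placeholder that draws random numbers instead of driving LLM personas; with a real LLM backend a fixed seed may not be enough to guarantee identical outputs.

```python
import hashlib
import json
import random


def fingerprint(outputs: dict) -> str:
    """SHA-256 of the canonical JSON form, so key order does not affect the result."""
    canonical = json.dumps(outputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def toy_run(seed: int, rounds: int = 5) -> dict:
    """Hypothetical stand-in for a simulation run: a seeded pseudo exposure log."""
    rng = random.Random(seed)
    return {"seed": seed, "exposure_log": [rng.randrange(10) for _ in range(rounds)]}


def is_reproducible(seed: int) -> bool:
    """Run the same configuration twice and compare output fingerprints."""
    return fingerprint(toy_run(seed)) == fingerprint(toy_run(seed))


if __name__ == "__main__":
    print(is_reproducible(123))
```

A check of this kind says nothing about external fidelity to human behaviour, which is the separate concern raised in the first major comment.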

Circularity Check

0 steps flagged

No circularity: platform is a self-contained software artifact

The manuscript describes an open simulation platform consisting of a shared core, a type interface for pluggable scenarios, four released scenario types with reference runs, and sketched modular extensions. No derivation chain, equations, fitted parameters, predictions, or self-citations are present that reduce any claimed result to its own inputs by construction. The contributions are structural (code interfaces and reproducible runs) and can be evaluated independently via inspection and execution of the released artifacts, without any self-referential logic or load-bearing assumptions that collapse into the platform's own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that LLM agents can simulate human IR behavior at useful fidelity; no free parameters or invented entities are introduced beyond standard configurable simulation settings.

axioms (1)
  • domain assumption: LLM-driven agents can be configured to simulate diverse human personas in information retrieval tasks with sufficient fidelity for research insights
    The platform's value depends on this premise, which is invoked throughout the description of persona-based experiments but receives no validation data in the abstract.

pith-pipeline@v0.9.0 · 5488 in / 1311 out tokens · 74565 ms · 2026-05-15T05:50:58.549249+00:00 · methodology


