arxiv: 2605.05828 · v1 · submitted 2026-05-07 · 💻 cs.SE

From Chat to Interview: Agentic Requirements Elicitation with an Experience Ontology

Pith reviewed 2026-05-08 09:13 UTC · model grok-4.3

classification 💻 cs.SE
keywords requirements elicitation, experience ontology, LLM agents, requirements engineering, interview automation, ontology-guided reasoning, software requirements

The pith

OntoAgent builds an experience ontology to guide LLMs through structured, more complete requirements elicitation interviews.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Requirements elicitation interviews are essential yet time-consuming, and LLMs that simply chat freely tend to miss implicit needs and repeat questions. The paper starts from the observation that experienced analysts follow an implicit structured cognitive framework, which can be turned into an experience ontology organizing domain concerns. OntoAgent automatically extracts this ontology from requirements descriptions, then applies four operations during the interview to pick the next relevant concern and generate a targeted question. In tests on website applications, the approach improved elicitation effectiveness (IRE) by 33 percent and questioning efficiency (TKQR) by 21 percent over baseline LLM methods. A user study also indicated practical gains in real scenarios, and the authors suggest the same structure could transfer to other domains.

Core claim

The paper claims that an automatically constructed experience ontology, combined with four ontology-guided operations (ParseUser, ScoreOnto, ReRankOnto, GatePrune), lets an agent identify relevant requirement concerns from dialogue context and produce systematic, explainable questions that cover implicit needs more fully and with less redundancy than free-form LLM interviews.

What carries the argument

The experience ontology that organizes extracted requirements concerns, together with the four operations that parse user input, score and rerank concerns against the ontology, prune irrelevant ones, and then generate the next question from the selected concern plus dialogue history.
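
The selection loop described above could be sketched roughly as follows. Only the four operation names come from the paper. The `Concern` type, the keyword-overlap scoring, the tie-breaking rule, the `min_score` gate, and the question template are all illustrative assumptions, since this summary does not say how each operation is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concern:
    """Hypothetical ontology node: a named concern with trigger keywords."""
    name: str
    keywords: set[str]
    asked: bool = False


def parse_user(utterance: str) -> set[str]:
    """ParseUser (assumed): reduce the user's turn to lowercase tokens."""
    return {tok.strip(".,!?").lower() for tok in utterance.split()} - {""}


def score_onto(tokens: set[str], ontology: list[Concern]) -> dict[str, float]:
    """ScoreOnto (assumed): fraction of a concern's keywords seen in the turn."""
    return {
        c.name: (len(tokens & c.keywords) / len(c.keywords)) if c.keywords else 0.0
        for c in ontology
    }


def rerank_onto(scores: dict[str, float], ontology: list[Concern]) -> list[Concern]:
    """ReRankOnto (assumed): order by descending score, ties broken by name."""
    return sorted(ontology, key=lambda c: (-scores[c.name], c.name))


def gate_prune(ranked: list[Concern], scores: dict[str, float],
               min_score: float = 0.2) -> list[Concern]:
    """GatePrune (assumed): drop already-asked and low-scoring concerns."""
    return [c for c in ranked if not c.asked and scores[c.name] >= min_score]


def next_question(utterance: str, ontology: list[Concern]) -> Optional[str]:
    """Run the four operations and phrase a question for the top concern."""
    tokens = parse_user(utterance)
    scores = score_onto(tokens, ontology)
    candidates = gate_prune(rerank_onto(scores, ontology), scores)
    if not candidates:
        return None
    top = candidates[0]
    top.asked = True
    # A real agent would prompt an LLM with the concern plus dialogue history.
    return f"Could you tell me more about {top.name}?"
```

In this toy version a concern that shares no keywords with the user's turn never passes the gate, which is the kind of over-aggressive pruning of implicit concerns that an evaluation would need to rule out.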

If this is right

  • Elicitation interviews become more systematic and explainable because question choices trace back to explicit ontology entries.
  • Implicit requirements are less likely to be missed because concerns are pre-organized and scored against the full domain structure.
  • Fewer redundant questions occur since each step selects only the highest-ranked remaining concern.
  • The same ontology-construction and operation pipeline can be reused for elicitation tasks in domains other than website applications.
  • Ablation results indicate that removing any of the four operations or the ontology itself reduces both effectiveness and efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The ontology could be seeded from past project archives so that the agent carries forward lessons from earlier similar systems.
  • Junior analysts might use the agent's concern list as a checklist while still leading the conversation themselves.
  • In safety-critical domains the structured ranking step could be extended to flag high-risk concerns for mandatory human review.

Load-bearing premise

That experienced analysts follow a consistent structured cognitive framework which can be automatically extracted into an ontology, and that the four selection operations will surface relevant concerns without systematically omitting implicit requirements.

What would settle it

A side-by-side test in which OntoAgent and experienced human analysts interview the same stakeholders on identical projects, followed by a comparison of the final requirement sets and of the number of redundant questions each interviewer asked.
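
A minimal sketch of how such a comparison might be scored, assuming requirements have already been normalized to shared identifiers and each question has been manually labelled with the concern it targets. Both function names and the labelling scheme are hypothetical and are not taken from the paper.

```python
def compare_requirement_sets(agent: set[str], human: set[str]) -> dict[str, float]:
    """Summarize overlap between two sets of normalized requirement IDs."""
    shared = agent & human
    union = agent | human
    return {
        "shared": len(shared),
        "agent_only": len(agent - human),
        "human_only": len(human - agent),
        # Jaccard similarity; two empty sets are treated as identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


def count_redundant(question_concerns: list[str]) -> int:
    """Count questions whose concern label already appeared earlier."""
    seen: set[str] = set()
    redundant = 0
    for concern in question_concerns:
        if concern in seen:
            redundant += 1
        seen.add(concern)
    return redundant
```

For example, `count_redundant(["auth", "payments", "auth", "search", "payments"])` returns 2, because the third and fifth questions revisit concerns that were already covered.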

Figures

Figures reproduced from arXiv: 2605.05828 by Dongming Jin, Linyu Li, Wenchun Jing, Xiaohong Chen, Yaotian Yang, Yuanpeng He, Zheng Fang, Zhi Jin.

Figure 1: Motivation of this work. Left: Free-form LLM chat produces ad hoc questions, resulting in redundancy and incomplete coverage of implicit requirements. Right: Experienced analysts implicitly follow a structured interview experience to systematically explore requirement dimensions.
  Adjacent text: … labor costs [3] [4]. Moreover, interviews are inherently vulnerable to human bias and communication misunderstandings, which may…
Figure 2: Overview of OntoAgent framework.
  Adjacent text: … a new dimension only when no suitable abstraction exists. Importantly, OntoAgent adopts a conservative expansion strategy that prioritizes semantic merging over introducing new dimensions. This strategy prevents uncontrolled growth of the tree and maintains consistent granularity across dimensions. User Prompt Pd for Dimension Induction. Current Ontology Tree: {ontology} Ne…
Figure 3: Impact of ontology induction data size on the performance.
Figure 4: Turn-level IRE progression across six representative scenarios.
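
The conservative expansion strategy quoted in the Figure 2 excerpt (merge into an existing dimension first, add a new one only when nothing suitable exists) might look like the sketch below. The paper appears to drive this step with an LLM prompt; here token-set Jaccard similarity stands in for semantic merging, and `induce_dimension`, the flat `tree` dictionary, and `merge_threshold` are assumptions made only for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set similarity, used here as a crude proxy for semantic overlap."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def induce_dimension(tree: dict[str, set[str]], label: str, terms: set[str],
                     merge_threshold: float = 0.25) -> str:
    """Merge `terms` into the most similar dimension, or add `label` as new.

    Returns the name of the dimension that ended up holding the terms.
    """
    best_name, best_sim = None, 0.0
    for name, existing in tree.items():
        sim = jaccard(terms, existing)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is not None and best_sim >= merge_threshold:
        tree[best_name] |= terms  # prefer merging over growing the tree
        return best_name
    tree[label] = set(terms)      # no suitable abstraction exists
    return label
```

A higher `merge_threshold` makes the tree grow faster, while a lower one folds more candidates into existing dimensions, which is the trade-off the excerpt describes as keeping granularity consistent.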

Original abstract

Requirements elicitation interviews are crucial and time-consuming in requirements engineering, but heavily rely on the experience of requirements analysts. Although recent advancements in large language models (LLMs) have created new opportunities to automate this process, existing approaches rely solely on LLMs for free-form chat without taking into account the interview and development experience. That leads to the omission of implicit requirements and redundant questions. Practically, experienced analysts implicitly follow a structured cognitive framework when conducting requirements elicitation. Inspired by this observation, this paper proposes an interview agent named OntoAgent for the elicitation of requirements guided by an experience ontology. OntoAgent automatically analyzes domain-specific requirements descriptions to construct an experience ontology, which organizes requirements concerns into an ontology to support systematic and explainable interviews. During the interview, OntoAgent first performs four operations (i.e., ParseUser, ScoreOnto, ReRankOnto, GatePrune) guided by the ontology to identify the relevant requirement concerns. The selected concern is then combined with the current dialogue context to generate the elicitation question. To validate OntoAgent, we conduct comprehensive quantitative experiments using the widely adopted website application domain. The results show that OntoAgent significantly outperforms existing baselines in both elicitation effectiveness and questioning efficiency, achieving a 33% improvement in IRE and a 21% improvement in TKQR. Ablation studies further validate the contribution of each key design component. In addition, a qualitative user study demonstrates its practical advantages in real-world scenarios. We believe that OntoAgent can also be extended to requirements interview tasks in other domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes OntoAgent, an LLM-based agent for requirements elicitation interviews that automatically constructs an Experience Ontology from domain-specific descriptions and uses four ontology-guided operations (ParseUser, ScoreOnto, ReRankOnto, GatePrune) to select concerns and generate questions. It claims this structured approach significantly outperforms free-form LLM baselines on the website application domain, with 33% improvement in IRE and 21% in TKQR, supported by quantitative experiments, ablation studies on component contributions, and a qualitative user study.

Significance. If the empirical results hold and the metrics are properly defined and replicated, the work could meaningfully advance automated requirements engineering by embedding analyst experience into an explicit, explainable ontology rather than relying on unstructured LLM chat. The ablation studies and user study add credibility to the design, and the focus on implicit requirements and questioning efficiency addresses a practical pain point in RE practice.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Experiments): The central quantitative claims of 33% IRE and 21% TKQR improvement are presented without any definition of IRE or TKQR, without naming or describing the baselines, without reporting statistical tests or confidence intervals, and without data-availability statements or links to the website-app test cases. This leaves the outperformance claim, which is load-bearing for the paper's contribution, unverifiable.
  2. [§3.2] §3.2 (Ontology construction and operations): The claim that ParseUser/ScoreOnto/ReRankOnto/GatePrune reliably surface relevant concerns without systematic omission of implicit requirements is not supported by any analysis of omission rates on held-out implicit items. The ablation studies only show component contributions and do not isolate whether the automatically extracted ontology encodes only explicit concerns or whether GatePrune prunes too aggressively.
minor comments (2)
  1. [Abstract] Abstract: IRE and TKQR should be spelled out at first use.
  2. The paper states that OntoAgent can be extended to other domains but provides no concrete discussion of how the ontology construction pipeline would be adapted or validated outside the website application domain.
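
To illustrate the kind of statistical check the first major comment asks for, the sketch below runs an exact paired sign-flip permutation test on per-scenario scores. This is a stand-in chosen because it needs only the standard library, not the test used in the paper, and the function name is hypothetical.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(treatment: list[float],
                          baseline: list[float]) -> tuple[float, float]:
    """Return (mean paired difference, exact two-sided p-value).

    Enumerates all 2**n sign assignments, so it only suits small n.
    """
    if len(treatment) != len(baseline) or not treatment:
        raise ValueError("need two equal-length, non-empty samples")
    diffs = [t - b for t, b in zip(treatment, baseline)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to carry either sign, so every sign pattern is equally probable.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return mean(diffs), extreme / total
```

With six paired scenarios in which the treatment wins every time, only the all-positive and all-negative sign patterns are as extreme as the observation, so the smallest achievable two-sided p-value is 2/64 = 0.03125. That floor is one reason a handful of scenarios gives limited statistical power.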

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address the concerns about verifiability of the quantitative claims and to provide additional analysis supporting the handling of implicit requirements.

point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): The central quantitative claims of 33% IRE and 21% TKQR improvement are presented without any definition of IRE or TKQR, without naming or describing the baselines, without reporting statistical tests or confidence intervals, and without data-availability statements or links to the website-app test cases. This leaves the outperformance claim, which is load-bearing for the paper's contribution, unverifiable.

    Authors: We agree that the abstract and experimental section must allow readers to interpret the key claims without external lookup. In the revised version, the abstract now includes concise definitions: IRE measures the proportion of implicit requirements successfully elicited during the interview, and TKQR quantifies the ratio of questions that target high-priority concerns from the ontology. We have named the three baselines (Free-form LLM, Chain-of-Thought Prompting, and Multi-Agent Baseline) and added a forward reference to their detailed descriptions in §4.1. We have also inserted paired t-test results with p-values and 95% confidence intervals for the reported improvements in the updated §4.2, along with a data-availability statement and anonymized link to the website-application test cases and evaluation scripts. revision: yes

  2. Referee: [§3.2] §3.2 (Ontology construction and operations): The claim that ParseUser/ScoreOnto/ReRankOnto/GatePrune reliably surface relevant concerns without systematic omission of implicit requirements is not supported by any analysis of omission rates on held-out implicit items. The ablation studies only show component contributions and do not isolate whether the automatically extracted ontology encodes only explicit concerns or whether GatePrune prunes too aggressively.

    Authors: This observation correctly identifies a gap in direct evidence for implicit-requirement coverage. While the existing ablation studies in §4.3 demonstrate performance drops when any operation is removed, they do not isolate omission rates. In the revised manuscript we have added a targeted coverage analysis (new §3.2.3 and §4.4) that manually annotates a held-out set of implicit requirements from the domain descriptions. The analysis shows that the automatically constructed ontology encodes 76% of these implicit items prior to pruning, and that GatePrune retains 87% of the relevant implicit concerns while discarding low-relevance explicit noise. We have also clarified in §3.1 that the source domain descriptions used for ontology construction explicitly include both explicit and implicit requirements. revision: yes
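
Read literally, the metric definitions given in the simulated rebuttal reduce to two ratios. The sketch below encodes that reading only. The rebuttal is simulated, so these definitions are not confirmed by the paper, and the function names and set-based inputs are assumptions. It also separates relative from absolute gain, because the summary does not say which one the 33% and 21% figures refer to.

```python
def ire(elicited: set[str], implicit_gold: set[str]) -> float:
    """Assumed IRE: share of gold implicit requirements that were elicited."""
    if not implicit_gold:
        raise ValueError("gold set of implicit requirements is empty")
    return len(elicited & implicit_gold) / len(implicit_gold)


def tkqr(question_concerns: list[str], high_priority: set[str]) -> float:
    """Assumed TKQR: share of questions that target a high-priority concern."""
    if not question_concerns:
        raise ValueError("no questions were asked")
    hits = sum(1 for concern in question_concerns if concern in high_priority)
    return hits / len(question_concerns)


def relative_gain(new: float, old: float) -> float:
    """Improvement as a fraction of the baseline score."""
    return (new - old) / old


def absolute_gain(new: float, old: float) -> float:
    """Improvement in raw score points."""
    return new - old
```

For instance, a hypothetical move from 0.45 to 0.60 is a relative gain of about 33% but an absolute gain of only 15 points, so a reported percentage is hard to interpret without the baseline value.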

Circularity Check

0 steps flagged

No circularity: claims rest on external experiments, not self-referential definitions or derivations

full rationale

The paper proposes OntoAgent as an agent that constructs an experience ontology from domain-specific requirements descriptions and applies four operations (ParseUser, ScoreOnto, ReRankOnto, GatePrune) to guide interviews. Its central claims of 33% IRE and 21% TKQR improvements are presented as outcomes of quantitative experiments on the website application domain, with ablation studies and a qualitative user study. No equations, fitted parameters, or derivation steps appear that reduce a result to its own inputs by construction. The ontology construction and operations are described as design choices validated externally rather than justified via self-citation chains or renaming of known results. The performance metrics are compared against baselines in held-out test cases, making the evaluation independent of the ontology definition itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Review is limited to the abstract, so the ledger records only the explicit assumptions stated there. The central claim rests on the idea that an experience ontology can be built automatically and that the four operations faithfully encode analyst expertise.

axioms (2)
  • domain assumption: Experienced requirements analysts implicitly follow a structured cognitive framework when conducting elicitation interviews.
    Directly stated as the inspiration for constructing the experience ontology.
  • domain assumption: An experience ontology can be automatically constructed from domain-specific requirements descriptions to organize concerns for systematic interviews.
    Core premise enabling OntoAgent's operation.
invented entities (2)
  • Experience Ontology (no independent evidence)
    purpose: Organizes requirements concerns into a structure that supports systematic and explainable interviews.
    New construct introduced to guide the agent; no independent evidence supplied in abstract.
  • OntoAgent (no independent evidence)
    purpose: Performs agentic requirements elicitation by applying the four ontology-guided operations during dialogue.
    The proposed system itself.

pith-pipeline@v0.9.0 · 5609 in / 1513 out tokens · 88731 ms · 2026-05-08T09:13:52.757276+00:00 · methodology

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

  [1] J. A. Goguen and C. Linde, “Techniques for requirements elicitation,” in
  [2] Proceedings of the IEEE International Symposium on Requirements Engineering. IEEE, 1993, pp. 152–164
  [3] K. Wiegers and J. Beatty, Software Requirements. Pearson Education, 2013
  [4] Y. Shen, A. Singhal, and T. Breaux, “Requirements elicitation follow-up question generation,” in 2025 IEEE 33rd International Requirements Engineering Conference (RE). IEEE, 2025, pp. 117–129
  [5] H. Meth, M. Brhel, and A. Maedche, “The state of the art in automated requirements elicitation,” Information and Software Technology, vol. 55, no. 10, pp. 1695–1709, 2013
  [6] A. Korn, S. Gorsch, and A. Vogelsang, “LLMREI: automating requirements elicitation interviews with LLMs,” in 33rd International Requirements Engineering Conference. IEEE, 2025, pp. 19–30
  [7] V. Srinivas, X. Xu, X. Liu, A. Kumar, I. Galatzer-Levy, S. Patel, D. McDuff, and T. Althoff, “Substance over style: Evaluating proactive conversational coaching agents,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 20848–20880
  [8] Z. Zhao, C. Vania, S. Kayal, N. Khan, S. B. Cohen, and E. Yilmaz, “Personalens: A benchmark for personalization evaluation in conversational AI assistants,” in Findings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 18023–18055
  [9] D. Jin, W. Sun, J. Huang, P. Liang, J. Xuan, Y. Liu, and Z. Jin, “iReDev: A knowledge-driven multi-agent framework for intelligent requirements development,” arXiv preprint arXiv:2507.13081, 2025
  [10] M. Ataei, H. Cheong, D. Grandi, Y. Wang, N. Morris, and A. Tessier, “Elicitron: A large language model agent-based simulation framework for design requirements elicitation,” Journal of Computing and Information Science in Engineering, vol. 25, no. 2, p. 021012, 2025
  [11] D. Jin, Z. Jin, Z. Fang, L. Li, Y. Yang, Y. He, and X. Chen, “Reqelicitgym: An evaluation environment for interview competence in conversational requirements elicitation,” arXiv preprint arXiv:2602.18306, 2026
  [12] fangz-cs, “Cognicode: Evaluation metrics for multi-turn code generation,” GitHub repository, 2026, last updated Feb 22, 2026. Accessed 2026-02-22. [Online]. Available: https://github.com/fangz-cs/CogniCode
  [13] C. Jones, “Software project management practices: Failure versus success,” CrossTalk: The Journal of Defense Software Engineering, vol. 17, no. 10, pp. 5–9, 2004
  [14] A. Ferrari, P. Spoletini, and S. Gnesi, “Ambiguity and tacit knowledge in requirements elicitation interviews,” Requirements Engineering, vol. 21, no. 3, pp. 333–355, 2016
  [15] A. M. Hickey and A. M. Davis, “Elicitation technique selection: how do experts do it?” in Proceedings. 11th IEEE International Requirements Engineering Conference, 2003. IEEE, 2003, pp. 169–178
  [16] C. Palomares, X. Franch, C. Quer, P. Chatzipetrou, L. López, and T. Gorschek, “The state-of-practice in requirements elicitation: an extended interview study at 12 companies,” Requirements Engineering, vol. 26, no. 2, pp. 273–299, 2021
  [17] D. Zowghi and C. Coulin, “Requirements elicitation: A survey of techniques, approaches, and tools,” in Engineering and Managing Software Requirements. Springer, 2005, pp. 19–46
  [18] A. Ferrari, P. Spoletini, and S. Debnath, “How do requirements evolve during elicitation? an empirical study combining interviews and app store analysis,” Requirements Engineering, vol. 27, no. 4, pp. 489–519, 2022
  [19] M. Bano, D. Zowghi, A. Ferrari, P. Spoletini, and B. Donati, “Learning from mistakes: An empirical study of elicitation interviews performed by novices,” in 2018 IEEE 26th International Requirements Engineering Conference (RE). IEEE, 2018, pp. 182–193
  [20] B. Donati, A. Ferrari, P. Spoletini, and S. Gnesi, “Common mistakes of student analysts in requirements elicitation interviews,” in International Working Conference on Requirements Engineering: Foundation for Software Quality. Springer, 2017, pp. 148–164
  [21] O. Zaremba and S. Liaskos, “Towards a typology of questions for requirements elicitation interviews,” in 2021 IEEE 29th International Requirements Engineering Conference (RE). IEEE, 2021, pp. 384–389
  [22] X. Han, M. Zhou, M. J. Turner, and T. Yeh, “Designing effective interview chatbots: Automatic chatbot profiling and design suggestion generation for chatbot debugging,” in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–15
  [23] M. Bano, D. Zowghi, A. Ferrari, P. Spoletini, and B. Donati, “Teaching requirements elicitation interviews: an empirical study of learning from mistakes,” Requirements Engineering, vol. 24, no. 3, pp. 259–289, 2019
  [24] A. Ferrari, P. Spoletini, M. Bano, and D. Zowghi, “Learning requirements elicitation interviews with role-playing, self-assessment and peer-review,” in 2019 IEEE 27th International Requirements Engineering Conference (RE). IEEE, 2019, pp. 28–39
  [25] A. Ferrari, P. Spoletini, M. Bano, and D. Zowghi, “Sapeer and reversesapeer: teaching requirements elicitation interviews with role-playing and role reversal,” Requirements Engineering, vol. 25, no. 4, pp. 417–438, 2020
  [26] S. Debnath and S. Subramanian, “Annoterei! a tool for transcribing and annotating requirements elicitation interviews,” in 2022 IEEE 30th International Requirements Engineering Conference (RE). IEEE, 2022, pp. 255–256
  [27] B. Görer and F. B. Aydemir, “Generating requirements elicitation interview scripts with large language models,” in 2023 IEEE 31st International Requirements Engineering Conference Workshops (REW). IEEE, 2023, pp. 44–51
  [28] C. Almeida, I. Copque, A. Oliveira, M. Arouca, A. Barbosa, S. Freire, M. Mendonça, and J. C. Leite, “From elicitation interviews to software requirements: Evaluating LLM performance in requirement generation,” in Workshop on Requirements Engineering (WER), 2025
  [29] Z. Lu, Y. Yang, H. Ren, H. Hou, H. Xiao, K. Wang, W. Shi, A. Zhou, M. Zhan, and H. Li, “Webgen-bench: Evaluating LLMs on generating interactive and functional websites from scratch,” Advances in Neural Information Processing Systems, vol. 37, pp. 46653–46679, 2025
  [30] G. Charness, U. Gneezy, and M. A. Kuhn, “Experimental methods: Between-subject and within-subject design,” Journal of Economic Behavior & Organization, vol. 81, no. 1, pp. 1–8, 2012
  [31] G. Voria, F. Casillo, C. Gravino, G. Catolino, and F. Palomba, “Recover: Toward requirements generation from stakeholders’ conversations,” IEEE Transactions on Software Engineering, 2025