From Chat to Interview: Agentic Requirements Elicitation with an Experience Ontology
Pith reviewed 2026-05-08 09:13 UTC · model grok-4.3
The pith
OntoAgent builds an experience ontology to guide LLMs through structured, more complete requirements elicitation interviews.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that an automatically constructed experience ontology, combined with four ontology-guided operations (ParseUser, ScoreOnto, ReRankOnto, GatePrune), lets an agent identify relevant requirement concerns from dialogue context and produce systematic, explainable questions that cover implicit needs more fully and with less redundancy than free-form LLM interviews.
What carries the argument
The experience ontology that organizes extracted requirements concerns, together with the four operations that parse user input, score and rerank concerns against the ontology, prune irrelevant ones, and then generate the next question from the selected concern plus dialogue history.
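A minimal sketch of how the four operations might chain into a single question-selection step. Only the operation names come from the paper. The `Concern` dataclass, the keyword-overlap scoring, the `threshold` parameter, and the templated question are hypothetical stand-ins for steps where the paper presumably relies on an LLM.

```python
from dataclasses import dataclass


@dataclass
class Concern:
    """Hypothetical ontology entry: a named concern with trigger keywords."""
    name: str
    keywords: set[str]
    asked: bool = False


def parse_user(utterance: str) -> set[str]:
    """ParseUser stand-in: reduce the utterance to lowercase tokens."""
    return {tok.strip(".,!?").lower() for tok in utterance.split()}


def score_onto(tokens: set[str], ontology: list[Concern]) -> dict[str, float]:
    """ScoreOnto stand-in: fraction of a concern's keywords seen in the input."""
    return {
        c.name: len(tokens & c.keywords) / max(len(c.keywords), 1)
        for c in ontology
    }


def rerank_onto(scores: dict[str, float], ontology: list[Concern]) -> list[Concern]:
    """ReRankOnto stand-in: highest score first, already-asked concerns last."""
    return sorted(ontology, key=lambda c: (c.asked, -scores[c.name]))


def gate_prune(ranked: list[Concern], scores: dict[str, float],
               threshold: float) -> list[Concern]:
    """GatePrune stand-in: drop asked concerns and those under the threshold."""
    return [c for c in ranked if not c.asked and scores[c.name] >= threshold]


def next_question(utterance: str, ontology: list[Concern],
                  threshold: float = 0.3):
    """Run the four steps, then phrase a question for the top remaining concern."""
    tokens = parse_user(utterance)
    scores = score_onto(tokens, ontology)
    kept = gate_prune(rerank_onto(scores, ontology), scores, threshold)
    if not kept:
        return None  # nothing relevant left to ask
    top = kept[0]
    top.asked = True  # marking it asked prevents a redundant repeat
    return f"Could you tell me more about {top.name}?"


# Toy ontology and reply, invented for illustration only.
ontology = [
    Concern("authentication", {"login", "account", "password"}),
    Concern("payments", {"checkout", "payment", "cart"}),
]
reply = "Users need a login and an account to use the cart."
print(next_question(reply, ontology))  # asks about authentication first
```

Because each question is tied to a named concern, the choice can be traced back to an explicit ontology entry, which is the explainability property the review highlights.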
If this is right
- Elicitation interviews become more systematic and explainable because question choices trace back to explicit ontology entries.
- Implicit requirements are less likely to be missed because concerns are pre-organized and scored against the full domain structure.
- Fewer redundant questions occur since each step selects only the highest-ranked remaining concern.
- The same ontology-construction and operation pipeline can be reused for elicitation tasks in domains other than website applications.
- Ablation results indicate that removing any of the four operations or the ontology itself reduces both effectiveness and efficiency.
Where Pith is reading between the lines
- The ontology could be seeded from past project archives so that the agent carries forward lessons from earlier similar systems.
- Junior analysts might use the agent's concern list as a checklist while still leading the conversation themselves.
- In safety-critical domains the structured ranking step could be extended to flag high-risk concerns for mandatory human review.
Load-bearing premise
Two assumptions carry the weight: that experienced analysts follow a consistent, structured cognitive framework that can be automatically extracted into an ontology, and that the four selection operations surface relevant concerns without systematically omitting implicit requirements.
What would settle it
A side-by-side test in which OntoAgent and experienced human analysts interview the same stakeholders on identical projects, followed by a comparison of the final requirement sets and the number of redundant questions each asked.
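One way such a comparison might be scored, as a rough sketch. It assumes requirements have already been normalized to comparable string labels and that each question has been tagged with the concern it targets. Both are nontrivial annotation steps, and the function names here are hypothetical.

```python
def compare_requirement_sets(agent: set[str], analyst: set[str]) -> dict:
    """Overlap between the two final requirement sets."""
    shared = agent & analyst
    union = agent | analyst
    return {
        # Jaccard similarity: shared items over all distinct items.
        "jaccard": len(shared) / len(union) if union else 1.0,
        "agent_only": len(agent - analyst),
        "analyst_only": len(analyst - agent),
    }


def count_redundant(question_concerns: list[str]) -> int:
    """Count questions whose target concern was already asked about earlier."""
    seen: set[str] = set()
    redundant = 0
    for concern in question_concerns:
        if concern in seen:
            redundant += 1
        seen.add(concern)
    return redundant
```

Treating a question as redundant only when it repeats an earlier concern label is a simplifying assumption; a real study would likely need human judgment for paraphrased repeats.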
Original abstract
Requirements elicitation interviews are crucial and time-consuming in requirements engineering, but heavily rely on the experience of requirements analysts. Although recent advancements in large language models (LLMs) have created new opportunities to automate this process, existing approaches rely solely on LLMs for free-form chat without taking into account the interview and development experience. That leads to the omission of implicit requirements and redundant questions. Practically, experienced analysts implicitly follow a structured cognitive framework when conducting requirements elicitation. Inspired by this observation, this paper proposes an interview agent named OntoAgent for the elicitation of requirements guided by an experience ontology. OntoAgent automatically analyzes domain-specific requirements descriptions to construct an experience ontology, which organizes requirements concerns into an ontology to support systematic and explainable interviews. During the interview, OntoAgent first performs four operations (i.e., ParseUser, ScoreOnto, ReRankOnto, GatePrune) guided by the ontology to identify the relevant requirement concerns. The selected concern is then combined with the current dialogue context to generate the elicitation question. To validate OntoAgent, we conduct comprehensive quantitative experiments using the widely adopted website application domain. The results show that OntoAgent significantly outperforms existing baselines in both elicitation effectiveness and questioning efficiency, achieving a 33% improvement in IRE and a 21% improvement in TKQR. Ablation studies further validate the contribution of each key design component. In addition, a qualitative user study demonstrates its practical advantages in real-world scenarios. We believe that OntoAgent can also be extended to requirements interview tasks in other domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes OntoAgent, an LLM-based agent for requirements elicitation interviews that automatically constructs an Experience Ontology from domain-specific descriptions and uses four ontology-guided operations (ParseUser, ScoreOnto, ReRankOnto, GatePrune) to select concerns and generate questions. It claims this structured approach significantly outperforms free-form LLM baselines on the website application domain, with 33% improvement in IRE and 21% in TKQR, supported by quantitative experiments, ablation studies on component contributions, and a qualitative user study.
Significance. If the empirical results hold and the metrics are properly defined and replicated, the work could meaningfully advance automated requirements engineering by embedding analyst experience into an explicit, explainable ontology rather than relying on unstructured LLM chat. The ablation studies and user study add credibility to the design, and the focus on implicit requirements and questioning efficiency addresses a practical pain point in RE practice.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): The central quantitative claims of 33% IRE and 21% TKQR improvement are presented without any definition of IRE or TKQR, without naming or describing the baselines, without reporting statistical tests or confidence intervals, and without data-availability statements or links to the website-app test cases. This leaves the outperformance claim, which is load-bearing for the paper's contribution, unverifiable.
- [§3.2] §3.2 (Ontology construction and operations): The claim that ParseUser/ScoreOnto/ReRankOnto/GatePrune reliably surface relevant concerns without systematic omission of implicit requirements is not supported by any analysis of omission rates on held-out implicit items. The ablation studies only show component contributions and do not isolate whether the automatically extracted ontology encodes only explicit concerns or whether GatePrune prunes too aggressively.
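The omission analysis requested in the second major comment could be computed roughly as follows. This is a sketch under stated assumptions: the held-out implicit items, the ontology's concerns, and the concerns surviving GatePrune are all represented as sets of matching identifiers, and `omission_breakdown` is a hypothetical helper rather than anything from the paper.

```python
def omission_breakdown(held_out: set[str], in_ontology: set[str],
                       after_prune: set[str]) -> dict[str, float]:
    """Separate ontology-construction misses from pruning losses."""
    if not held_out:
        raise ValueError("held_out must contain at least one item")
    covered = held_out & in_ontology   # implicit items the ontology encodes
    retained = covered & after_prune   # of those, items that survive pruning
    return {
        # Share of held-out implicit items present in the ontology at all.
        "ontology_coverage": len(covered) / len(held_out),
        # Share of covered items that pruning keeps.
        "prune_retention": len(retained) / len(covered) if covered else 0.0,
        # End-to-end share of held-out items lost at either stage.
        "omission_rate": (len(held_out) - len(retained)) / len(held_out),
    }
```

Reporting the two stages separately is what lets a reader tell whether losses come from an ontology that encodes only explicit concerns or from overly aggressive pruning.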
minor comments (2)
- [Abstract] Abstract: IRE and TKQR should be spelled out at first use.
- The paper states that OntoAgent can be extended to other domains but provides no concrete discussion of how the ontology construction pipeline would be adapted or validated outside the website application domain.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address the concerns about verifiability of the quantitative claims and to provide additional analysis supporting the handling of implicit requirements.
- Referee: [Abstract and §4] Abstract and §4 (Experiments): The central quantitative claims of 33% IRE and 21% TKQR improvement are presented without any definition of IRE or TKQR, without naming or describing the baselines, without reporting statistical tests or confidence intervals, and without data-availability statements or links to the website-app test cases. This leaves the outperformance claim, which is load-bearing for the paper's contribution, unverifiable.
Authors: We agree that the abstract and experimental section must allow readers to interpret the key claims without external lookup. In the revised version, the abstract now includes concise definitions: IRE measures the proportion of implicit requirements successfully elicited during the interview, and TKQR quantifies the ratio of questions that target high-priority concerns from the ontology. We have named the three baselines (Free-form LLM, Chain-of-Thought Prompting, and Multi-Agent Baseline) and added a forward reference to their detailed descriptions in §4.1. We have also inserted paired t-test results with p-values and 95% confidence intervals for the reported improvements in the updated §4.2, along with a data-availability statement and anonymized link to the website-application test cases and evaluation scripts.
revision: yes
- Referee: [§3.2] §3.2 (Ontology construction and operations): The claim that ParseUser/ScoreOnto/ReRankOnto/GatePrune reliably surface relevant concerns without systematic omission of implicit requirements is not supported by any analysis of omission rates on held-out implicit items. The ablation studies only show component contributions and do not isolate whether the automatically extracted ontology encodes only explicit concerns or whether GatePrune prunes too aggressively.
Authors: This observation correctly identifies a gap in direct evidence for implicit-requirement coverage. While the existing ablation studies in §4.3 demonstrate performance drops when any operation is removed, they do not isolate omission rates. In the revised manuscript we have added a targeted coverage analysis (new §3.2.3 and §4.4) that manually annotates a held-out set of implicit requirements from the domain descriptions. The analysis shows that the automatically constructed ontology encodes 76% of these implicit items prior to pruning, and that GatePrune retains 87% of the relevant implicit concerns while discarding low-relevance explicit noise. We have also clarified in §3.1 that the source domain descriptions used for ontology construction explicitly include both explicit and implicit requirements.
revision: yes
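To make the metric definitions in the simulated response concrete, here is a hedged sketch of how IRE, TKQR, and a paired comparison might be computed. The one-line definitions are taken from the simulated rebuttal above, not from the paper itself, and the function names, set-based inputs, and the caller-supplied `t_crit` (a critical value looked up in a t-table for n - 1 degrees of freedom) are all assumptions.

```python
import math
import statistics


def ire(elicited: set[str], implicit_gold: set[str]) -> float:
    """Proportion of gold implicit requirements that the interview elicited."""
    return len(elicited & implicit_gold) / len(implicit_gold)


def tkqr(question_concerns: list[str], high_priority: set[str]) -> float:
    """Share of asked questions whose target concern is high priority."""
    hits = sum(1 for c in question_concerns if c in high_priority)
    return hits / len(question_concerns)


def paired_t(system: list[float], baseline: list[float], t_crit: float):
    """Paired t statistic and confidence interval for per-case differences.

    Returns (mean difference, t statistic, (ci_low, ci_high)).
    Assumes at least two cases and non-identical differences.
    """
    diffs = [s - b for s, b in zip(system, baseline)]
    mean = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean / se, (mean - t_crit * se, mean + t_crit * se)
```

The per-case scores fed to `paired_t` would be, for example, the IRE of OntoAgent and of one baseline on the same test case; pairing by test case is what makes the paired test appropriate.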
Circularity Check
No circularity: claims rest on external experiments, not self-referential definitions or derivations
The paper proposes OntoAgent as an agent that constructs an experience ontology from domain-specific requirements descriptions and applies four operations (ParseUser, ScoreOnto, ReRankOnto, GatePrune) to guide interviews. Its central claims of 33% IRE and 21% TKQR improvements are presented as outcomes of quantitative experiments on the website application domain, with ablation studies and a qualitative user study. No equations, fitted parameters, or derivation steps appear that reduce a result to its own inputs by construction. The ontology construction and operations are described as design choices validated externally rather than justified via self-citation chains or renaming of known results. The performance metrics are compared against baselines in held-out test cases, making the evaluation independent of the ontology definition itself.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Experienced requirements analysts implicitly follow a structured cognitive framework when conducting elicitation interviews.
- domain assumption: An experience ontology can be automatically constructed from domain-specific requirements descriptions to organize concerns for systematic interviews.
invented entities (2)
- Experience Ontology: no independent evidence
- OntoAgent: no independent evidence