LLM-Based Discovery of Latent Requirements from Stakeholder Conversations: Preliminary Results from Industry

Martin Lochner; Mehrdad Sabetzadeh; Mithila Sivakumar; Shiva Nejati

arxiv: 2606.25867 · v1 · pith:O25KOHPEnew · submitted 2026-06-24 · 💻 cs.SE

LLM-Based Discovery of Latent Requirements from Stakeholder Conversations: Preliminary Results from Industry

Mithila Sivakumar , Martin Lochner , Shiva Nejati , Mehrdad Sabetzadeh This is my paper

Pith reviewed 2026-06-25 19:56 UTC · model grok-4.3

classification 💻 cs.SE

keywords requirements elicitationlatent requirementslarge language modelsstakeholder interviewsuser storiescybersecurity operationstraceability

0 comments

The pith

LENS uses LLMs to extract explicit requirements from stakeholder interviews at 84.4 percent F1-score while surfacing latent requirements judged useful by experts 75 percent of the time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LENS as a method that processes interview transcripts to pull out requirements explicitly stated by stakeholders and to infer additional latent requirements by combining those statements with details on existing organizational tools and infrastructure. Both kinds of requirements are turned into traceable user stories. Evaluated on twelve real cybersecurity operations interviews, the system matched human extraction of explicit requirements at high accuracy and produced latent requirements that domain experts viewed as automation or time-saving opportunities in three out of four cases. The work addresses the common problem that many needed software capabilities stay unspoken during conversations, so automating their discovery could reduce incomplete requirements sets.

Core claim

LENS analyzes stakeholder interview transcripts to extract explicit requirements and infer additional latent requirements by reasoning over stakeholder statements together with contextual information about organizational tools and infrastructure, with both represented as user stories linked to transcript excerpts for traceability.

What carries the argument

LENS (LLM-Enabled Needs Discovery from Stakeholder Interviews), an LLM pipeline that reasons over transcripts plus organizational context to produce traceable explicit and latent user stories.

If this is right

Requirements elicitation can incorporate automated analysis of full interview transcripts rather than relying only on explicit statements.
Latent requirements become a systematic output instead of items discovered late or missed entirely.
Traceability links from each requirement back to transcript excerpts allow quick validation by engineers or stakeholders.
The same transcript can yield both explicit needs already voiced and additional opportunities tied to current tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the usefulness rate holds across other industries, interview analysis could become a standard early step rather than an ad-hoc review.
Pairing LENS output with system usage logs would let teams test whether the inferred requirements actually reduce workload.
Extending the context fed to the LLM beyond tools to include regulatory or compliance documents might surface additional latent needs.

Load-bearing premise

Expert judgments that latent requirements would save time or enable automation serve as a reliable stand-in for their actual value once implemented in a live system.

What would settle it

A study that implements a sample of the latent requirements, then measures actual usage time or automation gains against the original expert ratings.

Figures

Figures reproduced from arXiv: 2606.25867 by Martin Lochner, Mehrdad Sabetzadeh, Mithila Sivakumar, Shiva Nejati.

**Figure 1.** Figure 1: Example stakeholder interview transcript from eSentire. Redacted text appears in square brackets ([]) and is highlighted [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: User story extracted from the transcript in Fig. [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of LENS capabilities of eSentire’s [Knowledge Sharing Platform] and [Workflow Management Platform], while the second is suggested by the capabilities of eSentire’s [Internal Chatbot Assistant]. Both aim to reduce manual assessment workload. Table I further includes a rationale for each requirement, explaining its importance in eSentire’s context and how it could be implemented using existing too… view at source ↗

**Figure 4.** Figure 4: Categories of false positives (FPs) in explicit require [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Key themes in expert feedback on latent requirements. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

read the original abstract

Stakeholder interviews are an important source of information for requirements elicitation, yet many relevant requirements remain implicit in such conversations. Stakeholders frequently describe workflows, challenges, and operational practices without explicitly articulating the software capabilities that could address them. Recent work has considered the use of LLMs to analyze conversational data and extract requirements from stakeholder interviews. Existing approaches, however, primarily focus on identifying explicitly stated requirements, leaving implicit opportunities largely unexplored. In this paper, we present LENS (LLM-Enabled Needs Discovery from Stakeholder Interviews), an approach that analyzes stakeholder interview transcripts to both extract explicit requirements and infer additional latent requirements. LENS performs this inference by reasoning over stakeholder statements together with contextual information about organizational tools and infrastructure. Both extracted and inferred requirements are represented as user stories and linked to transcript excerpts to ensure traceability. We conduct a preliminary evaluation of LENS using twelve stakeholder interview transcripts collected in an industrial setting involving cybersecurity operations. We show that LENS achieves an average F1-score of 84.4% for extracting explicit requirements, while, on average, 75% of the latent requirements identified by LENS were perceived as providing useful automation or time-saving potential by domain experts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LENS gets decent F1 on explicit requirements from real interviews but the 75% usefulness for latents is just expert perception with no downstream check.

read the letter

The paper's main move is using an LLM to infer latent requirements from stakeholder transcripts by adding context on the organization's tools and infrastructure, then turning both explicit and inferred items into traceable user stories. On twelve cybersecurity interviews it reports 84.4% average F1 for the explicit part and 75% of the inferred ones rated useful for automation or time-saving by domain experts.

The industrial transcripts and the traceability links are the parts that feel grounded. Framing the output as user stories keeps it practical for requirements work, and the explicit contrast with prior LLM extraction papers is stated clearly.

The evaluation for the latent claims is the soft spot. Usefulness is measured only by expert ratings of perceived value; there is no check against implemented features, measured time savings, or any other outcome. The abstract also gives almost no information on prompt design, how the organizational context was supplied, inter-rater reliability, or the exact labeling protocol, so the numbers are hard to assess. It is labeled preliminary, which matches the limited scope.

This is for people working on requirements elicitation tools who want to see an LLM approach tried on real interview data. A reader looking for a worked example or starting point for their own experiments could get value; anyone needing robust evidence that the latent inferences deliver actual benefit will find the current results too thin.

I would send it to peer review. The idea is new enough and the data source is a plus, but the methods and validation sections need to be filled out before it can stand on its own.

Referee Report

2 major / 2 minor

Summary. The paper introduces LENS, an LLM-based approach that processes stakeholder interview transcripts to extract explicit requirements and infer latent requirements, representing both as traceable user stories. Using 12 industrial cybersecurity operations transcripts, it reports an average F1-score of 84.4% on explicit requirements and states that domain experts rated 75% of the inferred latent requirements as providing useful automation or time-saving potential.

Significance. If the evaluation protocol and validation gaps are addressed, the work could contribute to requirements elicitation by demonstrating LLM use for implicit needs in real industrial data, with traceability as a practical strength. The industrial setting and dual focus on explicit/latent extraction differentiate it from prior explicit-only methods, though the preliminary scope and perception-based proxy for latent value constrain broader claims.

major comments (2)

[Evaluation] Evaluation section: The 75% usefulness rating for latent requirements rests entirely on domain-expert perception of automation/time-saving potential, with no follow-up implementation, A/B testing, or measured productivity outcomes in the cybersecurity setting. This leaves the central claim about practical benefit for inferred requirements as an unvalidated proxy.
[Evaluation] Evaluation section: No details are provided on the LLM prompting strategy, how organizational context about tools/infrastructure is supplied, inter-rater reliability for human labels, or the exact protocol used to compute the 84.4% F1 and 75% figures, preventing assessment of the reported metrics.

minor comments (2)

Clarify the exact number of latent requirements generated per transcript and any filtering steps applied before expert review.
The abstract states 'preliminary results' but the manuscript should explicitly discuss threats to validity arising from the small sample of 12 transcripts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our preliminary study. We address each major comment below and will revise the manuscript to improve clarity on methodology and to better frame the scope and limitations of the evaluation.

read point-by-point responses

Referee: [Evaluation] Evaluation section: The 75% usefulness rating for latent requirements rests entirely on domain-expert perception of automation/time-saving potential, with no follow-up implementation, A/B testing, or measured productivity outcomes in the cybersecurity setting. This leaves the central claim about practical benefit for inferred requirements as an unvalidated proxy.

Authors: We agree that the assessment of latent requirements relies on expert perception of potential utility rather than direct implementation or measured outcomes. This aligns with the paper's framing as preliminary results from 12 transcripts. In the revision, we will add an explicit Limitations subsection (in Discussion) that acknowledges this as a proxy measure, discusses the rationale for perception-based validation in early-stage work, and outlines future directions including implementation pilots and productivity metrics. We maintain that the current results provide a useful initial signal for the approach while transparently noting the validation gap. revision: partial
Referee: [Evaluation] Evaluation section: No details are provided on the LLM prompting strategy, how organizational context about tools/infrastructure is supplied, inter-rater reliability for human labels, or the exact protocol used to compute the 84.4% F1 and 75% figures, preventing assessment of the reported metrics.

Authors: We acknowledge that the Evaluation section lacks sufficient methodological detail. In the revised version, we will expand this section (and add an appendix if needed) to fully describe: the LLM prompting strategy and templates; the mechanism for supplying organizational context on tools and infrastructure; inter-rater reliability statistics for the human annotations of explicit requirements; and the precise computation protocol for the F1-score (including how true positives were determined) and the 75% usefulness rating (including the expert rating scale and aggregation method). These additions will support reproducibility and metric assessment. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation of LLM tool on labeled transcripts

full rationale

The paper reports an empirical study: LENS extracts explicit requirements (F1 84.4% vs. human annotations) and infers latent ones whose usefulness is rated by domain experts (75%). No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described method. Results are computed directly from external human labels and expert judgments on the given transcripts; the evaluation chain does not reduce to its own inputs by construction. This is the normal case of a self-contained empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unverified assumption that current LLMs can perform reliable inference of latent requirements when given organizational context; no free parameters or invented entities are introduced.

axioms (1)

domain assumption LLMs can accurately infer latent requirements by reasoning over stakeholder statements together with contextual information about organizational tools and infrastructure
This is the core mechanism described for LENS in the abstract and is required for the latent-requirements component of the claim.

pith-pipeline@v0.9.1-grok · 5746 in / 1276 out tokens · 32226 ms · 2026-06-25T19:56:18.801078+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 8 canonical work pages

[1]

Almeida, I

C. Almeida, I. Copque, A. Oliveira, M. Arouca, A. Barbosa, S. Freire, M. Mendonc ¸a, and J. C. Leite. From elicitation interviews to software requirements: Evaluating LLM performance in requirement generation. InWorkshop on Requirement s Engineering (WER), 2025. https://doi. org/10.29327/1588952.28-12

work page doi:10.29327/1588952.28-12 2025
[2]

Introducing Claude Sonnet 4.5

Anthropic. Introducing Claude Sonnet 4.5. https://www.anthropic.com/ news/claude-sonnet-4-5, 2025

2025
[3]

Claude Sonnet 4.5 API Reference

Anthropic. Claude Sonnet 4.5 API Reference. https://docs.aimlapi.com/ api-references/text-models-llm/anthropic/claude-4-5-sonnet, 2026

2026
[4]

Context windows

Anthropic Claude API Docs. Context windows. https://platform.claude. com/docs/en/build-with-claude/context-windows, 2026

2026
[5]

Burnay, I.J

C. Burnay, I.J. Jureta, and S. Faulkner. What stakeholders will or will not say: A theoretical and empirical study of topic importance in requirements engineering elicitation interviews.Information Systems, 46:61–81, 2014. https://doi.org/10.1016/j.is.2014.05.006

work page doi:10.1016/j.is.2014.05.006 2014
[6]

J. Cohen. A coefficient of agreement for nominal scales.Educational and Psychological Measurement, 20(1):37–46, 1960

1960
[7]

C. Jain, P. R. Anish, A. Singh, and S. Ghaisas. A transformer- based approach for abstractive summarization of requirements from obligations in software engineering contracts. InIEEE 31st International Requirements Engineering Conference (RE), pages 169–179, 2023. https://doi.org/10.1109/RE57278.2023.00025

work page doi:10.1109/re57278.2023.00025 2023
[8]

Lucassen, F

G. Lucassen, F. Dalpiaz, J.M.E. van der Werf, and S. Brinkkemper. Improving Agile Requirements: The Quality User Story Framework and Tool.Requirements engineering, 21(3):383–403, 2016. https://doi.org/ 10.1007/s00766-016-0250-x

work page doi:10.1007/s00766-016-0250-x 2016
[9]

Maiden, A

N. Maiden, A. Gizikis, and S. Robertson. Provoking creativity: Imagine what your requirements could be like.IEEE software, 21(5):68–75,
[10]

https://doi.org/10.1109/MS.2004.1331305

work page doi:10.1109/ms.2004.1331305 2004
[11]

Cordeiro, and Liping Zhao

A. Sharfuddin and T. Breaux. Generative goal modeling. InIEEE 33rd International Requirements Engineering Conference (RE), pages 92–103, 2025. https://doi.org/10.1109/RE63999.2025.00019

work page doi:10.1109/re63999.2025.00019 2025
[12]

Spijkman, X

T. Spijkman, X. de Bondt, F. Dalpiaz, and S. Brinkkemper. Sum- marization of elicitation conversations to locate requirements-relevant information. InInternational Working Conference on Requirements Engineering: Foundation for Software Quality, pages 122–139. Springer,
[13]

https://doi.org/10.1007/978-3-031-29786-1 9

work page doi:10.1007/978-3-031-29786-1
[14]

V oria, F

G. V oria, F. Casillo, C. Gravino, G. Catolino, and F. Palomba. RE- COVER: Toward requirements generation from stakeholders’ conversa- tions.IEEE Transactions on Software Engineering, 51(6):1912–1933,

1912
[15]

https://doi.org/10.1109/TSE.2025.3572056. 9

work page doi:10.1109/tse.2025.3572056 2025

[1] [1]

Almeida, I

C. Almeida, I. Copque, A. Oliveira, M. Arouca, A. Barbosa, S. Freire, M. Mendonc ¸a, and J. C. Leite. From elicitation interviews to software requirements: Evaluating LLM performance in requirement generation. InWorkshop on Requirement s Engineering (WER), 2025. https://doi. org/10.29327/1588952.28-12

work page doi:10.29327/1588952.28-12 2025

[2] [2]

Introducing Claude Sonnet 4.5

Anthropic. Introducing Claude Sonnet 4.5. https://www.anthropic.com/ news/claude-sonnet-4-5, 2025

2025

[3] [3]

Claude Sonnet 4.5 API Reference

Anthropic. Claude Sonnet 4.5 API Reference. https://docs.aimlapi.com/ api-references/text-models-llm/anthropic/claude-4-5-sonnet, 2026

2026

[4] [4]

Context windows

Anthropic Claude API Docs. Context windows. https://platform.claude. com/docs/en/build-with-claude/context-windows, 2026

2026

[5] [5]

Burnay, I.J

C. Burnay, I.J. Jureta, and S. Faulkner. What stakeholders will or will not say: A theoretical and empirical study of topic importance in requirements engineering elicitation interviews.Information Systems, 46:61–81, 2014. https://doi.org/10.1016/j.is.2014.05.006

work page doi:10.1016/j.is.2014.05.006 2014

[6] [6]

J. Cohen. A coefficient of agreement for nominal scales.Educational and Psychological Measurement, 20(1):37–46, 1960

1960

[7] [7]

C. Jain, P. R. Anish, A. Singh, and S. Ghaisas. A transformer- based approach for abstractive summarization of requirements from obligations in software engineering contracts. InIEEE 31st International Requirements Engineering Conference (RE), pages 169–179, 2023. https://doi.org/10.1109/RE57278.2023.00025

work page doi:10.1109/re57278.2023.00025 2023

[8] [8]

Lucassen, F

G. Lucassen, F. Dalpiaz, J.M.E. van der Werf, and S. Brinkkemper. Improving Agile Requirements: The Quality User Story Framework and Tool.Requirements engineering, 21(3):383–403, 2016. https://doi.org/ 10.1007/s00766-016-0250-x

work page doi:10.1007/s00766-016-0250-x 2016

[9] [9]

Maiden, A

N. Maiden, A. Gizikis, and S. Robertson. Provoking creativity: Imagine what your requirements could be like.IEEE software, 21(5):68–75,

[10] [10]

https://doi.org/10.1109/MS.2004.1331305

work page doi:10.1109/ms.2004.1331305 2004

[11] [11]

Cordeiro, and Liping Zhao

A. Sharfuddin and T. Breaux. Generative goal modeling. InIEEE 33rd International Requirements Engineering Conference (RE), pages 92–103, 2025. https://doi.org/10.1109/RE63999.2025.00019

work page doi:10.1109/re63999.2025.00019 2025

[12] [12]

Spijkman, X

T. Spijkman, X. de Bondt, F. Dalpiaz, and S. Brinkkemper. Sum- marization of elicitation conversations to locate requirements-relevant information. InInternational Working Conference on Requirements Engineering: Foundation for Software Quality, pages 122–139. Springer,

[13] [13]

https://doi.org/10.1007/978-3-031-29786-1 9

work page doi:10.1007/978-3-031-29786-1

[14] [14]

V oria, F

G. V oria, F. Casillo, C. Gravino, G. Catolino, and F. Palomba. RE- COVER: Toward requirements generation from stakeholders’ conversa- tions.IEEE Transactions on Software Engineering, 51(6):1912–1933,

1912

[15] [15]

https://doi.org/10.1109/TSE.2025.3572056. 9

work page doi:10.1109/tse.2025.3572056 2025