An Approach for Reviewing Security-Related Aspects in Agile Requirements Specifications of Web Applications
Pith reviewed 2026-05-25 15:02 UTC · model grok-4.3
The pith
An NLP-generated reading technique improves novice inspectors' effectiveness and efficiency when reviewing security aspects in agile web application requirements.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The approach uses NLP to relate user stories to security properties and then derives a tailored reading technique from the OWASP high-level security requirements linked to those properties. The central claim is that this technique yields statistically significant improvements in the effectiveness and efficiency of novice inspectors verifying security aspects in agile requirements specifications of web applications, with very large effect sizes observed across two experiment trials.
What carries the argument
NLP mapping of user stories to security properties that selects OWASP requirements and generates a focused reading technique for defect detection.
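A minimal sketch of this pipeline, under loud assumptions: this review does not say which NLP techniques the paper uses, so plain keyword matching stands in for them here. The keyword lexicon, the property names, and the requirement texts are hypothetical placeholders, not the paper's data and not actual OWASP wording.

```python
import re

# HYPOTHETICAL lexicon: security property -> trigger words.
# Keyword matching is a stand-in for the paper's unspecified NLP step.
PROPERTY_KEYWORDS = {
    "authentication": {"login", "password", "credentials"},
    "session management": {"session", "logout", "timeout"},
    "access control": {"admin", "role", "permission"},
}

# HYPOTHETICAL high-level requirements per property (placeholder wording).
REQUIREMENTS = {
    "authentication": ["Check that credentials are stored securely."],
    "session management": ["Check that idle sessions are invalidated."],
    "access control": ["Check that restricted functions require authorization."],
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def related_properties(story):
    """Return the sorted security properties whose keywords occur in the story."""
    tokens = tokenize(story)
    return sorted(p for p, kws in PROPERTY_KEYWORDS.items() if tokens & kws)


def reading_technique(stories):
    """Build a focused checklist of (story, property, requirement) triples."""
    checklist = []
    for story in stories:
        for prop in related_properties(story):
            for req in REQUIREMENTS[prop]:
                checklist.append((story, prop, req))
    return checklist
```

A story that triggers no keyword contributes nothing to the checklist, which is what makes the resulting technique shorter than the complete requirement list. It is also why an inaccurate mapping would silently drop relevant checks.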
If this is right
- Novice inspectors identify a higher proportion of security-related defects when guided by the NLP-derived reading technique.
- Review time decreases when inspectors apply the focused technique instead of the complete OWASP list.
- The performance gains remain consistent across the two separate controlled experiment trials.
- The technique supports verification of only the OWASP requirements that the NLP step links to the given user stories.
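The outcome measures behind these points can be made concrete as follows. This is a sketch under assumptions: the review does not give the paper's exact formulas, so effectiveness is taken as the share of known defects found and efficiency as defects found per hour, both common definitions in inspection studies. Cohen's d with a pooled standard deviation is likewise an assumed choice of effect size measure, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def effectiveness(defects_found, defects_total):
    """Share of known defects that an inspector reported (assumed definition)."""
    return defects_found / defects_total


def efficiency(defects_found, minutes_spent):
    """Defects reported per hour of review time (assumed definition)."""
    return defects_found / (minutes_spent / 60.0)


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled
```

Each group needs at least two observations with some spread, otherwise the pooled standard deviation is undefined or zero.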
Where Pith is reading between the lines
- If the NLP mapping proves reliable outside the lab, teams could embed the technique in agile tools to surface security concerns without requiring a dedicated security expert at every review.
- The same NLP-to-requirements pipeline might be adapted for other quality attributes such as performance or accessibility once appropriate property lists are defined.
- Further trials with professional agile teams on live projects would test whether the measured gains hold when inspectors have domain knowledge and when requirements evolve rapidly.
Load-bearing premise
The NLP techniques correctly map user stories to security properties and the controlled experiment conditions reflect real-world agile reviewing by novice inspectors.
What would settle it
A replication study in which inspectors using the generated reading technique show no statistically significant gain in defect detection rate or review speed compared with those using the full OWASP list.
Original abstract
Defects in requirements specifications can have severe consequences during the software development lifecycle. Some of them result in overall project failure due to incorrect or missing quality characteristics such as security. There are several concerns that make security difficult to deal with; for instance, (1) when stakeholders discuss general requirements in (review) meetings, they are often not aware that they should also discuss security-related topics, and (2) they typically do not have enough security expertise. These concerns become even more challenging in agile development contexts, where lightweight documentation is typically involved. The goal of this paper is to design and evaluate an approach to support reviewing security-related aspects in agile requirements specifications of web applications. The designed approach considers user stories and security specifications as input and relates those user stories to security properties via Natural Language Processing (NLP) techniques. Based on the related security properties, our approach then identifies high-level security requirements from the Open Web Application Security Project (OWASP) to be verified and generates a focused reading techniques to support reviewers in detecting detects. We evaluate our approach via two controlled experiment trials, comparing the effectiveness and efficiency of novice inspectors verifying security aspects in agile requirements using our reading technique against using the complete list of OWASP high-level security requirements. The (statistically significant) results indicate that using the reading technique has a positive impact (with very large effect size) on the performance of inspectors in terms of effectiveness and efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an approach to support reviewing security aspects in agile requirements for web applications. User stories are processed via NLP techniques to relate them to security properties; corresponding high-level requirements are then selected from OWASP and used to generate focused reading techniques. The approach is evaluated in two controlled experiments with novice inspectors, reporting statistically significant gains in effectiveness and efficiency (with very large effect sizes) when using the generated techniques versus the complete OWASP list.
Significance. If the foundational NLP mapping step can be shown to be accurate on the experimental artifacts, the work would provide quantitative evidence that targeted reading techniques can improve security reviews in agile settings where expertise is limited. The use of controlled trials with statistical analysis is a methodological strength that allows direct comparison to a relevant baseline.
major comments (2)
- [Section 3] The description of the approach (Section 3) states that NLP techniques are used to relate user stories to security properties but reports no precision, recall, F1, or inter-rater agreement for this mapping. Because the focused reading technique is constructed directly from the output of this step, the absence of validation metrics leaves open the possibility that the technique is effectively untargeted on the artifacts used in the experiments.
- [Section 4] In the experimental evaluation (Section 4), the statistically significant improvements and large effect sizes are presented as evidence for the reading technique. Without accuracy data on the NLP mapping for the specific user stories in the experimental materials, it is not possible to rule out that the observed gains arise from factors other than correct identification of relevant security properties, weakening the central claim.
minor comments (2)
- [Abstract] The abstract contains the phrase 'detecting detects', which appears to be a typographical error for 'detecting defects'.
- [Section 3] The paper does not discuss how the NLP component would be maintained or retrained when new security properties or OWASP updates are introduced.
Simulated Author's Rebuttal
We thank the referee for the constructive comments highlighting the need for validation of the NLP mapping. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Section 3] The description of the approach (Section 3) states that NLP techniques are used to relate user stories to security properties but reports no precision, recall, F1, or inter-rater agreement for this mapping. Because the focused reading technique is constructed directly from the output of this step, the absence of validation metrics leaves open the possibility that the technique is effectively untargeted on the artifacts used in the experiments.
Authors: We agree that reporting accuracy metrics for the NLP mapping on the experimental user stories is necessary to support attribution of the observed gains to correct targeting of security properties. In the revised manuscript we will add to Section 3 a quantitative evaluation of the NLP step (precision, recall, F1) performed on the same artifacts used in the controlled experiments, together with a brief description of how the mapping rules were derived.
Revision: yes
- Referee: [Section 4] In the experimental evaluation (Section 4), the statistically significant improvements and large effect sizes are presented as evidence for the reading technique. Without accuracy data on the NLP mapping for the specific user stories in the experimental materials, it is not possible to rule out that the observed gains arise from factors other than correct identification of relevant security properties, weakening the central claim.
Authors: We acknowledge the concern. The new validation results added to Section 3 will directly address this point by allowing readers to assess how accurately the security properties were identified for the experimental artifacts. With those data in place, the statistically significant gains can be interpreted in light of the actual targeting quality rather than left open to alternative explanations.
Revision: yes
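The validation promised in these responses could take roughly the following shape. This is a sketch, not the authors' procedure: it assumes a manually labeled gold standard of story-to-property links exists, and it micro-averages over all (story, property) pairs. The function name and the dictionary layout are hypothetical.

```python
def mapping_scores(predicted, gold):
    """Micro-averaged precision, recall, and F1 over (story, property) pairs.

    Both arguments map a story id to the set of security properties linked
    to that story: `predicted` by the NLP step, `gold` by human annotators.
    """
    tp = fp = fn = 0
    for story_id in set(predicted) | set(gold):
        p = predicted.get(story_id, set())
        g = gold.get(story_id, set())
        tp += len(p & g)  # links found by both
        fp += len(p - g)  # links only the NLP step proposed
        fn += len(g - p)  # links the NLP step missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For a reading technique, recall is arguably the more critical figure: a missed link removes a relevant requirement from the inspector's checklist, whereas a spurious link only adds reading effort.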
Circularity Check
No circularity; evaluation relies on independent controlled experiments
Full rationale
The paper's central claim rests on two controlled experiment trials that directly compare inspector performance (effectiveness and efficiency) when using the generated reading technique versus the complete OWASP list. These trials constitute an external benchmark with novice inspectors and measured outcomes; they do not reduce to the NLP mapping step by construction, nor do they invoke self-citations, fitted parameters renamed as predictions, or uniqueness theorems. The NLP component is an input to technique generation, but the experiment evaluates the resulting technique against a baseline without any self-referential loop or statistical forcing. No load-bearing step matches any enumerated circularity pattern.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: NLP techniques can reliably identify security-related aspects in user stories.