An Approach for Reviewing Security-Related Aspects in Agile Requirements Specifications of Web Applications

A. A. Neto; A. Garcia; D. Mendez Fernández; H. Villamizar; M. Kalinowski

arxiv: 1906.11432 · v1 · pith:MABUGSWSnew · submitted 2019-06-27 · 💻 cs.SE

Pith reviewed 2026-05-25 15:02 UTC · model grok-4.3

classification 💻 cs.SE

keywords agile requirements · security review · natural language processing · OWASP · reading technique · web applications · controlled experiment · defect detection

The pith

An NLP-generated reading technique improves novice inspectors' effectiveness and efficiency when reviewing security aspects in agile web application requirements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper designs an approach that takes user stories and security specifications as input, applies natural language processing to connect those stories to relevant security properties, then selects matching high-level requirements from the OWASP list and produces a focused reading technique to guide defect detection. It tests the technique through two controlled experiments that compare novice inspectors using the generated guides against inspectors using the complete OWASP list. Results show statistically significant gains in both the proportion of defects found and the speed of review, with very large effect sizes. A sympathetic reader would care because security concerns are frequently missed in agile projects that rely on lightweight documentation and lack specialized expertise, and early detection in requirements can avoid costly fixes later.

Core claim

The central claim is that the approach, which uses NLP to relate user stories to security properties and then derives tailored reading techniques from OWASP high-level security requirements, produces statistically significant improvements in effectiveness and efficiency for novice inspectors verifying security aspects in agile requirements specifications of web applications, with very large effect sizes observed across two experiment trials.

What carries the argument

NLP mapping of user stories to security properties that selects OWASP requirements and generates a focused reading technique for defect detection.
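A minimal sketch of this mapping-and-selection step follows, assuming plain keyword matching as a stand-in for the paper's NLP techniques. The property names, keyword lexicon, and requirement texts are hypothetical placeholders, not the authors' artifacts or verbatim OWASP wording. Only stories that match some property contribute checklist items, which is what makes the generated technique focused rather than a copy of the full list.

```python
import re

# Hypothetical keyword lexicon per security property (assumption, not the paper's).
PROPERTY_KEYWORDS = {
    "authentication": {"login", "log", "password", "credentials", "account"},
    "confidentiality": {"personal", "private", "card", "export"},
    "session_management": {"session", "logout", "timeout"},
}

# Hypothetical high-level requirements per property, loosely OWASP-inspired.
PROPERTY_REQUIREMENTS = {
    "authentication": ["Verify that credentials are stored securely."],
    "confidentiality": ["Verify that sensitive data is protected in transit and at rest."],
    "session_management": ["Verify that sessions expire after a period of inactivity."],
}


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real NLP preprocessing."""
    return set(re.findall(r"[a-z]+", text.lower()))


def map_story(story):
    """Return the security properties whose keywords occur in the story."""
    tokens = tokenize(story)
    return sorted(p for p, kws in PROPERTY_KEYWORDS.items() if tokens & kws)


def reading_technique(stories):
    """Build a focused checklist: one item per (story, linked requirement)."""
    checklist = []
    for story in stories:
        for prop in map_story(story):
            for req in PROPERTY_REQUIREMENTS[prop]:
                checklist.append(f"[{prop}] {story} -> {req}")
    return checklist


stories = [
    "As a customer, I want to log in with my password so that I can see my orders.",
    "As an admin, I want to export personal data of users.",
    "As a visitor, I want to browse the product catalog.",
]
for item in reading_technique(stories):
    print(item)
```

The third story matches no property and therefore adds nothing to the checklist. A real pipeline would likely replace the keyword sets with richer NLP (lemmatization, classifiers, or embeddings), and that is precisely the step whose accuracy the referee report questions.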

If this is right

Novice inspectors identify a higher proportion of security-related defects when guided by the NLP-derived reading technique.
Review time decreases when inspectors apply the focused technique instead of the complete OWASP list.
The performance gains remain consistent across the two separate controlled experiment trials.
The technique supports verification of only the OWASP requirements that the NLP step links to the given user stories.
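The two outcome measures can be made concrete. This sketch assumes the definitions common in inspection studies: effectiveness as the share of known defects an inspector detects, and efficiency as defects detected per hour of review. The paper's exact operationalization may differ, and the inspector numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class InspectionResult:
    defects_found: int    # true defects reported by the inspector
    total_defects: int    # defects known to exist in the specification
    minutes_spent: float  # duration of the review

    def effectiveness(self) -> float:
        """Share of known defects that the inspector detected."""
        return self.defects_found / self.total_defects

    def efficiency(self) -> float:
        """Defects detected per hour of review."""
        return self.defects_found / (self.minutes_spent / 60.0)


# Hypothetical inspectors, not data from the paper.
focused = InspectionResult(defects_found=6, total_defects=10, minutes_spent=30)
full_list = InspectionResult(defects_found=3, total_defects=10, minutes_spent=45)
print(focused.effectiveness(), focused.efficiency())      # 0.6 12.0
print(full_list.effectiveness(), full_list.efficiency())  # 0.3 4.0
```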

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the NLP mapping proves reliable outside the lab, teams could embed the technique in agile tools to surface security concerns without requiring a dedicated security expert at every review.
The same NLP-to-requirements pipeline might be adapted for other quality attributes such as performance or accessibility once appropriate property lists are defined.
Further trials with professional agile teams on live projects would test whether the measured gains hold when inspectors have domain knowledge and when requirements evolve rapidly.

Load-bearing premise

The NLP techniques correctly map user stories to security properties and the controlled experiment conditions reflect real-world agile reviewing by novice inspectors.

What would settle it

A replication study in which inspectors using the generated reading technique show no statistically significant gain in defect detection rate or review speed compared with those using the full OWASP list.
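Such a replication would compare per-inspector scores between the two groups using a significance test and an effect size. The sketch below uses a one-sided permutation test and Cohen's d with a pooled standard deviation as illustrative choices; it does not assume these are the tests the authors ran, and the scores are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """One-sided p-value for mean(a) > mean(b) under random group relabelling."""
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical effectiveness scores per inspector (not the paper's data).
technique = [0.6, 0.7, 0.5, 0.8, 0.6, 0.7]
owasp_list = [0.3, 0.4, 0.2, 0.4, 0.3, 0.5]
print(f"d = {cohens_d(technique, owasp_list):.2f}")
print(f"p = {permutation_p_value(technique, owasp_list):.4f}")
```

A settling replication in the sense above would be one where the p-value stays above the chosen significance level and the effect size shrinks toward zero.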

Original abstract

Defects in requirements specifications can have severe consequences during the software development lifecycle. Some of them result in overall project failure due to incorrect or missing quality characteristics such as security. There are several concerns that make security difficult to deal with; for instance, (1) when stakeholders discuss general requirements in (review) meetings, they are often not aware that they should also discuss security-related topics, and (2) they typically do not have enough security expertise. These concerns become even more challenging in agile development contexts, where lightweight documentation is typically involved. The goal of this paper is to design and evaluate an approach to support reviewing security-related aspects in agile requirements specifications of web applications. The designed approach considers user stories and security specifications as input and relates those user stories to security properties via Natural Language Processing (NLP) techniques. Based on the related security properties, our approach then identifies high-level security requirements from the Open Web Application Security Project (OWASP) to be verified and generates a focused reading techniques to support reviewers in detecting detects. We evaluate our approach via two controlled experiment trials, comparing the effectiveness and efficiency of novice inspectors verifying security aspects in agile requirements using our reading technique against using the complete list of OWASP high-level security requirements. The (statistically significant) results indicate that using the reading technique has a positive impact (with very large effect size) on the performance of inspectors in terms of effectiveness and efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical NLP pipeline to turn agile user stories into focused OWASP-based security review guides, with experiments showing clear gains for novices, but the unmeasured accuracy of the mapping step leaves the results hard to interpret.

The main thing is that they built a method which takes user stories, applies NLP to connect them to security properties, pulls the matching high-level OWASP requirements, and produces a reading technique aimed at security defects in web-app requirements. Two controlled experiments then compared novices using this technique against the full OWASP list and found statistically significant improvements in both effectiveness and efficiency, with very large effect sizes. That combination of NLP mapping plus generated checklist for the agile case is the concrete new piece. The experiments are a plus because they use a baseline and report proper stats rather than just describing the tool. The work is aimed squarely at the real constraint in agile settings, where teams lack security depth and keep documentation light.

The soft spot is the NLP mapping itself. The abstract gives no precision, recall, or agreement numbers for how well the stories are linked to properties, so if that step is noisy the generated technique is not actually focused and the performance difference becomes difficult to credit to the approach. The trials are also limited to novices and web applications, which matches the stated motivation but narrows how far the findings travel. No obvious circularity or invented results appear in the reported design.

This is for requirements-engineering researchers and tool builders who care about lightweight security reviews in agile projects. A reader working on review support or secure agile methods would get usable ideas and data to discuss. It has enough of a working method and empirical results to go to referees.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an approach to support reviewing security aspects in agile requirements for web applications. User stories are processed via NLP techniques to relate them to security properties; corresponding high-level requirements are then selected from OWASP and used to generate focused reading techniques. The approach is evaluated in two controlled experiments with novice inspectors, reporting statistically significant gains in effectiveness and efficiency (with very large effect sizes) when using the generated techniques versus the complete OWASP list.

Significance. If the foundational NLP mapping step can be shown to be accurate on the experimental artifacts, the work would provide quantitative evidence that targeted reading techniques can improve security reviews in agile settings where expertise is limited. The use of controlled trials with statistical analysis is a methodological strength that allows direct comparison to a relevant baseline.

major comments (2)

[Section 3] The description of the approach (Section 3) states that NLP techniques are used to relate user stories to security properties but reports no precision, recall, F1, or inter-rater agreement for this mapping. Because the focused reading technique is constructed directly from the output of this step, the absence of validation metrics leaves open the possibility that the technique is effectively untargeted on the artifacts used in the experiments.
[Section 4] In the experimental evaluation (Section 4), the statistically significant improvements and large effect sizes are presented as evidence for the reading technique. Without accuracy data on the NLP mapping for the specific user stories in the experimental materials, it is not possible to rule out that the observed gains arise from factors other than correct identification of relevant security properties, weakening the central claim.
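The metrics the referee asks for are straightforward to compute once a gold-standard mapping exists. A minimal sketch, assuming the mapping output and the gold standard are both sets of (story id, security property) pairs scored with micro-averaged precision, recall, and F1, plus Cohen's kappa for two raters labelling stories. All annotations below are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Micro-averaged precision, recall, and F1 over (story_id, property) pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters' per-item labels.

    Undefined (division by zero) when expected agreement is exactly 1.
    """
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)


# Hypothetical annotations: (story id, security property) pairs.
gold = {(1, "authentication"), (2, "confidentiality"), (2, "authorization")}
predicted = {(1, "authentication"), (2, "confidentiality"), (3, "confidentiality")}
print(precision_recall_f1(predicted, gold))

# Two hypothetical raters judging whether each story is security-relevant.
rater_1 = ["yes", "yes", "no", "no", "yes", "no"]
rater_2 = ["yes", "no", "no", "no", "yes", "no"]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.667
```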

minor comments (2)

[Abstract] Abstract contains the phrase 'detecting detects'; this appears to be a typographical error for 'detecting defects'.
[Section 3] The paper does not discuss how the NLP component would be maintained or retrained when new security properties or OWASP updates are introduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for validation of the NLP mapping. We address each major comment below and will revise the manuscript accordingly.

Referee: [Section 3] The description of the approach (Section 3) states that NLP techniques are used to relate user stories to security properties but reports no precision, recall, F1, or inter-rater agreement for this mapping. Because the focused reading technique is constructed directly from the output of this step, the absence of validation metrics leaves open the possibility that the technique is effectively untargeted on the artifacts used in the experiments.

Authors: We agree that reporting accuracy metrics for the NLP mapping on the experimental user stories is necessary to support attribution of the observed gains to correct targeting of security properties. In the revised manuscript we will add to Section 3 a quantitative evaluation of the NLP step (precision, recall, F1) performed on the same artifacts used in the controlled experiments, together with a brief description of how the mapping rules were derived.
revision: yes
Referee: [Section 4] In the experimental evaluation (Section 4), the statistically significant improvements and large effect sizes are presented as evidence for the reading technique. Without accuracy data on the NLP mapping for the specific user stories in the experimental materials, it is not possible to rule out that the observed gains arise from factors other than correct identification of relevant security properties, weakening the central claim.

Authors: We acknowledge the concern. The new validation results added to Section 3 will directly address this point by allowing readers to assess how accurately the security properties were identified for the experimental artifacts. With those data in place, the statistically significant gains can be interpreted in light of the actual targeting quality rather than left open to alternative explanations.
revision: yes

Circularity Check

0 steps flagged

No circularity; evaluation relies on independent controlled experiments

The paper's central claim rests on two controlled experiment trials that directly compare inspector performance (effectiveness and efficiency) when using the generated reading technique versus the complete OWASP list. These trials constitute an external benchmark with novice inspectors and measured outcomes; they do not reduce to the NLP mapping step by construction, nor do they invoke self-citations, fitted parameters renamed as predictions, or uniqueness theorems. The NLP component is an input to technique generation, but the experiment evaluates the resulting technique against a baseline without any self-referential loop or statistical forcing. No load-bearing step matches any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the approach relies on existing OWASP lists and standard NLP techniques with no new free parameters or invented entities introduced.

axioms (1)

Domain assumption: NLP techniques can reliably identify security-related aspects in user stories.
Central to relating input stories to security properties before selecting OWASP requirements.

pith-pipeline@v0.9.0 · 5803 in / 1005 out tokens · 27910 ms · 2026-05-25T15:02:15.460801+00:00
