arxiv: 2605.17163 · v1 · pith:5K3EYPRJnew · submitted 2026-05-16 · 💻 cs.CR · cs.AI

STRIDE-AI: A Threat Modeling Framework for Generative AI Security Assessment

Pith reviewed 2026-05-20 14:10 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords threat modeling · generative AI security · STRIDE adaptation · LLM vulnerabilities · AI risk assessment · adversarial attacks · security lifecycle · black-box assessment

The pith

STRIDE-AI adapts classical threat modeling to generative AI and cuts attack success rates from 80 percent to 15 percent in a case study.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional cybersecurity methods target deterministic systems and therefore leave generative AI open to attacks such as prompt injection, data poisoning, and model inversion, which the paper attributes to the probabilistic nature of AI. The paper presents STRIDE-AI as a bridge that connects high-level risk guidelines (NIST AI RMF) with concrete vulnerability lists (OWASP LLM Top 10) by defining a repeatable six-phase assessment lifecycle. The approach modifies the classical STRIDE categories to fit AI systems and supplies a web tool that makes the method operational. In a black-box evaluation of one deployed LLM chatbot, the structured process lowered the attack success rate from 80 percent to 15 percent. A reader would care because, according to the industry reports the paper cites, a majority of organizations deploying AI lack a dedicated security strategy while adversarial attacks keep increasing year over year.

Core claim

The paper claims that STRIDE-AI supplies a six-phase assessment lifecycle and an AI-adapted version of the classical STRIDE threat categories, and that together these connect high-level risk standards to technical vulnerability taxonomies. The framework is implemented through a purpose-built web tool, and a black-box assessment of a deployed LLM chatbot reported that the method reduced the attack success rate from 80 percent to 15 percent inside the sandbox environment.

What carries the argument

The six-phase assessment lifecycle together with the AI-specific adaptation of the STRIDE threat categories, made usable by a dedicated web tool.

If this is right

  • Organizations gain a repeatable process to map broad risk standards directly onto specific AI attack surfaces such as prompt injection and data poisoning.
  • The web tool allows security teams to conduct consistent evaluations at multiple stages of AI system development and deployment.
  • The adapted STRIDE categories make it possible to identify threats that arise from the probabilistic rather than deterministic character of generative models.
  • The reported reduction in attack success rate indicates that structured threat modeling can materially lower exposure when applied to deployed LLM chatbots.
  • The framework supplies a concrete way to move from abstract guidelines to actionable technical controls for generative AI.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the six-phase lifecycle works reliably on additional systems, it could become a practical template for standardizing security reviews across the generative AI industry.
  • The same adaptation of STRIDE categories might apply to other probabilistic models beyond large language models, such as diffusion or reinforcement-learning agents.
  • Embedding the web tool into existing development pipelines would let teams catch AI-specific vulnerabilities earlier in the release cycle.
  • Testing the framework in production environments rather than sandboxes would reveal whether the observed reduction persists under real usage and evolving attack techniques.

Load-bearing premise

A single black-box assessment of one deployed LLM chatbot inside a sandbox environment is sufficient to demonstrate that the six-phase lifecycle and adapted STRIDE method produce the reported reduction and can be generalized to other generative AI systems.

What would settle it

Repeating the black-box assessment with a different generative AI system or outside the sandbox setting and finding that the attack success rate does not drop to a comparable level would show the central claim does not hold generally.

Figures

Figures reproduced from arXiv: 2605.17163 by Franziska Schwarz, Tsafac Nkombong Regine Cyrille.

Figure 2: AI Risk Scoring Matrix. Scores ≥20 are Critical.
Paper text captured alongside Figure 2: Garak [14] for alignment testing by probing model endpoints with known jailbreak payloads. B. Risk Assessment. Our risk scoring follows the standard formula R = L × I, consistent with ISO 27005 [15]. The contribution is the domain-specific calibration of the scales for AI. Likelihood (L, 1–5) reflects the knowledge asymmetry unique to AI exploits: L=1 for at…
Figure 3: The tool’s five-layer AI security architecture. Each layer decomposes into specific components with associated attack vectors.
Figure 4: Payload splitting attack reconstructed from case study results (cf. …
Figure 5: RAG Security Audit results. Top: the model executes injected commands …
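
The risk-scoring text quoted with Figure 2 (R = L × I, with scores ≥20 rated Critical) can be illustrated with a minimal sketch. It assumes that impact I uses the same 1–5 scale as likelihood L, which the excerpt states only for L. The `Threat` class, the function names, and the example threats with their scores are hypothetical and are not taken from the paper or its web tool.

```python
from dataclasses import dataclass

# From the Figure 2 caption: scores >= 20 are Critical.
CRITICAL_THRESHOLD = 20


@dataclass(frozen=True)
class Threat:
    """Hypothetical record for one identified threat."""
    name: str
    likelihood: int  # L, on the 1-5 scale given in the paper excerpt
    impact: int      # I, ASSUMED here to use the same 1-5 scale


def risk_score(likelihood: int, impact: int) -> int:
    """Return R = L * I after checking both inputs are on the 1-5 scale."""
    for label, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{label} must be between 1 and 5, got {value}")
    return likelihood * impact


def is_critical(score: int) -> bool:
    """Apply the Critical threshold stated in the Figure 2 caption."""
    return score >= CRITICAL_THRESHOLD


def rank_threats(threats: list[Threat]) -> list[tuple[Threat, int]]:
    """Pair each threat with its score, highest risk first."""
    scored = [(t, risk_score(t.likelihood, t.impact)) for t in threats]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    examples = [
        Threat("prompt injection", likelihood=5, impact=4),
        Threat("model inversion", likelihood=2, impact=4),
    ]
    for threat, score in rank_threats(examples):
        flag = "Critical" if is_critical(score) else "below Critical"
        print(f"{threat.name}: R = {score} ({flag})")
```

Under this assumed 5 × 5 scale, only the products 20 and 25 reach the Critical band, so a threat needs at least a 4 on one axis and a 5 on the other to be flagged.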
Original abstract

Traditional cybersecurity methodologies target deterministic systems and fail to address the probabilistic nature of AI, leaving systems vulnerable to attack vectors such as model inversion, data poisoning, and prompt injection. Recent industry reports indicate that a majority of organizations deploying AI lack a dedicated security strategy, with adversarial attacks increasing rapidly year-over-year. We present STRIDE-AI, a framework that bridges the gap between high-level risk standards (NIST AI RMF) and technical vulnerability taxonomies (OWASP LLM Top 10). The framework defines a six-phase assessment lifecycle, introduces a threat modeling adaptation of classical STRIDE for AI systems, and is operationalized through a purpose-built web tool. We provide an initial validation of the approach through a black-box assessment of a deployed LLM chatbot, which successfully reduced the attack success rate from 80% to 15% in our sandbox case study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes STRIDE-AI, a threat modeling framework for generative AI security assessment. It bridges high-level risk standards such as the NIST AI RMF with technical vulnerability taxonomies like the OWASP LLM Top 10 by defining a six-phase assessment lifecycle, adapting the classical STRIDE model to AI-specific threats (e.g., prompt injection, model inversion), and operationalizing the approach via a purpose-built web tool. An initial validation is presented through a black-box assessment of a deployed LLM chatbot in a sandbox environment, reporting a reduction in attack success rate from 80% to 15%.

Significance. If the reported reduction can be replicated with fuller experimental controls and extended to additional systems, the framework would provide a concrete, actionable bridge between abstract risk management guidelines and low-level attack vectors, addressing a documented gap in organizational AI security practices. The inclusion of a web tool adds practical value for adoption, and the adaptation of STRIDE represents a reasonable extension of established methods to probabilistic AI systems.

major comments (2)
  1. [Case Study / Validation] The validation reports an attack success rate reduction from 80% to 15% in the sandbox case study, but supplies no details on the attack corpus size, the distribution of prompts across threat categories, the precise baseline configuration prior to framework application, or any statistical tests or controls. This information is required to isolate the contribution of the six-phase lifecycle and adapted STRIDE from possible confounding factors and to support claims of generalizability to other generative AI systems.
  2. [Framework Description] The six-phase assessment lifecycle is presented at a high level without an explicit mapping or example showing how the adapted STRIDE categories are applied within each phase or how they integrate with NIST AI RMF and OWASP LLM Top 10 elements. A concrete workflow diagram or worked example would be needed to make the framework replicable.
minor comments (1)
  1. [Introduction] The introduction cites industry reports on the lack of dedicated AI security strategies but would benefit from specific references to those reports for traceability.
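
Major comment 1 turns on corpus size, and a short sketch shows why. The code below computes the relative reduction implied by the reported rates and a Wilson score interval around each rate. Only the 80% and 15% figures come from the paper. The corpus size of 20 attack prompts is a hypothetical placeholder, since the paper reports none, and the Wilson interval is chosen here for illustration rather than being a method the paper or the referee names.

```python
import math


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the baseline attack success rate that was removed."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Rates reported in the paper: 80% before, 15% after.
    print(f"relative reduction: {relative_reduction(0.80, 0.15):.2%}")

    # HYPOTHETICAL corpus of 20 prompts (not reported in the paper):
    # 16 successful attacks before, 3 after.
    n = 20
    for label, hits in (("before", 16), ("after", 3)):
        low, high = wilson_interval(hits, n)
        print(f"{label}: {hits}/{n} succeeded, ~95% interval [{low:.2f}, {high:.2f}]")
```

With a corpus this small the intervals are wide, which is why the attack count and the per-category distribution are needed before the 80% to 15% drop can be interpreted.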

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving the rigor of the validation and the clarity of the framework description. We address each point below and outline the revisions we will make.

  1. Referee: The validation reports an attack success rate reduction from 80% to 15% in the sandbox case study, but supplies no details on the attack corpus size, the distribution of prompts across threat categories, the precise baseline configuration prior to framework application, or any statistical tests or controls. This information is required to isolate the contribution of the six-phase lifecycle and adapted STRIDE from possible confounding factors and to support claims of generalizability to other generative AI systems.

    Authors: We agree that the current presentation of the case study is high-level and lacks the requested experimental details. The validation was conducted as an initial proof-of-concept in a controlled sandbox to demonstrate feasibility rather than as a comprehensive controlled experiment. In the revised manuscript we will expand the case study section to report the attack corpus size, the distribution of prompts across the adapted STRIDE-AI categories, the exact baseline configuration, and any statistical measures that were applied. We will also add an explicit discussion of limitations and the preliminary nature of the results to avoid overclaiming generalizability.
    Revision: yes

  2. Referee: The six-phase assessment lifecycle is presented at a high level without an explicit mapping or example showing how the adapted STRIDE categories are applied within each phase or how they integrate with NIST AI RMF and OWASP LLM Top 10 elements. A concrete workflow diagram or worked example would be needed to make the framework replicable.

    Authors: We accept that an explicit mapping and concrete example are needed for replicability. We will add a workflow diagram that shows how each adapted STRIDE category is instantiated within the six phases and how the phases connect to specific NIST AI RMF functions and OWASP LLM Top 10 items. We will also include a worked example that walks through the application of the framework on a representative generative AI system, making the integration explicit.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity in STRIDE-AI framework construction

The paper presents STRIDE-AI as a constructive adaptation of classical STRIDE threat modeling to generative AI, defining a six-phase assessment lifecycle that maps high-level NIST AI RMF standards to technical OWASP LLM Top 10 vulnerabilities. This mapping and the purpose-built web tool are introduced as new organizational constructs rather than derived from fitted parameters, self-referential definitions, or load-bearing self-citations. The single black-box sandbox case study reports an empirical reduction in attack success rate but does not reduce the framework's claims to its own inputs by construction; the result is presented as initial validation of the approach, not a prediction forced by prior fits or author-specific uniqueness theorems. No equations, ansatzes smuggled via citation, or renamings of known results appear in the derivation chain. The framework remains self-contained against external benchmarks such as existing standards and taxonomies.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces a new assessment framework by adapting existing standards and threat categories rather than deriving new quantities from data or first principles. No free parameters or invented entities are described.

axioms (1)
  • Domain assumption: Traditional cybersecurity methodologies target deterministic systems and fail to address the probabilistic nature of AI.
    This premise is stated directly in the abstract as the motivation for the new framework.

pith-pipeline@v0.9.0 · 5681 in / 1511 out tokens · 65319 ms · 2026-05-20T14:10:30.018129+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  [1] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proceedings of the International Conference on Learning Representations (ICLR), 2015.

  [2] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Proceedings of the IEEE Symposium on Security and Privacy, 2017, pp. 39–57.

  [3] European Parliament and Council of the European Union, “Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act),” Official Journal of the European Union, 2024.

  [4] HiddenLayer, Inc., “AI threat landscape report 2025,” HiddenLayer, Inc., Tech. Rep., 2025. [Online]. Available: https://www.hiddenlayer.com/news/hiddenlayer-ai-threat-landscape-report-reveals-ai-breaches-on-the-rise

  [5] National Institute of Standards and Technology, “AI risk management framework (AI RMF 1.0),” U.S. Department of Commerce, Tech. Rep., 2023.

  [6] MITRE Corp., “Adversarial threat landscape for artificial-intelligence systems (ATLAS),” MITRE, Tech. Rep., 2024. [Online]. Available: https://atlas.mitre.org

  [7] OWASP Foundation, “Top 10 for large language model applications,” https://genai.owasp.org/llm-top-10/, 2024, version 2025 Release.

  [8] Google, “Secure AI framework (SAIF),” Google Cybersecurity Action Team, Tech. Rep., 2023. [Online]. Available: https://safety.google/cybersecurity-advancements/saif/

  [9] Microsoft Security Response Center, “AI red team building blocks,” Microsoft Corporation, Tech. Rep., 2024. [Online]. Available: https://learn.microsoft.com/en-us/security/ai-red-teaming

  [10] E. Raj, Engineering MLOps: Rapidly build, test, and manage production-ready machine learning life cycles. Packt Publishing, 2021.

  [11] H. S. Al-Sinani and C. J. Mitchell, “Pentest++: Elevating ethical hacking with AI and automation,” 2025. [Online]. Available: https://arxiv.org/abs/2502.09484

  [12] A. Shostack, Threat Modeling: Designing for Security. John Wiley & Sons, 2014.

  [13] M.-I. Nicolae, M. Sinn, M. N. Tran et al., “Adversarial robustness toolbox v1.0.0,” arXiv preprint arXiv:1807.01069, 2018.

  [14] NVIDIA, “garak: LLM vulnerability scanner,” https://github.com/NVIDIA/garak, 2024.

  [15] ISO/IEC, “ISO/IEC 27005:2022 – information security, cybersecurity and privacy protection – guidance on managing information security risks,” International Organization for Standardization, 2022.

  [16] ISO/IEC JTC 1/SC 42, “ISO/IEC FDIS 27090: Cybersecurity – artificial intelligence – guidance for addressing security threats and compromises to artificial intelligence systems,” International Organization for Standardization, Tech. Rep., 2024, Final Draft International Standard (under development).

  [17] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection,” in Proceedings of the ACM Workshop on Artificial Intelligence and Security (AISec), 2023.