arxiv: 2606.28666 · v1 · pith:GAZNCQ3Pnew · submitted 2026-06-27 · 💻 cs.CR · cs.AI

Why Trust Your Agent? Empirical Security Gains from TRiSM-Guided Agentic Workflows in Healthcare

Pith reviewed 2026-06-30 10:14 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords TRiSM · agentic workflows · LLM security · healthcare AI · RAG poisoning · data injection · report generation · AI trust

The pith

TRiSM-guided agentic workflows cut RAG poisoning success in a medical report generator from 31% to 10% while raising report accuracy by 14 percentage points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a medical report-generation system to show that standard agent workflows grant LLMs excessive access rights, which exposes them to poisoning and injection attacks. It redesigns the same system under the TRiSM framework with least-privilege controls and server-side prompt construction, then measures both attack resistance and output quality across five LLMs, two report types, 800 generations, and 500 simulated attacks. The redesigned workflow lowered RAG poisoning success from 31% to 10%, data-field injection from 42% to 25%, and removed network injection entirely. Report accuracy rose from 72.5% to 86.5%. These paired gains demonstrate that explicit security architecture can improve both protection and reliability in agent-based healthcare tools.

Core claim

Applying the TRiSM framework to an agentic medical report generator produces a least-privilege, defence-in-depth workflow that reduces mean attack success rates from 31% to 10% for RAG poisoning and from 42% to 25% for data-field injection, eliminates the network injection vector through server-side prompt construction, and raises report accuracy by 14 percentage points from 72.5% to 86.5% across five LLMs.
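
As a quick arithmetic check on the headline figures, the sketch below derives the absolute and relative changes from the percentages quoted in the core claim. Only the input percentages come from the paper; the helper name `reduction` is illustrative.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


# Mean attack success rates quoted in the core claim (percent).
rag_abs, rag_rel = reduction(31.0, 10.0)
field_abs, field_rel = reduction(42.0, 25.0)

# Accuracy moves the other way: a gain in percentage points.
accuracy_gain = 86.5 - 72.5

print(f"RAG poisoning:        -{rag_abs:.0f} pp ({rag_rel:.1f}% relative)")
print(f"Data-field injection: -{field_abs:.0f} pp ({field_rel:.1f}% relative)")
print(f"Report accuracy:      +{accuracy_gain:.1f} pp")
```

Read this way, the RAG poisoning rate falls by roughly two thirds, while data-field injection falls by about two fifths and still succeeds in one attack out of four under the redesigned workflow.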

What carries the argument

TRiSM-guided agentic workflow that enforces least-privilege access and moves prompt construction to the server to block client-side injection.
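
A minimal sketch of what server-side prompt construction could look like, assuming the client is allowed to send only identifiers and never prompt text. The paper's actual implementation is not shown on this page, so every name here (`TEMPLATES`, `RECORDS`, `ReportRequest`, `build_prompt`) and both report-type labels are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fixed templates held on the server; the client cannot edit them.
TEMPLATES = {
    "discharge": "Write a discharge summary using only the patient record below.\n{record}",
    "radiology": "Write a radiology report using only the patient record below.\n{record}",
}

# Hypothetical stand-in for a server-side patient store.
RECORDS = {"p-001": "Age 54. Chest X-ray: no acute findings."}


@dataclass(frozen=True)
class ReportRequest:
    """The only fields a client may send: identifiers, never prompt text."""
    patient_id: str
    report_type: str


def build_prompt(request: ReportRequest) -> str:
    """Construct the LLM prompt on the server from trusted parts only."""
    if request.report_type not in TEMPLATES:
        raise ValueError(f"unknown report type: {request.report_type!r}")
    if request.patient_id not in RECORDS:
        raise KeyError(f"unknown patient: {request.patient_id!r}")
    return TEMPLATES[request.report_type].format(record=RECORDS[request.patient_id])
```

Because the request carries no free-text field, instructions tampered with in transit have nowhere to land, which matches the claim that the network injection vector disappears. The sketch does nothing about malicious text stored inside the record itself, which is the separate data-field injection vector.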

If this is right

  • Least-privilege and server-side controls can simultaneously lower attack rates and raise accuracy in agent-based medical tools.
  • Model selection becomes a required architectural decision because attack resistance varies across LLMs even under identical TRiSM controls.
  • Network injection vectors can be removed entirely by moving prompt construction off the client.
  • Defence-in-depth principles scale to agent workflows without sacrificing task performance.
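
The least-privilege idea in the points above can be sketched as a deny-by-default allowlist checked before any tool call. This is an illustration under assumed names, not the paper's code: the roles, tool names, `ToolAccessDenied`, `authorize`, and `call_tool` are all hypothetical.

```python
# Hypothetical mapping from agent role to the tools it may call.
ALLOWED_TOOLS = {
    "report_writer": {"read_patient_record", "search_guidelines"},
    "report_reviewer": {"read_draft_report"},
}


class ToolAccessDenied(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


def authorize(role: str, tool: str) -> None:
    """Deny by default: unknown roles and unlisted tools are both rejected."""
    if tool not in ALLOWED_TOOLS.get(role, set()):
        raise ToolAccessDenied(f"{role!r} may not call {tool!r}")


def call_tool(role: str, tool: str, registry: dict, **kwargs):
    """Check the allowlist before dispatching to the tool implementation."""
    authorize(role, tool)
    return registry[tool](**kwargs)
```

The check runs outside the model, so a prompt that persuades the LLM to request an unlisted tool still fails at dispatch time.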

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same least-privilege redesign could be tested in other regulated domains such as finance or legal document generation to check whether accuracy-security gains transfer.
  • Real deployments would need to measure whether the observed accuracy lift persists when the workflow faces live clinical data rather than simulated cases.
  • Future evaluations could add adaptive attackers who modify strategies once they detect server-side controls.

Load-bearing premise

The 500 simulated attacks and 800 generations on five specific LLMs and two report types represent real-world threats and generalize to other models and environments.

What would settle it

A live deployment in which real attackers achieve RAG poisoning success above 10% or data-field injection success above 25% on the TRiSM workflow would falsify the reported security gains.

Figures

Figures reproduced from arXiv: 2606.28666 by Liam Kearns.

Figure 1: Medical application using AI agents to generate medical reports.
Figure 2: AI agent workflow for medical report generation application.
Figure 3: Agentic AI workflow for medical report generation application.
Figure 4: Injection attack with agent workflow when requests are built client-side.
Figure 5: Injection attack with insecurely implemented agentic workflow when requests are built client-side.
Figure 6: Proposed secure agentic workflow using TRiSM.

Original abstract

Agent-based AI has enabled the automation of tasks by exposing application tools and resources to large language models (LLMs). However, to improve scope and accuracy, agents are often given access rights that exceed those of ordinary users, introducing significant security risks. AI is routinely integrated into applications with disregard for security, risking data exposure and breaching regulations. This paper applies the AI Trust, Risk, and Security Management (TRiSM) framework to a medical report-generation application to demonstrate how an insecure agent workflow can be transformed into a security-conscious agentic workflow. Both workflows were evaluated across five LLMs (Claude Haiku 4.5, GPT-4.1-nano, GPT-4.1-mini, GPT-5.4-mini, and Gemini 2.5 Flash) on two report types, totalling 800 generations and 500 attack scenarios including RAG poisoning, data-field injection, and client-side network injection. The TRiSM-guided agentic workflow reduced mean attack success rates from 31% to 10% for RAG poisoning and from 42% to 25% for data-field injection, while eliminating the network injection vector entirely through server-side prompt construction. Furthermore, report accuracy increased by 14 percentage points (72.5% to 86.5%) with the agentic workflow, demonstrating a secure design which provides more reliable outputs. This paper contributes to knowledge by demonstrating that least-privilege, defence-in-depth agentic workflows improve security and accuracy, while also highlighting that model choice is a necessary architectural consideration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that applying the TRiSM framework transforms an insecure agentic workflow for medical report generation into a secure one, yielding lower attack success rates (RAG poisoning: 31%→10%; data-field injection: 42%→25%; network injection eliminated) and higher report accuracy (72.5%→86.5%) across five LLMs, two report types, 800 generations, and 500 simulated attacks.

Significance. If the empirical measurements prove robust, the work supplies concrete, multi-model evidence that least-privilege and defence-in-depth agent designs can improve both security and output reliability in a regulated healthcare setting. The explicit comparison of insecure vs. TRiSM-guided workflows and the dual security-accuracy metric are strengths.

major comments (2)
  1. [Evaluation / Methods] The evaluation description provides only aggregate success rates without the exact attack prompt templates, success criteria, or construction details for the 500 scenarios. This information is load-bearing for the central claim that the observed reductions are attributable to the TRiSM workflow rather than non-adaptive or baseline-specific test artifacts.
  2. [Results] No per-LLM or per-report-type breakdowns, variance statistics, or hypothesis tests accompany the headline deltas (e.g., 14 pp accuracy gain). Without these, it is impossible to assess whether the reported improvements are consistent or statistically distinguishable from noise.
minor comments (1)
  1. [Abstract] Model names such as 'GPT-4.1-nano' and 'GPT-5.4-mini' should be replaced with the precise identifiers used in the experiments to enable replication.
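
To illustrate the kind of uncertainty estimate major comment 2 asks for, here is a sketch of a Wilson score interval for a single attack success rate. The page does not give per-condition sample sizes, so the counts in the usage lines are placeholders, and the function name `wilson_interval` is illustrative.

```python
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# Placeholder counts, NOT taken from the paper: 10 successes in 100 attacks.
low, high = wilson_interval(10, 100)
print(f"10/100 -> 95% interval [{low:.3f}, {high:.3f}]")
```

With small per-model, per-report-type cells, intervals like this can be wide enough that two headline rates overlap, which is why the breakdown matters.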

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to improve transparency and statistical rigor.

  1. Referee: [Evaluation / Methods] The evaluation description provides only aggregate success rates without the exact attack prompt templates, success criteria, or construction details for the 500 scenarios. This information is load-bearing for the central claim that the observed reductions are attributable to the TRiSM workflow rather than non-adaptive or baseline-specific test artifacts.

    Authors: We agree that the absence of these details limits reproducibility. In the revised manuscript we will add the full attack prompt templates (or representative examples for each category), the exact success criteria used for each attack type, and a step-by-step description of scenario construction, including how the 500 attacks were distributed across the two workflows, five LLMs, and two report types. These will appear in an expanded Methods section or as supplementary material.

    Revision: yes

  2. Referee: [Results] No per-LLM or per-report-type breakdowns, variance statistics, or hypothesis tests accompany the headline deltas (e.g., 14 pp accuracy gain). Without these, it is impossible to assess whether the reported improvements are consistent or statistically distinguishable from noise.

    Authors: We acknowledge that aggregate figures alone are insufficient. The revision will include per-LLM and per-report-type tables (or figures) for both attack success rates and accuracy, report standard deviations or confidence intervals across the 800 generations, and appropriate statistical tests (e.g., McNemar’s test for paired binary outcomes or paired t-tests for accuracy) with p-values to evaluate whether the observed deltas are statistically significant. These additions will be placed in the Results section.

    Revision: yes
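
For reference, the McNemar test named in the response can be sketched in its exact (binomial) form using only the standard library. It applies when the same attack scenario is run against both workflows and only the discordant pairs are counted. The function name `mcnemar_exact` and the counts in the usage lines are illustrative and not taken from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs where the attack succeeded on the baseline workflow only.
    c: pairs where the attack succeeded on the redesigned workflow only.
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence either way
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Placeholder discordant counts, NOT from the paper.
p_value = mcnemar_exact(b=25, c=4)
print(f"exact McNemar p-value: {p_value:.6f}")
```

Pairs where the attack succeeded or failed on both workflows do not enter the test, so the per-scenario outcomes, not just the aggregate rates, would need to be reported.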

Circularity Check

0 steps flagged

No circularity: purely empirical attack-success and accuracy measurements

The paper reports measured attack success rates (31%→10% RAG, 42%→25% data-field) and accuracy lift (72.5%→86.5%) obtained by running 800 generations and 500 fixed attack scenarios on five named LLMs. No equations, fitted parameters, predictions, or self-citations appear in the abstract or described methodology; the outcomes are direct experimental counts on defined test sets rather than quantities derived from the inputs by construction. The evaluation therefore contains no load-bearing step that reduces to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical comparison study applying an existing framework; the abstract introduces no new free parameters, mathematical axioms, or postulated entities.
