Why Trust Your Agent? Empirical Security Gains from TRiSM-Guided Agentic Workflows in Healthcare
Pith reviewed 2026-06-30 10:14 UTC · model grok-4.3
The pith
TRiSM-guided agentic workflows cut medical AI attack success from 31% to 10% while raising accuracy 14 points.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applying the TRiSM framework to an agentic medical report generator produces a least-privilege, defence-in-depth workflow that reduces mean attack success rates from 31% to 10% for RAG poisoning and from 42% to 25% for data-field injection, eliminates the network injection vector through server-side prompt construction, and raises report accuracy by 14 percentage points from 72.5% to 86.5% across five LLMs.
What carries the argument
TRiSM-guided agentic workflow that enforces least-privilege access and moves prompt construction to the server to block client-side injection.
If this is right
- Least-privilege and server-side controls can simultaneously lower attack rates and raise accuracy in agent-based medical tools.
- Model selection becomes a required architectural decision because attack resistance varies across LLMs even under identical TRiSM controls.
- Network injection vectors can be removed entirely by moving prompt construction off the client.
- Defence-in-depth principles scale to agent workflows without sacrificing task performance.
Where Pith is reading between the lines
- The same least-privilege redesign could be tested in other regulated domains such as finance or legal document generation to check whether accuracy-security gains transfer.
- Real deployments would need to measure whether the observed accuracy lift persists when the workflow faces live clinical data rather than simulated cases.
- Future evaluations could add adaptive attackers who modify strategies once they detect server-side controls.
Load-bearing premise
The 500 simulated attacks and 800 generations on five specific LLMs and two report types represent real-world threats and generalize to other models and environments.
What would settle it
A live deployment in which real attackers achieve RAG poisoning success above 10% or data-field injection success above 25% on the TRiSM workflow would falsify the reported security gains.
Figures
read the original abstract
Agent-based AI has enabled the automation of tasks by exposing application tools and resources to large language models (LLMs). However, to improve scope and accuracy, agents are often given access rights that exceed those of ordinary users, introducing significant security risks. AI is routinely integrated into applications with a disregard to security, risking data exposure and breaching regulations. This paper applies the AI Trust, Risk, and Security Management (TRiSM) framework to a medical report-generation application to demonstrate how an insecure agent workflow can be transformed into security-conscious agentic workflow. Both workflows were evaluated across five LLMs (Claude Haiku 4.5, GPT-4.1-nano, GPT-4.1-mini, GPT-5.4-mini, and Gemini 2.5 Flash) on two report types, totalling 800 generations and 500 attack scenarios including RAG poisoning, data-field injection, and client-side network injection. The TRiSM-guided agentic workflow reduced mean attack success rates from 31% to 10% for RAG poisoning and from 42% to 25% for data-field injection, while eliminating the network injection vector entirely through server-side prompt construction. Furthermore, report accuracy increased by 14 percentage points (72.5% to 86.5%) with the agentic workflow, demonstrating a secure design which provides more reliable outputs. This paper contributes to knowledge by demonstrating least-privilege, defence in depth agentic workflows improving security and accuracy, while also highlighting model choice is a necessary architectural consideration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that applying the TRiSM framework transforms an insecure agentic workflow for medical report generation into a secure one, yielding lower attack success rates (RAG poisoning: 31%→10%; data-field injection: 42%→25%; network injection eliminated) and higher report accuracy (72.5%→86.5%) across five LLMs, two report types, 800 generations, and 500 simulated attacks.
Significance. If the empirical measurements prove robust, the work supplies concrete, multi-model evidence that least-privilege and defense-in-depth agent designs can improve both security and output reliability in a regulated healthcare setting. The explicit comparison of insecure vs. TRiSM-guided workflows and the dual security-accuracy metric are strengths.
major comments (2)
- [Evaluation / Methods] The evaluation description provides only aggregate success rates without the exact attack prompt templates, success criteria, or construction details for the 500 scenarios. This information is load-bearing for the central claim that the observed reductions are attributable to the TRiSM workflow rather than non-adaptive or baseline-specific test artifacts.
- [Results] No per-LLM or per-report-type breakdowns, variance statistics, or hypothesis tests accompany the headline deltas (e.g., 14 pp accuracy gain). Without these, it is impossible to assess whether the reported improvements are consistent or statistically distinguishable from noise.
minor comments (1)
- [Abstract] Model names such as 'GPT-4.1-nano' and 'GPT-5.4-mini' should be replaced with the precise identifiers used in the experiments to enable replication.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to improve transparency and statistical rigor.
read point-by-point responses
-
Referee: [Evaluation / Methods] The evaluation description provides only aggregate success rates without the exact attack prompt templates, success criteria, or construction details for the 500 scenarios. This information is load-bearing for the central claim that the observed reductions are attributable to the TRiSM workflow rather than non-adaptive or baseline-specific test artifacts.
Authors: We agree that the absence of these details limits reproducibility. In the revised manuscript we will add the full attack prompt templates (or representative examples for each category), the exact success criteria used for each attack type, and a step-by-step description of scenario construction, including how the 500 attacks were distributed across the two workflows, five LLMs, and two report types. These will appear in an expanded Methods section or as supplementary material. revision: yes
-
Referee: [Results] No per-LLM or per-report-type breakdowns, variance statistics, or hypothesis tests accompany the headline deltas (e.g., 14 pp accuracy gain). Without these, it is impossible to assess whether the reported improvements are consistent or statistically distinguishable from noise.
Authors: We acknowledge that aggregate figures alone are insufficient. The revision will include per-LLM and per-report-type tables (or figures) for both attack success rates and accuracy, report standard deviations or confidence intervals across the 800 generations, and appropriate statistical tests (e.g., McNemar’s test for paired binary outcomes or paired t-tests for accuracy) with p-values to evaluate whether the observed deltas are statistically significant. These additions will be placed in the Results section. revision: yes
Circularity Check
No circularity: purely empirical attack-success and accuracy measurements
full rationale
The paper reports measured attack success rates (31%→10% RAG, 42%→25% data-field) and accuracy lift (72.5%→86.5%) obtained by running 800 generations and 500 fixed attack scenarios on five named LLMs. No equations, fitted parameters, predictions, or self-citations appear in the abstract or described methodology; the outcomes are direct experimental counts on defined test sets rather than quantities derived from the inputs by construction. The evaluation therefore contains no load-bearing step that reduces to its own inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
A research landscape of agentic ai and large language models: Applications, challenges and future directions,
S. Brohi, Q. ul-ain Mastoi, N. Z. Jhanjhi, and T. R. Pillai, “A research landscape of agentic ai and large language models: Applications, challenges and future directions,”Algorithms, vol. 18, 2025. [Online]. Available: https://www.mdpi.com/1999-4893/18/8/499
2025
-
[2]
Is vibe coding safe? benchmarking vulnerability of agent-generated code in real-world tasks,
S. Zhao, D. Wang, K. Zhang, J. Luo, Z. Li, and L. Li, “Is vibe coding safe? benchmarking vulnerability of agent-generated code in real-world tasks,” 2026. [Online]. Available: https://arxiv.org/abs/2512.03262
-
[3]
Ai agents under threat: A survey of key security challenges and future pathways,
Z. Deng, Y . Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y . Xiang, “Ai agents under threat: A survey of key security challenges and future pathways,”ACM Computing Surveys, vol. 57, 2025
2025
-
[4]
Prompt Injection attack against LLM-integrated Applications
Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zheng, L. Y . Zhang, and Y . Liu, “Prompt injection attack against llm-integrated applications,” 2025. [Online]. Available: https://arxiv.org/abs/2306.05499
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[5]
Optimization-based prompt injection attack to llm-as-a-judge,
J. Shi, Z. Yuan, Y . Liu, Y . Huang, P. Zhou, L. Sun, and N. Z. Gong, “Optimization-based prompt injection attack to llm-as-a-judge,” inCCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 2024
2024
-
[6]
Not what you’ve signed up for: Com- promising real-world llm-integrated applications with indirect prompt injection,
K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Com- promising real-world llm-integrated applications with indirect prompt injection,” inAISec 2023 - Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023
2023
-
[7]
What is ai trism?
A. Gomstyn and A. Jonker, “What is ai trism?” 2025. [Online]. Available: https://www.ibm.com/think/topics/ai- trism
2025
-
[8]
Mpib: A benchmark for medical prompt injection attacks and clinical safety in llms,
J. Lee, H. Jang, and K. S. Choi, “Mpib: A benchmark for medical prompt injection attacks and clinical safety in llms,” 2026. [Online]. Available: https://arxiv.org/abs/2602.06268
-
[9]
User inference attacks on llms,
N. Kandpal, K. Pillutla, A. Oprea, P. Kairouz, C. Choquette-Choo, and Z. Xu, “User inference attacks on llms,” inSocially Responsible Language Modelling Research, 2023. [Online]. Available: https://openreview.net/forum?id=R8XHBFJOl5
2023
-
[10]
Deep leakage from gradients,
L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” inAdvances in Neural Information Processing Systems, vol. 32, 2019
2019
-
[11]
J. X. Morris, W. Zhao, J. T. Chiu, V . Shmatikov, and A. M. Rush, “Language model inversion,” 2023. [Online]. Available: https://arxiv.org/abs/2311.13647
-
[12]
Llm dataset inference: Did you train on my dataset?
P. Maini, H. Jia, N. Papernot, and A. Dziedzic, “Llm dataset inference: Did you train on my dataset?” in Advances in Neural Information Processing Systems, vol. 37, 2024
2024
-
[13]
Membership inference attacks against synthetic health data,
Z. Zhang, C. Yan, and B. A. Malin, “Membership inference attacks against synthetic health data,”Journal of Biomedical Informatics, vol. 125, 2022
2022
-
[14]
Evaluation of inference attack models for deep learning on medical data,
M. Wu, X. Zhang, J. Ding, H. Nguyen, R. Yu, M. Pan, and S. T. Wong, “Evaluation of inference attack models for deep learning on medical data,” 2020. [Online]. Available: https://arxiv.org/abs/2011.00177
-
[15]
Hidden in plain sight: Exploring chat history tampering in interactive language models,
C. Wei, Y . Zhao, Y . Gong, K. Chen, L. Xiang, and S. Zhu, “Hidden in plain sight: Exploring chat history tampering in interactive language models,” 2024. [Online]. Available: https://arxiv.org/abs/2405.20234
-
[16]
Progent: Securing AI Agents with Privilege Control
T. Shi, J. He, Z. Wang, H. Li, L. Wu, W. Guo, and D. Song, “Progent: Programmable privilege control for llm agents,” 2025. [Online]. Available: https://arxiv.org/abs/2504.11703
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[17]
Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,
Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,” inAdvances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, Eds., vol. 37. Curran Associates, Inc., 2024, pp. 130 185–130 213. [Online]. Available: https:...
2024
-
[18]
Memory poisoning and secure multi-agent systems,
V . Torra and M. Bras-Amor´os, “Memory poisoning and secure multi-agent systems,” 2026. [Online]. Available: https://arxiv.org/abs/2603.20357
-
[19]
Securing ai agents against prompt injection attacks,
B. Ramakrishnan and A. Balaji, “Securing ai agents against prompt injection attacks,” 2025. [Online]. Available: https://arxiv.org/abs/2511.15759
-
[20]
Security threats in agentic ai system,
R. Khan, S. Sarkar, S. K. Mahata, and E. Jose, “Security threats in agentic ai system,” 2024. [Online]. Available: https://arxiv.org/abs/2410.14728
-
[21]
Commercial llm agents are already vulnerable to simple yet dangerous attacks
A. Li, Y . Zhou, V . C. Raghuram, T. Goldstein, and M. Goldblum, “Commercial llm agents are already vulnerable to simple yet dangerous attacks,” 2025. [Online]. Available: https://arxiv.org/abs/2502.08586 14 Empirical Security Gains from TRiSM-Guided Agentic Workflows in Healthcare
-
[22]
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
X. Hou, Y . Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions,” 2025. [Online]. Available: https://arxiv.org/abs/2503.23278
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[23]
Securing the model context protocol (mcp): Risks, controls, and governance,
H. Errico, J. Ngiam, and S. Sojan, “Securing the model context protocol (mcp): Risks, controls, and governance,” 2025. [Online]. Available: https://arxiv.org/abs/2511.20920
-
[24]
The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems
L. Staufer, K. Feng, K. Wei, L. Bailey, Y . Duan, M. Yang, A. P. Ozisik, S. Casper, and N. Kolt, “The 2025 ai agent index: Documenting technical and safety features of deployed agentic ai systems,” 2026. [Online]. Available: https://arxiv.org/abs/2602.17753
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[25]
Integrating artificial intelligence into public administration: Challenges and vulnerabilities,
A. F. Vatamanu and M. Tofan, “Integrating artificial intelligence into public administration: Challenges and vulnerabilities,”Administrative Sciences, vol. 15, 2025. [Online]. Available: https://www.mdpi.com/2076- 3387/15/4/149
2025
-
[26]
What is vibe coding and when should you use it (or not)?
M. Horvat, “What is vibe coding and when should you use it (or not)?”TechRxiv, vol. 2025, 2025. [Online]. Available: https://www.techrxiv.org/doi/abs/10.36227/techrxiv.175459780.03758839/v1
-
[27]
Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems,
S. Raza, R. Sapkota, M. Karkee, and C. Emmanouilidis, “Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems,”AI Open, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666651026000069
2026
-
[28]
Operationalizing data minimization for privacy-preserving llm prompting,
J. Zhou, N. Mireshghallah, and T. Li, “Operationalizing data minimization for privacy-preserving llm prompting,” 2025. [Online]. Available: https://arxiv.org/abs/2510.03662
-
[29]
Artificial intelligence trust, risk and security management (ai trism): Frameworks, applications, challenges and future research directions,
A. Habbal, M. K. Ali, and M. A. Abuzaraida, “Artificial intelligence trust, risk and security management (ai trism): Frameworks, applications, challenges and future research directions,”Expert Systems with Applications, vol. 240, p. 122442, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0957417423029445
2024
-
[30]
Ai tips 2.0: A comprehensive framework for operationalizing ai governance,
P. Gupta, “Ai tips 2.0: A comprehensive framework for operationalizing ai governance,” 2026. [Online]. Available: https://arxiv.org/abs/2512.09114
-
[31]
P. P. Ray, “A review of trism frameworks in artificial intelligence systems: Fundamentals, taxonomy, use cases, key challenges and future directions,”Expert Systems, vol. 43, p. e70213, 2026, e70213 EXSY-Jun-25-2337. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/exsy.70213
-
[32]
Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges,
R. Sapkota, K. I. Roumeliotis, and M. Karkee, “Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges,”Information Fusion, vol. 126, p. 103599, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1566253525006712
2026
-
[33]
Analysing environmental efficiency in ai for x-ray diagnosis,
L. Kearns, “Analysing environmental efficiency in ai for x-ray diagnosis,”Journal of AI, pp. 37–55, 3 2026. [Online]. Available: https://doi.org/10.61969/jai.1838517
-
[34]
Your rag is unfair: Exposing fairness vulnerabilities in retrieval-augmented generation via backdoor attacks,
G. Bagwe, S. S. Chaturvedi, X. Ma, X. Yuan, K.-C. Wang, and L. E. Zhang, “Your rag is unfair: Exposing fairness vulnerabilities in retrieval-augmented generation via backdoor attacks,” 2025. 15
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.