Why Trust Your Agent? Empirical Security Gains from TRiSM-Guided Agentic Workflows in Healthcare

Liam Kearns

arxiv: 2606.28666 · v1 · pith:GAZNCQ3Pnew · submitted 2026-06-27 · 💻 cs.CR · cs.AI

Why Trust Your Agent? Empirical Security Gains from TRiSM-Guided Agentic Workflows in Healthcare

Liam Kearns This is my paper

Pith reviewed 2026-06-30 10:14 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords TRiSMagentic workflowsLLM securityhealthcare AIRAG poisoningdata injectionreport generationAI trust

0 comments

The pith

TRiSM-guided agentic workflows cut medical AI attack success from 31% to 10% while raising accuracy 14 points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a medical report-generation system to show that standard agent workflows expose LLMs to excessive rights and therefore to poisoning and injection attacks. It redesigns the same system under the TRiSM framework with least-privilege controls and server-side prompt construction, then measures both attack resistance and output quality across five LLMs, two report types, 800 generations, and 500 simulated attacks. The redesigned workflow lowered RAG poisoning success from 31% to 10%, data-field injection from 42% to 25%, and removed network injection entirely. Report accuracy rose from 72.5% to 86.5%. These paired gains demonstrate that explicit security architecture can improve both protection and reliability in agent-based healthcare tools.

Core claim

Applying the TRiSM framework to an agentic medical report generator produces a least-privilege, defence-in-depth workflow that reduces mean attack success rates from 31% to 10% for RAG poisoning and from 42% to 25% for data-field injection, eliminates the network injection vector through server-side prompt construction, and raises report accuracy by 14 percentage points from 72.5% to 86.5% across five LLMs.

What carries the argument

TRiSM-guided agentic workflow that enforces least-privilege access and moves prompt construction to the server to block client-side injection.

If this is right

Least-privilege and server-side controls can simultaneously lower attack rates and raise accuracy in agent-based medical tools.
Model selection becomes a required architectural decision because attack resistance varies across LLMs even under identical TRiSM controls.
Network injection vectors can be removed entirely by moving prompt construction off the client.
Defence-in-depth principles scale to agent workflows without sacrificing task performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same least-privilege redesign could be tested in other regulated domains such as finance or legal document generation to check whether accuracy-security gains transfer.
Real deployments would need to measure whether the observed accuracy lift persists when the workflow faces live clinical data rather than simulated cases.
Future evaluations could add adaptive attackers who modify strategies once they detect server-side controls.

Load-bearing premise

The 500 simulated attacks and 800 generations on five specific LLMs and two report types represent real-world threats and generalize to other models and environments.

What would settle it

A live deployment in which real attackers achieve RAG poisoning success above 10% or data-field injection success above 25% on the TRiSM workflow would falsify the reported security gains.

Figures

Figures reproduced from arXiv: 2606.28666 by Liam Kearns.

**Figure 2.** Figure 2: AI agent workflow for medical report generation application. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Agentic AI workflow for medical report generation application. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Injection attack with agent workflow when requests are built client-side. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Injection attack with insecurely implemented agentic workflow when requests are built client-side. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Proposed secure agentic workflow using TRiSM. [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

read the original abstract

Agent-based AI has enabled the automation of tasks by exposing application tools and resources to large language models (LLMs). However, to improve scope and accuracy, agents are often given access rights that exceed those of ordinary users, introducing significant security risks. AI is routinely integrated into applications with a disregard to security, risking data exposure and breaching regulations. This paper applies the AI Trust, Risk, and Security Management (TRiSM) framework to a medical report-generation application to demonstrate how an insecure agent workflow can be transformed into security-conscious agentic workflow. Both workflows were evaluated across five LLMs (Claude Haiku 4.5, GPT-4.1-nano, GPT-4.1-mini, GPT-5.4-mini, and Gemini 2.5 Flash) on two report types, totalling 800 generations and 500 attack scenarios including RAG poisoning, data-field injection, and client-side network injection. The TRiSM-guided agentic workflow reduced mean attack success rates from 31% to 10% for RAG poisoning and from 42% to 25% for data-field injection, while eliminating the network injection vector entirely through server-side prompt construction. Furthermore, report accuracy increased by 14 percentage points (72.5% to 86.5%) with the agentic workflow, demonstrating a secure design which provides more reliable outputs. This paper contributes to knowledge by demonstrating least-privilege, defence in depth agentic workflows improving security and accuracy, while also highlighting model choice is a necessary architectural consideration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reports measurable drops in attack success and a 14-point accuracy gain when TRiSM is applied to agentic medical report workflows, but the attack evaluations rest on fixed simulated scenarios whose construction details are not visible in the abstract.

read the letter

The work takes the TRiSM framework and applies it to an agentic medical report generator, comparing an insecure baseline workflow against a least-privilege, server-side version. They tested five LLMs on two report types for a total of 800 generations and 500 attack runs covering RAG poisoning, data-field injection, and client-side network injection.

The concrete numbers are the main takeaway: mean attack success fell from 31% to 10% on RAG poisoning and from 42% to 25% on data-field injection, the network vector disappeared, and accuracy rose from 72.5% to 86.5%. Running the same tasks across multiple models and reporting both security and quality metrics is more useful than most framework papers that stop at qualitative claims.

The limitation is on the attack side. The abstract gives only aggregate success rates with no per-model breakdown, no statistical tests, and no description of how the 500 scenarios were constructed or whether they included adaptive or optimized attacks. If the baseline prompts or tool interfaces inadvertently made the insecure version easier to break, the measured deltas would overstate the real gain. The claim that model choice is an architectural consideration is noted but not explored in depth.

This is the sort of paper that matters for teams shipping regulated agentic systems who need before-and-after numbers rather than another framework diagram. It is worth sending to review so the methods and attack implementations can be checked directly; the quantitative framing is solid enough to justify referee time even if the attack realism needs tightening.

Referee Report

2 major / 1 minor

Summary. The paper claims that applying the TRiSM framework transforms an insecure agentic workflow for medical report generation into a secure one, yielding lower attack success rates (RAG poisoning: 31%→10%; data-field injection: 42%→25%; network injection eliminated) and higher report accuracy (72.5%→86.5%) across five LLMs, two report types, 800 generations, and 500 simulated attacks.

Significance. If the empirical measurements prove robust, the work supplies concrete, multi-model evidence that least-privilege and defense-in-depth agent designs can improve both security and output reliability in a regulated healthcare setting. The explicit comparison of insecure vs. TRiSM-guided workflows and the dual security-accuracy metric are strengths.

major comments (2)

[Evaluation / Methods] The evaluation description provides only aggregate success rates without the exact attack prompt templates, success criteria, or construction details for the 500 scenarios. This information is load-bearing for the central claim that the observed reductions are attributable to the TRiSM workflow rather than non-adaptive or baseline-specific test artifacts.
[Results] No per-LLM or per-report-type breakdowns, variance statistics, or hypothesis tests accompany the headline deltas (e.g., 14 pp accuracy gain). Without these, it is impossible to assess whether the reported improvements are consistent or statistically distinguishable from noise.

minor comments (1)

[Abstract] Model names such as 'GPT-4.1-nano' and 'GPT-5.4-mini' should be replaced with the precise identifiers used in the experiments to enable replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to improve transparency and statistical rigor.

read point-by-point responses

Referee: [Evaluation / Methods] The evaluation description provides only aggregate success rates without the exact attack prompt templates, success criteria, or construction details for the 500 scenarios. This information is load-bearing for the central claim that the observed reductions are attributable to the TRiSM workflow rather than non-adaptive or baseline-specific test artifacts.

Authors: We agree that the absence of these details limits reproducibility. In the revised manuscript we will add the full attack prompt templates (or representative examples for each category), the exact success criteria used for each attack type, and a step-by-step description of scenario construction, including how the 500 attacks were distributed across the two workflows, five LLMs, and two report types. These will appear in an expanded Methods section or as supplementary material. revision: yes
Referee: [Results] No per-LLM or per-report-type breakdowns, variance statistics, or hypothesis tests accompany the headline deltas (e.g., 14 pp accuracy gain). Without these, it is impossible to assess whether the reported improvements are consistent or statistically distinguishable from noise.

Authors: We acknowledge that aggregate figures alone are insufficient. The revision will include per-LLM and per-report-type tables (or figures) for both attack success rates and accuracy, report standard deviations or confidence intervals across the 800 generations, and appropriate statistical tests (e.g., McNemar’s test for paired binary outcomes or paired t-tests for accuracy) with p-values to evaluate whether the observed deltas are statistically significant. These additions will be placed in the Results section. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical attack-success and accuracy measurements

full rationale

The paper reports measured attack success rates (31%→10% RAG, 42%→25% data-field) and accuracy lift (72.5%→86.5%) obtained by running 800 generations and 500 fixed attack scenarios on five named LLMs. No equations, fitted parameters, predictions, or self-citations appear in the abstract or described methodology; the outcomes are direct experimental counts on defined test sets rather than quantities derived from the inputs by construction. The evaluation therefore contains no load-bearing step that reduces to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical comparison study applying an existing framework; the abstract introduces no new free parameters, mathematical axioms, or postulated entities.

pith-pipeline@v0.9.1-grok · 5813 in / 1221 out tokens · 46135 ms · 2026-06-30T10:14:15.272466+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 19 canonical work pages · 4 internal anchors

[1]

A research landscape of agentic ai and large language models: Applications, challenges and future directions,

S. Brohi, Q. ul-ain Mastoi, N. Z. Jhanjhi, and T. R. Pillai, “A research landscape of agentic ai and large language models: Applications, challenges and future directions,”Algorithms, vol. 18, 2025. [Online]. Available: https://www.mdpi.com/1999-4893/18/8/499

2025
[2]

Is vibe coding safe? benchmarking vulnerability of agent-generated code in real-world tasks,

S. Zhao, D. Wang, K. Zhang, J. Luo, Z. Li, and L. Li, “Is vibe coding safe? benchmarking vulnerability of agent-generated code in real-world tasks,” 2026. [Online]. Available: https://arxiv.org/abs/2512.03262

work page arXiv 2026
[3]

Ai agents under threat: A survey of key security challenges and future pathways,

Z. Deng, Y . Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y . Xiang, “Ai agents under threat: A survey of key security challenges and future pathways,”ACM Computing Surveys, vol. 57, 2025

2025
[4]

Prompt Injection attack against LLM-integrated Applications

Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zheng, L. Y . Zhang, and Y . Liu, “Prompt injection attack against llm-integrated applications,” 2025. [Online]. Available: https://arxiv.org/abs/2306.05499

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

Optimization-based prompt injection attack to llm-as-a-judge,

J. Shi, Z. Yuan, Y . Liu, Y . Huang, P. Zhou, L. Sun, and N. Z. Gong, “Optimization-based prompt injection attack to llm-as-a-judge,” inCCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 2024

2024
[6]

Not what you’ve signed up for: Com- promising real-world llm-integrated applications with indirect prompt injection,

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Com- promising real-world llm-integrated applications with indirect prompt injection,” inAISec 2023 - Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023

2023
[7]

What is ai trism?

A. Gomstyn and A. Jonker, “What is ai trism?” 2025. [Online]. Available: https://www.ibm.com/think/topics/ai- trism

2025
[8]

Mpib: A benchmark for medical prompt injection attacks and clinical safety in llms,

J. Lee, H. Jang, and K. S. Choi, “Mpib: A benchmark for medical prompt injection attacks and clinical safety in llms,” 2026. [Online]. Available: https://arxiv.org/abs/2602.06268

work page arXiv 2026
[9]

User inference attacks on llms,

N. Kandpal, K. Pillutla, A. Oprea, P. Kairouz, C. Choquette-Choo, and Z. Xu, “User inference attacks on llms,” inSocially Responsible Language Modelling Research, 2023. [Online]. Available: https://openreview.net/forum?id=R8XHBFJOl5

2023
[10]

Deep leakage from gradients,

L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” inAdvances in Neural Information Processing Systems, vol. 32, 2019

2019
[11]

Language model inversion,

J. X. Morris, W. Zhao, J. T. Chiu, V . Shmatikov, and A. M. Rush, “Language model inversion,” 2023. [Online]. Available: https://arxiv.org/abs/2311.13647

work page arXiv 2023
[12]

Llm dataset inference: Did you train on my dataset?

P. Maini, H. Jia, N. Papernot, and A. Dziedzic, “Llm dataset inference: Did you train on my dataset?” in Advances in Neural Information Processing Systems, vol. 37, 2024

2024
[13]

Membership inference attacks against synthetic health data,

Z. Zhang, C. Yan, and B. A. Malin, “Membership inference attacks against synthetic health data,”Journal of Biomedical Informatics, vol. 125, 2022

2022
[14]

Evaluation of inference attack models for deep learning on medical data,

M. Wu, X. Zhang, J. Ding, H. Nguyen, R. Yu, M. Pan, and S. T. Wong, “Evaluation of inference attack models for deep learning on medical data,” 2020. [Online]. Available: https://arxiv.org/abs/2011.00177

work page arXiv 2020
[15]

Hidden in plain sight: Exploring chat history tampering in interactive language models,

C. Wei, Y . Zhao, Y . Gong, K. Chen, L. Xiang, and S. Zhu, “Hidden in plain sight: Exploring chat history tampering in interactive language models,” 2024. [Online]. Available: https://arxiv.org/abs/2405.20234

work page arXiv 2024
[16]

Progent: Securing AI Agents with Privilege Control

T. Shi, J. He, Z. Wang, H. Li, L. Wu, W. Guo, and D. Song, “Progent: Programmable privilege control for llm agents,” 2025. [Online]. Available: https://arxiv.org/abs/2504.11703

work page internal anchor Pith review Pith/arXiv arXiv 2025
[17]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,

Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,” inAdvances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, Eds., vol. 37. Curran Associates, Inc., 2024, pp. 130 185–130 213. [Online]. Available: https:...

2024
[18]

Memory poisoning and secure multi-agent systems,

V . Torra and M. Bras-Amor´os, “Memory poisoning and secure multi-agent systems,” 2026. [Online]. Available: https://arxiv.org/abs/2603.20357

work page arXiv 2026
[19]

Securing ai agents against prompt injection attacks,

B. Ramakrishnan and A. Balaji, “Securing ai agents against prompt injection attacks,” 2025. [Online]. Available: https://arxiv.org/abs/2511.15759

work page arXiv 2025
[20]

Security threats in agentic ai system,

R. Khan, S. Sarkar, S. K. Mahata, and E. Jose, “Security threats in agentic ai system,” 2024. [Online]. Available: https://arxiv.org/abs/2410.14728

work page arXiv 2024
[21]

Commercial llm agents are already vulnerable to simple yet dangerous attacks

A. Li, Y . Zhou, V . C. Raghuram, T. Goldstein, and M. Goldblum, “Commercial llm agents are already vulnerable to simple yet dangerous attacks,” 2025. [Online]. Available: https://arxiv.org/abs/2502.08586 14 Empirical Security Gains from TRiSM-Guided Agentic Workflows in Healthcare

work page arXiv 2025
[22]

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

X. Hou, Y . Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions,” 2025. [Online]. Available: https://arxiv.org/abs/2503.23278

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

Securing the model context protocol (mcp): Risks, controls, and governance,

H. Errico, J. Ngiam, and S. Sojan, “Securing the model context protocol (mcp): Risks, controls, and governance,” 2025. [Online]. Available: https://arxiv.org/abs/2511.20920

work page arXiv 2025
[24]

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

L. Staufer, K. Feng, K. Wei, L. Bailey, Y . Duan, M. Yang, A. P. Ozisik, S. Casper, and N. Kolt, “The 2025 ai agent index: Documenting technical and safety features of deployed agentic ai systems,” 2026. [Online]. Available: https://arxiv.org/abs/2602.17753

work page internal anchor Pith review Pith/arXiv arXiv 2025
[25]

Integrating artificial intelligence into public administration: Challenges and vulnerabilities,

A. F. Vatamanu and M. Tofan, “Integrating artificial intelligence into public administration: Challenges and vulnerabilities,”Administrative Sciences, vol. 15, 2025. [Online]. Available: https://www.mdpi.com/2076- 3387/15/4/149

2025
[26]

What is vibe coding and when should you use it (or not)?

M. Horvat, “What is vibe coding and when should you use it (or not)?”TechRxiv, vol. 2025, 2025. [Online]. Available: https://www.techrxiv.org/doi/abs/10.36227/techrxiv.175459780.03758839/v1

work page doi:10.36227/techrxiv.175459780.03758839/v1 2025
[27]

Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems,

S. Raza, R. Sapkota, M. Karkee, and C. Emmanouilidis, “Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems,”AI Open, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666651026000069

2026
[28]

Operationalizing data minimization for privacy-preserving llm prompting,

J. Zhou, N. Mireshghallah, and T. Li, “Operationalizing data minimization for privacy-preserving llm prompting,” 2025. [Online]. Available: https://arxiv.org/abs/2510.03662

work page arXiv 2025
[29]

Artificial intelligence trust, risk and security management (ai trism): Frameworks, applications, challenges and future research directions,

A. Habbal, M. K. Ali, and M. A. Abuzaraida, “Artificial intelligence trust, risk and security management (ai trism): Frameworks, applications, challenges and future research directions,”Expert Systems with Applications, vol. 240, p. 122442, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0957417423029445

2024
[30]

Ai tips 2.0: A comprehensive framework for operationalizing ai governance,

P. Gupta, “Ai tips 2.0: A comprehensive framework for operationalizing ai governance,” 2026. [Online]. Available: https://arxiv.org/abs/2512.09114

work page arXiv 2026
[31]

A review of trism frameworks in artificial intelligence systems: Fundamentals, taxonomy, use cases, key challenges and future directions,

P. P. Ray, “A review of trism frameworks in artificial intelligence systems: Fundamentals, taxonomy, use cases, key challenges and future directions,”Expert Systems, vol. 43, p. e70213, 2026, e70213 EXSY-Jun-25-2337. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/exsy.70213

work page doi:10.1111/exsy.70213 2026
[32]

Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges,

R. Sapkota, K. I. Roumeliotis, and M. Karkee, “Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges,”Information Fusion, vol. 126, p. 103599, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1566253525006712

2026
[33]

Analysing environmental efficiency in ai for x-ray diagnosis,

L. Kearns, “Analysing environmental efficiency in ai for x-ray diagnosis,”Journal of AI, pp. 37–55, 3 2026. [Online]. Available: https://doi.org/10.61969/jai.1838517

work page doi:10.61969/jai.1838517 2026
[34]

Your rag is unfair: Exposing fairness vulnerabilities in retrieval-augmented generation via backdoor attacks,

G. Bagwe, S. S. Chaturvedi, X. Ma, X. Yuan, K.-C. Wang, and L. E. Zhang, “Your rag is unfair: Exposing fairness vulnerabilities in retrieval-augmented generation via backdoor attacks,” 2025. 15

2025

[1] [1]

A research landscape of agentic ai and large language models: Applications, challenges and future directions,

S. Brohi, Q. ul-ain Mastoi, N. Z. Jhanjhi, and T. R. Pillai, “A research landscape of agentic ai and large language models: Applications, challenges and future directions,”Algorithms, vol. 18, 2025. [Online]. Available: https://www.mdpi.com/1999-4893/18/8/499

2025

[2] [2]

Is vibe coding safe? benchmarking vulnerability of agent-generated code in real-world tasks,

S. Zhao, D. Wang, K. Zhang, J. Luo, Z. Li, and L. Li, “Is vibe coding safe? benchmarking vulnerability of agent-generated code in real-world tasks,” 2026. [Online]. Available: https://arxiv.org/abs/2512.03262

work page arXiv 2026

[3] [3]

Ai agents under threat: A survey of key security challenges and future pathways,

Z. Deng, Y . Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y . Xiang, “Ai agents under threat: A survey of key security challenges and future pathways,”ACM Computing Surveys, vol. 57, 2025

2025

[4] [4]

Prompt Injection attack against LLM-integrated Applications

Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zheng, L. Y . Zhang, and Y . Liu, “Prompt injection attack against llm-integrated applications,” 2025. [Online]. Available: https://arxiv.org/abs/2306.05499

work page internal anchor Pith review Pith/arXiv arXiv 2025

[5] [5]

Optimization-based prompt injection attack to llm-as-a-judge,

J. Shi, Z. Yuan, Y . Liu, Y . Huang, P. Zhou, L. Sun, and N. Z. Gong, “Optimization-based prompt injection attack to llm-as-a-judge,” inCCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 2024

2024

[6] [6]

Not what you’ve signed up for: Com- promising real-world llm-integrated applications with indirect prompt injection,

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Com- promising real-world llm-integrated applications with indirect prompt injection,” inAISec 2023 - Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023

2023

[7] [7]

What is ai trism?

A. Gomstyn and A. Jonker, “What is ai trism?” 2025. [Online]. Available: https://www.ibm.com/think/topics/ai- trism

2025

[8] [8]

Mpib: A benchmark for medical prompt injection attacks and clinical safety in llms,

J. Lee, H. Jang, and K. S. Choi, “Mpib: A benchmark for medical prompt injection attacks and clinical safety in llms,” 2026. [Online]. Available: https://arxiv.org/abs/2602.06268

work page arXiv 2026

[9] [9]

User inference attacks on llms,

N. Kandpal, K. Pillutla, A. Oprea, P. Kairouz, C. Choquette-Choo, and Z. Xu, “User inference attacks on llms,” inSocially Responsible Language Modelling Research, 2023. [Online]. Available: https://openreview.net/forum?id=R8XHBFJOl5

2023

[10] [10]

Deep leakage from gradients,

L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” inAdvances in Neural Information Processing Systems, vol. 32, 2019

2019

[11] [11]

Language model inversion,

J. X. Morris, W. Zhao, J. T. Chiu, V . Shmatikov, and A. M. Rush, “Language model inversion,” 2023. [Online]. Available: https://arxiv.org/abs/2311.13647

work page arXiv 2023

[12] [12]

Llm dataset inference: Did you train on my dataset?

P. Maini, H. Jia, N. Papernot, and A. Dziedzic, “Llm dataset inference: Did you train on my dataset?” in Advances in Neural Information Processing Systems, vol. 37, 2024

2024

[13] [13]

Membership inference attacks against synthetic health data,

Z. Zhang, C. Yan, and B. A. Malin, “Membership inference attacks against synthetic health data,”Journal of Biomedical Informatics, vol. 125, 2022

2022

[14] [14]

Evaluation of inference attack models for deep learning on medical data,

M. Wu, X. Zhang, J. Ding, H. Nguyen, R. Yu, M. Pan, and S. T. Wong, “Evaluation of inference attack models for deep learning on medical data,” 2020. [Online]. Available: https://arxiv.org/abs/2011.00177

work page arXiv 2020

[15] [15]

Hidden in plain sight: Exploring chat history tampering in interactive language models,

C. Wei, Y . Zhao, Y . Gong, K. Chen, L. Xiang, and S. Zhu, “Hidden in plain sight: Exploring chat history tampering in interactive language models,” 2024. [Online]. Available: https://arxiv.org/abs/2405.20234

work page arXiv 2024

[16] [16]

Progent: Securing AI Agents with Privilege Control

T. Shi, J. He, Z. Wang, H. Li, L. Wu, W. Guo, and D. Song, “Progent: Programmable privilege control for llm agents,” 2025. [Online]. Available: https://arxiv.org/abs/2504.11703

work page internal anchor Pith review Pith/arXiv arXiv 2025

[17] [17]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,

Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,” inAdvances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, Eds., vol. 37. Curran Associates, Inc., 2024, pp. 130 185–130 213. [Online]. Available: https:...

2024

[18] [18]

Memory poisoning and secure multi-agent systems,

V . Torra and M. Bras-Amor´os, “Memory poisoning and secure multi-agent systems,” 2026. [Online]. Available: https://arxiv.org/abs/2603.20357

work page arXiv 2026

[19] [19]

Securing ai agents against prompt injection attacks,

B. Ramakrishnan and A. Balaji, “Securing ai agents against prompt injection attacks,” 2025. [Online]. Available: https://arxiv.org/abs/2511.15759

work page arXiv 2025

[20] [20]

Security threats in agentic ai system,

R. Khan, S. Sarkar, S. K. Mahata, and E. Jose, “Security threats in agentic ai system,” 2024. [Online]. Available: https://arxiv.org/abs/2410.14728

work page arXiv 2024

[21] [21]

Commercial llm agents are already vulnerable to simple yet dangerous attacks

A. Li, Y . Zhou, V . C. Raghuram, T. Goldstein, and M. Goldblum, “Commercial llm agents are already vulnerable to simple yet dangerous attacks,” 2025. [Online]. Available: https://arxiv.org/abs/2502.08586 14 Empirical Security Gains from TRiSM-Guided Agentic Workflows in Healthcare

work page arXiv 2025

[22] [22]

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

X. Hou, Y . Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions,” 2025. [Online]. Available: https://arxiv.org/abs/2503.23278

work page internal anchor Pith review Pith/arXiv arXiv 2025

[23] [23]

Securing the model context protocol (mcp): Risks, controls, and governance,

H. Errico, J. Ngiam, and S. Sojan, “Securing the model context protocol (mcp): Risks, controls, and governance,” 2025. [Online]. Available: https://arxiv.org/abs/2511.20920

work page arXiv 2025

[24] [24]

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

L. Staufer, K. Feng, K. Wei, L. Bailey, Y . Duan, M. Yang, A. P. Ozisik, S. Casper, and N. Kolt, “The 2025 ai agent index: Documenting technical and safety features of deployed agentic ai systems,” 2026. [Online]. Available: https://arxiv.org/abs/2602.17753

work page internal anchor Pith review Pith/arXiv arXiv 2025

[25] [25]

Integrating artificial intelligence into public administration: Challenges and vulnerabilities,

A. F. Vatamanu and M. Tofan, “Integrating artificial intelligence into public administration: Challenges and vulnerabilities,”Administrative Sciences, vol. 15, 2025. [Online]. Available: https://www.mdpi.com/2076- 3387/15/4/149

2025

[26] [26]

What is vibe coding and when should you use it (or not)?

M. Horvat, “What is vibe coding and when should you use it (or not)?”TechRxiv, vol. 2025, 2025. [Online]. Available: https://www.techrxiv.org/doi/abs/10.36227/techrxiv.175459780.03758839/v1

work page doi:10.36227/techrxiv.175459780.03758839/v1 2025

[27] [27]

Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems,

S. Raza, R. Sapkota, M. Karkee, and C. Emmanouilidis, “Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems,”AI Open, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666651026000069

2026

[28] [28]

Operationalizing data minimization for privacy-preserving llm prompting,

J. Zhou, N. Mireshghallah, and T. Li, “Operationalizing data minimization for privacy-preserving llm prompting,” 2025. [Online]. Available: https://arxiv.org/abs/2510.03662

work page arXiv 2025

[29] [29]

Artificial intelligence trust, risk and security management (ai trism): Frameworks, applications, challenges and future research directions,

A. Habbal, M. K. Ali, and M. A. Abuzaraida, “Artificial intelligence trust, risk and security management (ai trism): Frameworks, applications, challenges and future research directions,”Expert Systems with Applications, vol. 240, p. 122442, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0957417423029445

2024

[30] [30]

Ai tips 2.0: A comprehensive framework for operationalizing ai governance,

P. Gupta, “Ai tips 2.0: A comprehensive framework for operationalizing ai governance,” 2026. [Online]. Available: https://arxiv.org/abs/2512.09114

work page arXiv 2026

[31] [31]

A review of trism frameworks in artificial intelligence systems: Fundamentals, taxonomy, use cases, key challenges and future directions,

P. P. Ray, “A review of trism frameworks in artificial intelligence systems: Fundamentals, taxonomy, use cases, key challenges and future directions,”Expert Systems, vol. 43, p. e70213, 2026, e70213 EXSY-Jun-25-2337. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/exsy.70213

work page doi:10.1111/exsy.70213 2026

[32] [32]

Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges,

R. Sapkota, K. I. Roumeliotis, and M. Karkee, “Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges,”Information Fusion, vol. 126, p. 103599, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1566253525006712

2026

[33] [33]

Analysing environmental efficiency in ai for x-ray diagnosis,

L. Kearns, “Analysing environmental efficiency in ai for x-ray diagnosis,”Journal of AI, pp. 37–55, 3 2026. [Online]. Available: https://doi.org/10.61969/jai.1838517

work page doi:10.61969/jai.1838517 2026

[34] [34]

Your rag is unfair: Exposing fairness vulnerabilities in retrieval-augmented generation via backdoor attacks,

G. Bagwe, S. S. Chaturvedi, X. Ma, X. Yuan, K.-C. Wang, and L. E. Zhang, “Your rag is unfair: Exposing fairness vulnerabilities in retrieval-augmented generation via backdoor attacks,” 2025. 15

2025