arxiv: 2605.16269 · v1 · pith:JHFIEO2Wnew · submitted 2026-03-31 · 💻 cs.HC · cs.AI

Train the Trainers -- An Agentic AI Framework for Peer-Based Mental Health Support in Battlefield Environments

Pith reviewed 2026-05-21 10:13 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords peer mental health support · agentic AI · military battlefield environments · air-gapped systems · symptom triage · consensus decision support · recovered soldier facilitators · operational mental health

The pith

Recovered soldiers trained as peer facilitators and supported by consensus-driven AI agents can provide first-line mental health care in forward battlefield settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes training soldiers who have completed therapy and returned to duty to act as peer facilitators who deliver initial psychological support in operational environments where professional access is limited. These facilitators are augmented by an agentic AI platform that handles symptom triage, guided interventions, operational constraint reasoning, simulation training, and documentation while maintaining human oversight. The system is designed to function in air-gapped and low-connectivity conditions and relies on consensus-driven decision support. If the approach works, it would enable earlier on-site intervention, lower the rate of symptom escalation, reduce unnecessary evacuations to rear facilities, and improve continuity of care under resource constraints.

Core claim

The framework centers on recovered soldiers serving as human supervisors who coordinate specialized AI agents for symptom triage, guided peer-support interventions, operational constraint reasoning, training and simulation, and structured documentation for clinical escalation. This combination of peer-based intervention with consensus-driven agentic AI decision support operates in austere air-gapped environments while preserving ethical safeguards and human oversight.

What carries the argument

The agentic AI-enabled platform with consensus-driven decision support, where the recovered soldier acts as human supervisor coordinating AI agents for triage, interventions, and documentation in air-gapped settings.
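
The paper does not spell out its consensus rule, so the following is only a minimal sketch of how supervised consensus could work, assuming a plain majority vote over per-agent triage labels with an agreement threshold. The names `AgentOpinion`, `consensus`, and `supervised_decision`, the label set, and the 0.66 threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical triage labels (assumption, not from the paper).
LABELS = ("monitor", "peer_support", "escalate")


@dataclass(frozen=True)
class AgentOpinion:
    agent: str  # which agent or model produced the label
    label: str  # one of LABELS


def consensus(opinions: list[AgentOpinion], min_agreement: float = 0.66) -> str:
    """Return the majority label, or 'escalate' when agreement is too weak."""
    if not opinions:
        return "escalate"  # no evidence: fail towards clinical escalation
    counts = Counter(o.label for o in opinions)
    label, votes = counts.most_common(1)[0]
    if votes / len(opinions) < min_agreement:
        return "escalate"  # agents disagree: hand the case to a clinician
    return label


def supervised_decision(
    opinions: list[AgentOpinion], supervisor_override: Optional[str] = None
) -> str:
    """The human supervisor's choice, when given, always wins over the agents."""
    if supervisor_override is not None:
        if supervisor_override not in LABELS:
            raise ValueError(f"unknown label: {supervisor_override}")
        return supervisor_override
    return consensus(opinions)
```

In this sketch, weak agreement defaults to escalation rather than to the most common label, which is one way to keep disagreement among agents from being hidden from the human supervisor.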

If this is right

  • Mental health response times would shorten because support occurs on site rather than after evacuation.
  • Early peer intervention would prevent escalation of acute stress reactions and post-traumatic symptoms.
  • Unnecessary evacuations would decline, preserving unit strength and reducing long-term care burdens.
  • Continuity of care would improve through standardized documentation and structured handoff to professionals when needed.
  • The model would scale under severe connectivity and resource limits while keeping human oversight intact.
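
As an illustration of what "standardized documentation and structured handoff" might look like in practice, here is a small sketch of a handoff record serialized to JSON for local storage. Every field name and the `HandoffRecord` class itself are assumptions made for this example; the paper does not publish a record schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class HandoffRecord:
    # All fields are hypothetical; a real schema would come from clinicians.
    case_id: str
    triage_label: str
    observed_symptoms: list[str]
    interventions_tried: list[str]
    facilitator_notes: str = ""
    created_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize with sorted keys so records compare cleanly across devices."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)
```

A plain JSON file needs no network service, which fits the air-gapped setting described above, though a deployed system would also need encryption and access control that this sketch omits.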

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same peer-plus-AI structure could extend to civilian disaster response teams facing delayed professional access.
  • Virtual simulation environments could test failure modes of the consensus agents before any live deployment.
  • Integration challenges with existing military health records would need separate privacy-preserving protocols.
  • Tracking long-term PTSD incidence in participating units could reveal whether earlier intervention alters overall recovery trajectories.

Load-bearing premise

Recovered soldiers can be trained as reliable peer facilitators, and the AI agents can deliver safe, effective triage and interventions in air-gapped, high-stakes settings without increasing clinical risk.

What would settle it

A field trial comparing mental health outcomes and evacuation rates in units using the trained-peer-plus-AI system against units following standard protocols. The claim would be undermined if the AI-supported groups showed higher error rates or worse symptom control.
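
One simple way to compare evacuation rates between two arms of such a trial is a pooled two-proportion z-test, sketched below. This is an assumed analysis, not one the paper proposes, and the function name `two_proportion_z` is hypothetical. A real trial would likely randomize whole units, which calls for a cluster-aware analysis that this sketch ignores.

```python
import math


def two_proportion_z(
    events_a: int, n_a: int, events_b: int, n_b: int
) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one event rate."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    # Standard error under the null hypothesis of a common rate.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation in either group
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value
```

A negative `z` with a small p-value would mean group A (for example, the AI-supported units) had a lower event rate than group B; a positive `z` would point the other way, which is the outcome that would count against the framework.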

Figures

Figures reproduced from arXiv: 2605.16269 by Abdul Rahman, Amin Hass, Anita H. Clayton, Atmaram Yarlagadda, Christopher K. Rhea, Eranga Bandara, Preston Samuel, Ravi Mukkamala, Ross Gore, Sachin Shetty, Xueping Liang.

Figure 1: Comparison between direct human–LLM interaction and agentic AI–LLM in…
Figure 2: Human-supervised agentic AI architecture for the Train-the-Trainers frame…
Figure 3: Overview of the supervised fine-tuning pipeline used to adapt a base LLM to…
Figure 4: Consensus-driven LLM consortium and reasoning workflow used by the agentic…
Figure 5: Structure of the psychiatric assessment fine-tuning dataset. Each sample con…
Figure 6: Training metrics during fine-tuning: learning rate schedule (left) showing…
Figure 7: Validation metrics during fine-tuning: evaluation loss (left) decreasing from 1.8…
Figure 8: Training progression metrics: epoch count (left) showing linear progression…
Figure 9: Computational efficiency metrics: total floating-point operations (left) reaching…
Figure 10: Inference performance metrics during validation: samples per second (left)…
Figure 11: Representative assessment output of the fine-tuned Llama-3 LLM aligned with…
Figure 12: Representative assessment output of the fine-tuned Mistral LLM aligned with…
Figure 13: Representative assessment output of the fine-tuned Qwen2 LLM aligned with…
Figure 14: Consensus-driven assessment synthesis produced by the reasoning-oriented…

Abstract

Modern military operations expose soldiers to sustained psychological stress, leading to acute reactions, post-traumatic stress symptoms, and other mental health issues. Although the U.S. Department of Defense offers evidence-based therapies, access to trained professionals in forward-deployed and contested environments is limited. As a result, soldiers with early-stage distress are often evacuated to rear medical facilities, delaying care, reducing readiness, and increasing long-term risks. This paper proposes a Train-the-Trainers framework in which soldiers who have completed therapy and returned to duty are trained as peer facilitators to provide first-line psychological support in operational settings. To scale and standardize this model under severe resource and connectivity constraints, we introduce an agentic AI-enabled platform that augments these recovered soldiers with specialized AI agents. The recovered soldier acts as a human supervisor, coordinating agents for symptom triage, guided peer-support interventions, operational constraint reasoning, training and simulation, and structured documentation for clinical escalation when needed. The AI agents use consensus-driven decision support in high-stakes environments. The architecture functions in air-gapped and low-connectivity settings, maintaining human oversight and ethical safeguards. A functional prototype was developed with the McDonald U.S. Army Health Center, Newport News, VA, USA. By combining peer-based intervention with consensus-driven agentic AI decision support, the framework seeks to cut response times, prevent symptom escalation, reduce unnecessary evacuations, and improve continuity of care. This work shows how agentic AI can serve as a force multiplier for mental health support in austere environments and identifies pathways for broader evaluation and deployment across defense and humanitarian operations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a 'Train the Trainers' framework in which recovered soldiers are trained as peer facilitators for first-line mental health support in forward-deployed and air-gapped military environments. This is augmented by an agentic AI platform whose specialized agents perform consensus-driven symptom triage, guided interventions, operational constraint reasoning, simulation-based training, and structured documentation for escalation. A functional prototype was developed in collaboration with the McDonald U.S. Army Health Center; the stated goals are reduced response times, prevention of symptom escalation, fewer unnecessary evacuations, and improved continuity of care.

Significance. If the framework can be shown to operate safely and effectively, it would address a genuine operational gap in austere military settings where licensed clinicians are unavailable. The combination of peer facilitation with consensus-driven agentic AI under explicit human oversight is a novel direction for HCI and AI-for-health research in constrained, high-stakes domains.

major comments (1)
  1. [Abstract / Prototype description] Abstract and prototype description: the manuscript states that a functional prototype was developed with the McDonald U.S. Army Health Center yet supplies no performance data, accuracy figures, escalation rates, safety metrics, or comparative outcomes against unaugmented peer support. Because the central claims (reduced clinical risk, reliable triage, and operational feasibility in air-gapped high-stakes settings) rest on these untested assumptions, the absence of any evaluation is load-bearing.
minor comments (2)
  1. [Framework Architecture] The roles and interaction protocol among the five classes of AI agents (triage, intervention, constraint reasoning, training, documentation) are described at a high level; a concrete specification or pseudocode of the consensus mechanism and the human-supervisor override interface would improve technical clarity.
  2. [Related Work / Introduction] The paper would benefit from explicit references to existing military peer-support programs (e.g., Combat Operational Stress Control) and prior HCI work on AI-assisted mental-health triage so that the incremental contribution is positioned more precisely.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for acknowledging the potential significance of this framework for addressing mental health support gaps in austere military environments. We agree that the manuscript would benefit from greater clarity regarding the prototype's current status and the scope of its claims. We respond to the major comment below and will make corresponding revisions.

point-by-point responses
  1. Referee: [Abstract / Prototype description] Abstract and prototype description: the manuscript states that a functional prototype was developed with the McDonald U.S. Army Health Center yet supplies no performance data, accuracy figures, escalation rates, safety metrics, or comparative outcomes against unaugmented peer support. Because the central claims (reduced clinical risk, reliable triage, and operational feasibility in air-gapped high-stakes settings) rest on these untested assumptions, the absence of any evaluation is load-bearing.

    Authors: We appreciate this point and agree that the absence of quantitative evaluation data requires explicit acknowledgment. The manuscript presents a design contribution: a Train-the-Trainers framework augmented by an agentic AI platform, together with a description of a functional prototype developed in collaboration with the McDonald U.S. Army Health Center to demonstrate technical feasibility in air-gapped settings. The prototype illustrates agent coordination, human oversight mechanisms, and basic operational constraint handling, but no clinical trials, accuracy benchmarks, or outcome metrics have been collected at this stage. The abstract frames the work in terms of goals the framework seeks to achieve rather than proven results. To address the concern, we will revise the abstract to more precisely describe the current scope as a framework proposal and prototype demonstration. We will also add a new 'Limitations and Planned Evaluation' section that states the lack of performance, safety, and comparative data, discusses the preliminary nature of the prototype, and outlines future validation steps including simulation studies, expert review, and controlled evaluations under appropriate ethical and security protocols. These changes will ensure the claims are not overstated.

    Revision: yes

Circularity Check

0 steps flagged

Framework proposal with no derivation chain or fitted predictions

The manuscript is a forward-looking architectural proposal for an agentic AI platform supporting peer facilitators in military mental health contexts. It describes roles, consensus-driven agents, air-gapped operation, ethical safeguards, and a prototype built with McDonald U.S. Army Health Center, but contains no equations, parameter fits, predictions of quantitative outcomes, uniqueness theorems, or self-referential derivations. Claims about reduced response times and risk are aspirational statements of intent rather than results derived from inputs by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear. The work is self-contained as a design document against external benchmarks and does not reduce any central claim to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central proposal rests on domain assumptions about AI reliability and peer training efficacy rather than new free parameters or invented physical entities.

axioms (2)
  • domain assumption: AI agents using consensus-driven decision support can provide safe triage and intervention guidance in high-stakes, low-connectivity environments
    Invoked when describing the agents' roles and the architecture's operation in air-gapped settings.
  • domain assumption: Recovered soldiers can be effectively trained as peer facilitators without ongoing professional supervision
    Central to the Train-the-Trainers model described in the abstract.
invented entities (1)
  • Consensus-driven AI agents for symptom triage and guided interventions (no independent evidence)
    purpose: To augment human peer facilitators with specialized decision support
    Introduced as part of the agentic platform; no independent falsifiable evidence provided in the abstract.


Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 8 internal anchors

  1. [1]

    Cognitive resilience to psychological stress in military personnel,

    A. Flood and R. J. Keegan, “Cognitive resilience to psychological stress in military personnel,”Frontiers in psychology, vol. 13, p. 809003, 2022

  2. [2]

    Preventing psychological trauma in soldiers: The role of operational stress training and psychological debriefing,

    M. Deahl, M. Srinivasan, N. Jones, J. Thomas, C. Neblett, and A. Jolly, “Preventing psychological trauma in soldiers: The role of operational stress training and psychological debriefing,”British Journal of Medical Psychology, vol. 73, no. 1, pp. 77–85, 2000

  3. [3]

    Proof- of-tbi–fine-tuned vision language model consortium and openai-o3 rea- soning llm-based medical diagnosis support system for mild traumatic brain injury (tbi) prediction,

    R. Gore, E. Bandara, S. Shetty, A. E. Musto, P. Rana, A. Valencia- Romero, C. Rhea, L. Tayebi, H. Richter, A. Yarlagaddaet al., “Proof- of-tbi–fine-tuned vision language model consortium and openai-o3 rea- soning llm-based medical diagnosis support system for mild traumatic brain injury (tbi) prediction,”arXiv preprint arXiv:2504.18671, 2025

  4. [4]

    Stress and military performance,

    J. M. Orasanu and P. Backer, “Stress and military performance,” in Stress and human performance. Psychology Press, 2013, pp. 89–125

  5. [5]

    Witnessing acute stress reaction in team members: The moderating effect of peer- based training,

    V. Svetlitzky, M. Farchi, A. B. Yehuda, and A. B. Adler, “Witnessing acute stress reaction in team members: The moderating effect of peer- based training,”The Journal of Nervous and Mental Disease, vol. 208, no. 10, pp. 803–809, 2020. 36

  6. [6]

    Peer-based intervention for acute stress reaction: adap- tations by five militaries,

    A. B. Adler, I. Gutierrez, H. M. Edge, A. Nordstrand, A. Simms, and G. Willmund, “Peer-based intervention for acute stress reaction: adap- tations by five militaries,”BMJ Mil Health, vol. 170, no. 5, pp. 425–429, 2024

  7. [7]

    Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey,

    D. B. Acharya, K. Kuppan, and B. Divya, “Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey,”IEEE Access, 2025

  8. [8]

    Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025

    M. Gridach, J. Nanavati, K. Z. E. Abidine, L. Mendes, and C. Mack, “Agentic ai for scientific discovery: A survey of progress, challenges, and future directions,”arXiv preprint arXiv:2503.08979, 2025

  9. [9]

    A practical guide for designing, developing, and deploying production-grade agentic ai workflows,

    E. Bandara, R. Gore, P. Foytik, S. Shetty, R. Mukkamala, A. Rah- man, X. Liang, S. H. Bouk, A. Hass, S. Rajapakseet al., “A practical guide for designing, developing, and deploying production-grade agentic ai workflows,”arXiv preprint arXiv:2512.08769, 2025

  10. [10]

    Rovanima—scaling up small and medium-sized tourism enter- prises through agentic ai in lapland

    E. Bandaraa, T. Hewab, S. Rajapaksef, I. Kularathnaf, P. Karunarath- nag, R. Gorea, P. Foytika, S. Shettya, R. Mukkamalaa, A. Rahmanc et al., “Rovanima—scaling up small and medium-sized tourism enter- prises through agentic ai in lapland.”

  11. [11]

    Mitigating data leakage in high- compliance environments: A governance framework for local-llm adop- tion,

    J. Alshaer and I. R. Amman, “Mitigating data leakage in high- compliance environments: A governance framework for local-llm adop- tion,” 2026

  12. [12]

    Llm potentiality and awareness: a position paper from the perspective of trustworthy and responsible ai modeling,

    I. H. Sarker, “Llm potentiality and awareness: a position paper from the perspective of trustworthy and responsible ai modeling,”Discover Artificial Intelligence, vol. 4, no. 1, p. 40, 2024

  13. [13]

    Explainable ai (xai): Core ideas, techniques, and solutions,

    R. Dwivedi, D. Dave, H. Naik, S. Singhal, R. Omer, P. Patel, B. Qian, Z. Wen, T. Shah, G. Morganet al., “Explainable ai (xai): Core ideas, techniques, and solutions,”ACM computing surveys, vol. 55, no. 9, pp. 1–33, 2023

  14. [14]

    Towards responsi- ble and explainable ai agents with consensus-driven reasoning,

    E. Bandara, T. Hewa, R. Gore, S. Shetty, R. Mukkamala, P. Foytik, A. Rahman, S. H. Bouk, X. Liang, A. Hasset al., “Towards responsi- ble and explainable ai agents with consensus-driven reasoning,”arXiv preprint arXiv:2512.21699, 2025. 37

  15. [15]

    Language models enable simple systems for gen- erating structured views of heterogeneous data lakes,

    S. Arora, B. Yang, S. Eyuboglu, A. Narayan, A. Hojel, I. Trum- mer, and C. R´ e, “Language models enable simple systems for gen- erating structured views of heterogeneous data lakes,”arXiv preprint arXiv:2304.09433, 2023

  16. [16]

    Language mod- els are few-shot learners,

    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askellet al., “Language mod- els are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

  17. [17]

    Llm as a mastermind: A survey of strategic reasoning with large language models,

    Y. Zhang, S. Mao, T. Ge, X. Wang, A. de Wynter, Y. Xia, W. Wu, T. Song, M. Lan, and F. Wei, “Llm as a mastermind: A survey of strategic reasoning with large language models,”arXiv preprint arXiv:2404.01230, 2024

  18. [18]

    arXiv preprint arXiv:2502.10867 , year=

    J. Wang, “A tutorial on llm reasoning: Relevant methods behind chatgpt o1,”arXiv preprint arXiv:2502.10867, 2025

  19. [19]

    Data-efficient fine-tuning for llm-based recommendation,

    X. Lin, W. Wang, Y. Li, S. Yang, F. Feng, Y. Wei, and T.-S. Chua, “Data-efficient fine-tuning for llm-based recommendation,” inProceed- ings of the 47th international ACM SIGIR conference on research and development in information retrieval, 2024, pp. 365–374

  20. [20]

    Responsible generative ai: A com- prehensive study to explain llms,

    I. Shruti, A. Kumar, A. Sethet al., “Responsible generative ai: A com- prehensive study to explain llms,” in2024 International Conference on Electrical, Computer and Energy Technologies (ICECET. IEEE, 2024, pp. 1–6

  21. [21]

    Llm explainability,

    I. Arous, K. Chehbouni, Z. Cheng, and B. Dossou, “Llm explainability,” inHandbook of Human-Centered Artificial Intelligence. Springer, 2025, pp. 1–61

  22. [22]

    Survey on Evaluation of LLM-based Agents

    A. Yehudai, L. Eden, A. Li, G. Uziel, Y. Zhao, R. Bar-Haim, A. Cohan, and M. Shmueli-Scheuer, “Survey on evaluation of llm-based agents,” arXiv preprint arXiv:2503.16416, 2025

  23. [23]

    Agentsway–software development methodology for ai agents- based teams,

    E. Bandara, R. Gore, X. Liang, S. Rajapakse, I. Kularathne, P. Karunarathna, P. Foytik, S. Shetty, R. Mukkamala, A. Rahman et al., “Agentsway–software development methodology for ai agents- based teams,”arXiv preprint arXiv:2510.23664, 2025. 38

  24. [24]

    Post-traumatic psychiatric disorders: Ptsd is not the only diagnosis,

    Y. Aux´ em´ ery, “Post-traumatic psychiatric disorders: Ptsd is not the only diagnosis,”La Presse M´ edicale, vol. 47, no. 5, pp. 423–430, 2018

  25. [25]

    Standardization of psychiatric diagnoses–role of fine-tuned llm consortium and openai-gpt-oss reasoning llm enabled de- cision support system,

    E. Bandara, R. Gore, A. Yarlagadda, A. H. Clayton, P. Samuel, C. K. Rhea, and S. Shetty, “Standardization of psychiatric diagnoses–role of fine-tuned llm consortium and openai-gpt-oss reasoning llm enabled de- cision support system,”arXiv preprint arXiv:2510.25588, 2025

  26. [26]

    Large language models encode clinical knowledge,

    K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N. Scales, A. Tanwani, H. Cole-Lewis, S. Pfohlet al., “Large language models encode clinical knowledge,”Nature, vol. 620, no. 7972, pp. 172– 180, 2023

  27. [27]

    “it’sa psychiatric patient

    S. Braˇ cko and A. ˇCelofiga, ““it’sa psychiatric patient”: Misdiagnosing of somatic symptoms in patients with mental disorders due to stigma and inadequate diagnostic treatment,”Archives of Psychiatry Research: An International Journal of Psychiatry and Related Sciences, vol. 60, no. 1., pp. 62–66, 2024

  28. [28]

    Towards accu- rate differential diagnosis with large language models,

    D. McDuff, M. Schaekermann, T. Tu, A. Palepu, A. Wang, J. Garrison, K. Singhal, Y. Sharma, S. Azizi, K. Kulkarniet al., “Towards accu- rate differential diagnosis with large language models,”arXiv preprint arXiv:2312.00164, 2023

  29. [29]

    Drhouse: An llm-empowered diagnostic reasoning system through harnessing outcomes from sensor data and expert knowledge,

    B. Yang, S. Jiang, L. Xu, K. Liu, H. Li, G. Xing, H. Chen, X. Jiang, and Z. Yan, “Drhouse: An llm-empowered diagnostic reasoning system through harnessing outcomes from sensor data and expert knowledge,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 8, no. 4, pp. 1–29, 2024

  30. [30]

    Advancements in machine learning and deep learning for early detection and management of mental health disorder,

    K. D. Kannan, S. K. Jagatheesaperumal, R. N. Kandala, M. Lotfaliany, R. Alizadehsanid, and M. Mohebbi, “Advancements in machine learning and deep learning for early detection and management of mental health disorder,”arXiv preprint arXiv:2412.06147, 2024

  31. [31]

    Mdd-5k: A new diagnostic conversation dataset for mental disorders synthesized via neuro-symbolic llm agents,

    C. Yin, F. Li, S. Zhang, Z. Wang, J. Shao, P. Li, J. Chen, and X. Jiang, “Mdd-5k: A new diagnostic conversation dataset for mental disorders synthesized via neuro-symbolic llm agents,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 24, 2025, pp. 25 715– 25 723. 39

  32. [32]

    Large language models for interpretable men- tal health diagnosis,

    B. H. Kim and C. Wang, “Large language models for interpretable men- tal health diagnosis,”arXiv preprint arXiv:2501.07653, 2025

  33. [33]

    Fine-tuning mistral 7b large language model for python query response and code generation: A parameter efficient approach,

    H. Samo, K. Ali, M. Memon, F. A. Abbasi, M. Y. Koondhar, and K. Dahri, “Fine-tuning mistral 7b large language model for python query response and code generation: A parameter efficient approach,” VAWKUM Transactions on Computer Sciences, vol. 12, no. 1, pp. 205– 217, 2024

  34. [34]

    Bandara, S

    E. Bandara, S. Shetty, R. Mukkamala, A. Rahman, P. Foytik, X. Liang, K. De Zoysa, and N. W. Keong, “Devsec-gpt — generative-ai (with custom-trained meta’s llama2 llm), blockchain, nft and pbom enabled cloud native container vulnerability management and pipeline verifica- tion platform,” in2024 IEEE Cloud Summit, 2024, pp. 28–35

  35. [35]

    The rise of agentic ai: A review of definitions, frameworks, architectures, ap- plications, evaluation metrics, and challenges,

    A. Bandi, B. Kongari, R. Naguru, S. Pasnoor, and S. V. Vilipala, “The rise of agentic ai: A review of definitions, frameworks, architectures, ap- plications, evaluation metrics, and challenges,”Future Internet, vol. 17, no. 9, p. 404, 2025

  36. [36]

    Llama- recipe — fine-tuned meta’s llama llm, pbom and nft enabled 5g network- slice orchestration and end-to-end supply-chain verification platform,

    E. Bandara, S. H. Bouk, S. Shetty, S. Roy, R. Mukkamala, A. Rah- man, P. Foytik, X. Liang, N. W. Keong, and K. De Zoysa, “Llama- recipe — fine-tuned meta’s llama llm, pbom and nft enabled 5g network- slice orchestration and end-to-end supply-chain verification platform,” in2025 IEEE 22nd Consumer Communications & Networking Confer- ence (CCNC), 2025, pp. 1–6

  37. [37]

    A study of lora: Long range & low power networks for the internet of things,

    A. Augustin, J. Yi, T. Clausen, and W. Townsley, “A study of lora: Long range & low power networks for the internet of things,”Sensors, vol. 16, no. 9, p. 1466, 2016

  38. [38]

    Wedagpt—generative-ai (with custom- trained meta’s llama2 llm), blockchain, self sovereign identity, nft and model card enabled indigenous medicine platform,

    E. Bandara, P. Foytik, S. Shetty, R. Mukkamala, A. Rahman, X. Liang, N. W. Keong, and K. De Zoysa, “Wedagpt—generative-ai (with custom- trained meta’s llama2 llm), blockchain, self sovereign identity, nft and model card enabled indigenous medicine platform,” in2024 IEEE Sym- posium on Computers and Communications (ISCC). IEEE, 2024, pp. 1–6

  39. [39]

    Agentic ai systems: Opportunities, chal- lenges, and trustworthiness,

    T. Raheem and G. Hossain, “Agentic ai systems: Opportunities, chal- lenges, and trustworthiness,” in2025 IEEE International Conference on Electro Information Technology (eIT). IEEE, 2025, pp. 618–624. 40

  40. [40]

    Deep-stride: Automated security threat modeling with vision-language models,

    E. Bandara, A. Hass, S. Shetty, R. Mukkamala, R. Gore, A. Rahman, and S. H. Bouk, “Deep-stride: Automated security threat modeling with vision-language models,” in2025 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), 2025, pp. 1–7

  41. [41]

    Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

    M. M. Hasan, H. Li, E. Fallahzadeh, G. K. Rajbahadur, B. Adams, and A. E. Hassan, “Model context protocol (mcp) at first glance: Study- ing the security and maintainability of mcp servers,”arXiv preprint arXiv:2506.13538, 2025

  42. [42]

    Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

    X. Hou, Y. Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions,” arXiv preprint arXiv:2503.23278, 2025

  43. [43]

    Model con- text contracts-mcp-enabled framework to integrate llms with blockchain smart contracts,

    E. Bandara, S. Shetty, R. Mukkamala, R. Gore, P. Foytik, S. H. Bouk, A. Rahman, X. Liang, N. W. Keong, K. De Zoysaet al., “Model con- text contracts-mcp-enabled framework to integrate llms with blockchain smart contracts,”arXiv preprint arXiv:2510.19856, 2025

  44. [44]

    Implementation of an on-device ai chatbot system in an air-gapped control central console environ- ment,

    Y. Lee, S. Lee, J. Choi, and W. Jung, “Implementation of an on-device ai chatbot system in an air-gapped control central console environ- ment,”Journal of the Korea Institute of Military Science and Tech- nology, vol. 28, no. 5, pp. 525–535, 2025

  45. [45]

    LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

    Y. Zheng, R. Zhang, J. Zhang, Y. Ye, Z. Luo, Z. Feng, and Y. Ma, “Llamafactory: Unified efficient fine-tuning of 100+ language models,” arXiv preprint arXiv:2403.13372, 2024

  46. [46]

    Lohan: Low-cost high-performance framework to fine-tune 100b model on a consumer gpu,

    C. Liao, M. Sun, Z. Yang, J. Xie, K. Chen, B. Yuan, F. Wu, and Z. Wang, “Lohan: Low-cost high-performance framework to fine-tune 100b model on a consumer gpu,”arXiv preprint arXiv:2403.06504, 2024

  47. [47]

    Performance comparision of tpu, gpu, cpu on google colaboratory over distributed deep learning

    H. Kimm, I. Paik, and H. Kimm, “Performance comparision of tpu, gpu, cpu on google colaboratory over distributed deep learning,” in 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). IEEE, 2021, pp. 312–319

  48. [48]

    Qlora: Efficient finetuning of quantized llms

    T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “Qlora: Efficient finetuning of quantized llms,” Advances in Neural Information Processing Systems, vol. 36, 2024

  49. [49]

    Vindsec-llama — fine-tuned meta’s llama-3 llm, federated learning, blockchain and pbom-enabled data security architecture for wind energy data platforms

    E. Bandara, S. H. Bouk, S. Shetty, R. Gore, S. Kompella, R. Mukkamala, A. Rahman, P. Foytik, X. Liang, N. W. Keong, and K. De Zoysa, “Vindsec-llama — fine-tuned meta’s llama-3 llm, federated learning, blockchain and pbom-enabled data security architecture for wind energy data platforms,” in 2025 International Wireless Communications and Mobile Computin...

  50. [50]

    The Llama 3 Herd of Models

    A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan et al., “The llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024

  51. [51]

    Pixtral 12B

    P. Agrawal, S. Antoniak, E. B. Hanna, B. Bout, D. Chaplot, J. Chudnovsky, D. Costa, B. De Monicault, S. Garg, T. Gervet et al., “Pixtral 12b,” arXiv preprint arXiv:2410.07073, 2024

  52. [52]

    Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

    P. Wang, S. Bai, S. Tan, S. Wang, Z. Fan, J. Bai, K. Chen, X. Liu, J. Wang, W. Ge et al., “Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution,” arXiv preprint arXiv:2409.12191, 2024

  53. [53]

    gpt-oss-120b & gpt-oss-20b Model Card

    S. Agarwal, L. Ahmad, J. Ai, S. Altman, A. Applebaum, E. Arbus, R. K. Arora, Y. Bai, B. Baker, H. Bao et al., “gpt-oss-120b & gpt-oss-20b model card,” arXiv preprint arXiv:2508.10925, 2025

  54. [54]

    The dsm-5: Classification and criteria changes

    D. A. Regier, E. A. Kuhl, and D. J. Kupfer, “The dsm-5: Classification and criteria changes,” World psychiatry, vol. 12, no. 2, pp. 92–98, 2013

  55. [55]

    Is factor analysis useful for revising diagnostic criteria for ptsd? a systematic review of five issues ten years after dsm-5

    M. S. Scheeringa, “Is factor analysis useful for revising diagnostic criteria for ptsd? a systematic review of five issues ten years after dsm-5,” Journal of Psychiatric Research, 2024

  56. [56]

    The impact of screening positive for hazardous alcohol use on the diagnostic accuracy of the ptsd checklist for dsm-5 among veterans

    R. E. Sistad, R. Kimerling, P. P. Schnurr, and M. J. Bovin, “The impact of screening positive for hazardous alcohol use on the diagnostic accuracy of the ptsd checklist for dsm-5 among veterans,” Journal of Traumatic Stress, vol. 37, no. 2, pp. 328–336, 2024

  57. [57]

    On-device qwen2.5: Efficient llm inference with model compression and hardware acceleration

    M. Xiang, R. Fernando, and B. Wang, “On-device qwen2.5: Efficient llm inference with model compression and hardware acceleration,” arXiv preprint arXiv:2504.17376, 2025

  58. [58]

    Standardization of neuromuscular reflex analysis–role of fine-tuned vision-language model consortium and openai gpt-oss reasoning llm enabled decision support system

    E. Bandara, R. Gore, S. Shetty, R. Mukkamala, C. Rhea, A. Yarlagadda, S. Kaushik, L. De Silva, A. Maznychenko, I. Sokolowska et al., “Standardization of neuromuscular reflex analysis–role of fine-tuned vision-language model consortium and openai gpt-oss reasoning llm enabled decision support system,” arXiv preprint arXiv:2508.12473, 2025

  59. [59]

    Enhancing ai systems with agentic workflows patterns in large language model

    A. Singh, A. Ehtesham, S. Kumar, and T. T. Khoei, “Enhancing ai systems with agentic workflows patterns in large language model,” in 2024 IEEE World AI IoT Congress (AIIoT). IEEE, 2024, pp. 527–532

  60. [60]

    Me-llama: Medical foundation large language models for comprehensive text analysis and beyond

    Q. Xie, Q. Chen, A. Chen, C. Peng, Y. Hu, F. Lin, X. Peng, J. Huang, J. Zhang, V. Keloth et al., “Me-llama: Medical foundation large language models for comprehensive text analysis and beyond,” 2024