
arxiv: 2604.09548 · v1 · submitted 2026-01-16 · 💻 cs.IR · cs.AI

Retrieval-Augmented Large Language Models for Evidence-Informed Guidance on Cannabidiol Use in Older Adults

Pith reviewed 2026-05-16 14:08 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords retrieval-augmented generation, large language models, cannabidiol, older adults, health education, AI safety, guideline alignment, drug interactions

The pith

Retrieval-augmented large language models deliver more cautious cannabidiol guidance for older adults than standalone models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a retrieval-augmented large language model system that gives older adults evidence-informed guidance on cannabidiol use for chronic conditions such as pain and sleep disturbances. It combines a curated evidence set with structured prompt engineering to tailor responses to each scenario, including comorbidities, medications, and cognitive status. In an automated evaluation on 64 diverse scenarios, models with retrieval produced more cautious and guideline-aligned recommendations than standalone models. An ensemble that integrates multiple retrieval systems performed best. The approach aims to make AI a safer tool for health education, where accurate information is critical.

Core claim

Retrieval-augmented models, especially the ensemble version, consistently generated more cautious and guideline-aligned recommendations on cannabidiol use in older adults across three automated evaluation strategies, outperforming standalone large language models in 64 tested scenarios.

What carries the argument

Retrieval-augmented generation framework that integrates multiple retrieval systems with structured prompts and curated cannabidiol evidence to ensure context-aware and safe outputs.
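The idea of integrating multiple retrieval systems can be illustrated with a minimal sketch. It assumes reciprocal rank fusion as the merging rule, which is a common technique swapped in here for illustration and not necessarily what the paper's ensemble does. The document ids, the constant `k=60`, and the function name are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=3):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n]


# Hypothetical outputs of two retrievers over a curated evidence base.
dense = ["dosing_guideline", "drug_interactions", "sleep_review"]
sparse = ["drug_interactions", "titration_note", "dosing_guideline"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

The fused list would then be passed to the language model as context alongside the structured prompt. Rank-based fusion is one reasonable choice because it does not require the two retrievers to produce comparable similarity scores.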

If this is right

  • Retrieval augmentation leads to safer AI recommendations in health domains involving potential drug interactions.
  • Ensemble approaches combining multiple retrieval methods yield the highest alignment with guidelines.
  • Automated evaluation frameworks can assess AI safety without manual annotation.
  • Such systems can assist older adults and caregivers in understanding appropriate cannabidiol use.
  • The framework is reproducible for testing other AI health applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If validated in clinical settings, this could reduce harm from inaccurate AI health advice on supplements.
  • The technique may apply to guidance on other substances or medications with evolving evidence.
  • Future models could incorporate real-time evidence updates to maintain alignment with latest guidelines.
  • Human-AI hybrid systems might use this retrieval method to support healthcare professionals.

Load-bearing premise

The curated evidence on cannabidiol is complete and up-to-date with current guidelines, and the automated metrics accurately reflect real-world safety and alignment.

What would settle it

Human experts reviewing a sample of the AI outputs and finding that retrieval-augmented responses are not more cautious or are less aligned with guidelines than standalone model outputs.

Figures

Figures reproduced from arXiv: 2604.09548 by Ali Abedi, Charlene H. Chu, Shehroz S. Khan.

Figure 1: The block diagram of (a) a standalone large language model (LLM) receiving a human prompt together with a system prompt that instructs how the LLM should interpret the human input and generate its response, (b) the standard retrieval-augmented generation configuration in which an LLM is equipped with retrieved documents as external resources, and (c) the advanced configuration where two distinct retrieval-…

Figure 2: Boxplots of the generated educational content on CBD (a) dosage in milligrams, (b) dosing frequency per day, (c) titration amount in milligrams, (d) titration interval in days, and (e) maximum daily dose in milligrams across the evaluated LLM and RAG systems

Figure 3: The mean and standard deviation of the generated values (Mean val and Std val) as well as statistical consensus evaluation showing standardized z-scores (Mean z and Std z) of CBD (a) dosage in milligrams, (b) dosing frequency per day, (c) titration amount in milligrams, (d) titration interval in days, and (e) maximum daily dose in milligrams across the evaluated LLM and RAG systems

Figure 4: Feature-aligned directional evaluation showing the number of aligned (Aln), misaligned (Mis), and neutral (Neu) outputs, along with alignment rates (Aln%) for the generated CBD (a) dosage in milligrams, (b) dosing frequency per day, (c) titration amount in milligrams, (d) titration interval in days, and (e) maximum daily dose in milligrams across the evaluated LLM and RAG systems

Figure 5: LLM-as-a-judge rubric-based evaluation of model outputs across five quality dimensions and the total score, using (a) GPT 5.1 and (b) Gemini 2.5 Pro as the evaluating models.
Abstract

Older adults commonly experience chronic conditions such as pain and sleep disturbances and may consider cannabidiol for symptom management. Safe use requires appropriate dosing, careful titration, and awareness of drug interactions, yet stigma and limited health literacy often limit understanding. Conversational artificial intelligence systems based on large language models and retrieval-augmented generation may support cannabidiol education, but their safety and reliability remain insufficiently evaluated. This study developed a retrieval-augmented large language model framework that combines structured prompt engineering with curated cannabidiol evidence to generate context-aware guidance for older adults, including those with cognitive impairment. We also proposed an automated, annotation-free evaluation framework to benchmark leading standalone and retrieval-augmented models in the absence of standardized benchmarks. Sixty-four diverse user scenarios were generated by varying symptoms, preferences, cognitive status, demographics, comorbidities, medications, cannabis history, and caregiver support. Multiple state-of-the-art models were evaluated, including a novel ensemble retrieval architecture that integrates multiple retrieval systems. Across three automated evaluation strategies, retrieval-augmented models consistently produced more cautious and guideline-aligned recommendations than standalone models, with the ensemble approach performing best. These findings demonstrate that structured retrieval improves the reliability and safety of AI-driven cannabidiol education and provide a reproducible framework for evaluating AI tools used in sensitive health contexts.
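The Figure 3 caption mentions a statistical consensus evaluation based on standardized z-scores. A minimal sketch of one way such a score could be computed follows. It assumes each system's mean output is standardized against the pooled outputs of all systems. The paper's exact procedure is not given here, so the pooling choice, the function name, and the dose values are made up for illustration.

```python
from statistics import mean, pstdev


def consensus_z_scores(values_by_system):
    """Standardize each system's mean value against the pooled outputs.

    A negative z-score means the system recommends lower values
    than the consensus of all systems; a positive one means higher.
    """
    pooled = [v for vals in values_by_system.values() for v in vals]
    mu, sigma = mean(pooled), pstdev(pooled)
    if sigma == 0:
        # All outputs identical: no system deviates from consensus.
        return {name: 0.0 for name in values_by_system}
    return {
        name: (mean(vals) - mu) / sigma
        for name, vals in values_by_system.items()
    }


# Made-up starting doses in milligrams, for illustration only.
starting_dose_mg = {
    "standalone": [20, 30],
    "rag": [10, 10],
    "ensemble_rag": [5, 15],
}

z = consensus_z_scores(starting_dose_mg)
for name, score in z.items():
    print(f"{name}: {score:+.2f}")
```

Under this reading, a system whose dose recommendations sit below the pooled mean would be read as more cautious on that variable. Whether lower is actually safer depends on the clinical variable, which is one reason the proxy needs expert validation.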

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a retrieval-augmented generation (RAG) framework that combines structured prompt engineering with a curated cannabidiol evidence base to generate context-aware guidance for older adults on CBD use, including those with cognitive impairment. It generates 64 diverse scenarios by varying symptoms, demographics, comorbidities, medications, and other factors, then evaluates leading standalone and RAG models (including a novel ensemble retrieval architecture) using three automated annotation-free strategies. The central claim is that RAG models, especially the ensemble, consistently produce more cautious and guideline-aligned recommendations than standalone models.

Significance. If the automated evaluation strategies can be shown to correlate with expert clinical judgment, the work could supply a reproducible, annotation-free framework for benchmarking AI safety in sensitive health-education domains where dosing errors and drug interactions carry direct risk.

major comments (2)
  1. [Evaluation Framework] The central claim that RAG models (particularly the ensemble) produce more cautious and guideline-aligned output rests entirely on the three automated annotation-free evaluation strategies, yet the manuscript supplies no quantitative metrics, no explicit description of the strategies (e.g., lexical caution markers, retrieval overlap, or prompt-derived heuristics), and no external validation that these proxies correlate with actual clinical safety or fidelity to cannabidiol guidelines. This is load-bearing for the result.
  2. [Evidence Base] The curated cannabidiol evidence base is treated as ground truth for measuring guideline alignment without reported completeness checks, expert curation audit, or assessment of its representativeness of current clinical guidelines.
minor comments (2)
  1. [Abstract] The abstract asserts that retrieval-augmented models 'consistently produced more cautious and guideline-aligned recommendations' but reports no specific quantitative metrics, effect sizes, or per-strategy scores to support this statement.
  2. [Scenario Generation] Clarify the precise procedure used to generate the 64 scenarios and confirm that they capture real clinical complexity (e.g., polypharmacy interactions, cognitive impairment effects) rather than surface-level variations.
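The scenario-generation question raised above can be made concrete with a small sketch. It assumes scenarios are drawn from a full factorial grid over a handful of factors and then subsampled to 64 with a fixed seed. The factor names follow the abstract, but the levels, the subsampling step, and the function name are hypothetical and not taken from the paper.

```python
import itertools
import random

# Hypothetical factor levels, for illustration only.
FACTORS = {
    "symptom": ["chronic pain", "sleep disturbance"],
    "cognitive_status": ["unimpaired", "mild impairment", "dementia"],
    "comorbidity": ["none", "liver disease"],
    "medication": ["none", "warfarin", "sedative"],
    "cannabis_history": ["naive", "experienced"],
    "caregiver_support": ["yes", "no"],
}


def build_scenarios(factors, n, seed=0):
    """Enumerate every factor combination, then sample n without replacement."""
    names = list(factors)
    grid = [
        dict(zip(names, combo))
        for combo in itertools.product(*factors.values())
    ]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(grid, n)


scenarios = build_scenarios(FACTORS, n=64)
print(len(scenarios), scenarios[0])
```

A grid like this yields surface-level variation by construction, which is the referee's concern: it varies labels but does not by itself guarantee clinically realistic combinations such as interacting polypharmacy regimens.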

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their thorough review and constructive comments on our manuscript. We have revised the paper to address the major concerns raised and provide detailed responses below.

  1. Referee: [Evaluation Framework] The central claim that RAG models (particularly the ensemble) produce more cautious and guideline-aligned output rests entirely on the three automated annotation-free evaluation strategies, yet the manuscript supplies no quantitative metrics, no explicit description of the strategies (e.g., lexical caution markers, retrieval overlap, or prompt-derived heuristics), and no external validation that these proxies correlate with actual clinical safety or fidelity to cannabidiol guidelines. This is load-bearing for the result.

    Authors: We agree that the original manuscript did not provide sufficient detail on the evaluation framework, which is indeed central to our claims. In the revised manuscript, we have added a new subsection (3.4 Evaluation Strategies) that explicitly describes each of the three automated, annotation-free strategies. This includes: (1) lexical caution scoring based on predefined markers (e.g., frequency of phrases recommending medical consultation or low-dose initiation), with reported quantitative results showing higher caution in RAG models; (2) retrieval-evidence overlap metrics quantifying how closely generated responses align with retrieved documents; and (3) prompt-derived heuristic checks for guideline elements such as interaction warnings. We now include specific quantitative metrics throughout the results section for each model and strategy. Regarding external validation, we acknowledge that demonstrating correlation with expert clinical judgment would strengthen the work but requires a dedicated follow-up study involving clinicians, which is beyond the current scope. We have expanded the limitations section to discuss this and outline plans for future validation. revision: yes

  2. Referee: [Evidence Base] The curated cannabidiol evidence base is treated as ground truth for measuring guideline alignment without reported completeness checks, expert curation audit, or assessment of its representativeness of current clinical guidelines.

    Authors: We thank the referee for pointing this out. The evidence base was assembled from a systematic search of PubMed, Cochrane reviews, and major clinical guidelines (e.g., from the FDA, NIH, and geriatric societies) published through 2023. In the revised version, we have included a detailed description in Section 2.2 and a new Appendix B that reports: completeness checks by topic coverage, the curation process (two authors independently reviewed sources with consensus), and an assessment of representativeness showing alignment with current recommendations on older adults. While we did not conduct a formal external expert audit, we have noted this as a limitation and clarified that the base serves as a representative synthesis rather than exhaustive ground truth. These additions provide greater transparency without altering the core findings. revision: yes
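The first response describes lexical caution scoring based on predefined markers. A minimal sketch of what such a score might look like follows, assuming a simple count of marker phrases per 100 words. The marker patterns, the normalization, and the example texts are hypothetical. The rebuttal is simulated, so this does not describe the paper's actual metric.

```python
import re

# Hypothetical caution markers, for illustration only.
CAUTION_MARKERS = [
    r"consult (?:your|a) (?:doctor|physician|pharmacist|clinician)",
    r"start (?:with a )?low",
    r"go slow",
    r"drug interactions?",
]


def caution_score(text):
    """Return the number of caution-marker hits per 100 words."""
    lowered = text.lower()
    hits = sum(len(re.findall(p, lowered)) for p in CAUTION_MARKERS)
    words = len(lowered.split())
    return 100.0 * hits / words if words else 0.0


cautious = (
    "Start low and go slow. Consult your pharmacist about "
    "drug interactions before use."
)
bold = "Take 100 mg of CBD every night for sleep."

print(caution_score(cautious) > caution_score(bold))
```

A count like this rewards phrasing rather than correctness: a response could repeat "consult your doctor" while still recommending an unsafe dose, which is why the unresolved objection about expert validation matters.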

standing simulated objections not resolved
  • Demonstrating direct correlation of the automated evaluation proxies with expert clinical judgments, as this would necessitate a new empirical study with healthcare professionals.

Circularity Check

0 steps flagged

No significant circularity detected


The paper generates 64 scenarios externally by varying symptoms, demographics, and comorbidities, then evaluates RAG versus standalone models using three automated annotation-free strategies that compare outputs to curated guidelines. No equations or derivations reduce a prediction to a fitted parameter from the same data, no self-definitional loops appear where X is defined via Y and then Y is predicted from X, and no load-bearing self-citations are invoked to force uniqueness or ansatz choices. The central claim that ensemble RAG produces more cautious outputs rests on independent comparison to external guidelines rather than tautological renaming or construction from inputs, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the assumption that curated evidence is reliable; no free parameters or new invented entities are introduced beyond standard RAG components.

axioms (1)
  • domain assumption: The curated cannabidiol evidence base is accurate and comprehensive for generating safe guidance.
    Invoked when stating that retrieval improves guideline alignment.




Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · 2 internal anchors

  1. [1]

    Cannabidiol (CBD) Use by Older Adults for Acute and Chronic Pain,

    B. Porter, B. S. Marie, G. Milavetz, and K. Herr, “Cannabidiol (CBD) Use by Older Adults for Acute and Chronic Pain,” J Gerontol Nurs, vol. 47, no. 7, pp. 6 –15, Jul. 2021, doi: 10.3928/00989134-20210610-02

  2. [2]

    Use of cannabidiol in the management of insomnia: a systematic review,

    R. M. Ranum, M. O. Whipple, I. Croghan, B. Bauer, L. L. Toussaint, and A. Vincent, “Use of cannabidiol in the management of insomnia: a systematic review,” Cannabis and Cannabinoid Research, vol. 8, no. 2, pp. 213–229, 2023

  3. [3]

    Cannabidiol in anxiety and sleep: a large case series,

    S. Shannon, N. Lewis, H. Lee, and S. Hughes, “Cannabidiol in anxiety and sleep: a large case series,” The Permanente Journal, vol. 23, pp. 18–041, 2019

  4. [4]

    Use of cannabidiol (CBD) for the treatment of cognitive impairment in psychiatric and neurological illness: A narrative review,

    R. Ortiz, S. Rueda, and P. Di Ciano, “Use of cannabidiol (CBD) for the treatment of cognitive impairment in psychiatric and neurological illness: A narrative review,” Exp Clin Psychopharmacol , vol. 31, no. 5, pp. 978 –988, Oct. 2023, doi: 10.1037/pha0000659

  5. [5]

    Cannabinoids in the management of behavioral, psychological, and motor symptoms of neurocognitive disorders: a mixed studies systematic review,

    A. Bahji et al., “Cannabinoids in the management of behavioral, psychological, and motor symptoms of neurocognitive disorders: a mixed studies systematic review,” Journal of Cannabis Research, vol. 4, no. 1, pp. 1–19, 2022

  6. [6]

    CBD and TH C: do they complement each other like Yin and Yang?,

    S. D. Pennypacker and E. A. Romero -Sandoval, “CBD and TH C: do they complement each other like Yin and Yang?,” Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy, vol. 40, no. 11, pp. 1152–1165, 2020

  7. [7]

    Original qualitative research Perceptions of cannabis among adults aged 60 years and older in Canada: a qualitative study

    J. Renard, B. Panesar, S. Noorbakhsh, E. Wadsworth, N. Cristiano, and R. Gabrys, “Original qualitative research Perceptions of cannabis among adults aged 60 years and older in Canada: a qualitative study”

  8. [8]

    Epidemiology of cannabis use among middle-aged and older adults in the United States,

    O. Livne, M. Stohl, J. Gilman, T. E. Goldberg, M. M. Wall, and D. S. Hasin, “Epidemiology of cannabis use among middle-aged and older adults in the United States,” American Journal of Preventive Medicine, p. 108149, 2025

  9. [9]

    Navigating cannabinoid choices for chronic neuropathic pain in older adults: potholes and highlights,

    R. Kamrul, D. Bunka, A. Crawley, B. Schuster, and M. LeBras, “Navigating cannabinoid choices for chronic neuropathic pain in older adults: potholes and highlights,” Canadian Family Physician, vol. 65, no. 11, pp. 807–811, 2019

  10. [10]

    Clearing the Smoke on Cannabis: Medical Use of Cannabis and Cannabinoids (2024 Update),

    C. C. on S. Use and Addiction, “Clearing the Smoke on Cannabis: Medical Use of Cannabis and Cannabinoids (2024 Update),” Canadian Centre on Substance Use and Addiction, Ottawa, Canada, 2024. [Online]. Available: https://www.ccsa.ca/sites/default/files/2024-04/Clearing-the-Smoke-on-Cannabis- Medical-Use-of-Cannabis-and-Cannabinoids-2024-Update-en.pdf

  11. [11]

    Mental health and cognition in older cannabis users: a review,

    B. E. Vacaflor, O. Beauchet, G. E. Jarvis, A. Schavietto, and S. Rej, “Mental health and cognition in older cannabis users: a review,” Canadian Geriatrics Journal, vol. 23, no. 3, p. 242, 2020

  12. [12]

    Cannabis Use Among Older Adults,

    V. Pravosud et al., “Cannabis Use Among Older Adults,” JAMA Network Open, vol. 8, no. 5, pp. e2510173–e2510173, 2025

  13. [13]

    Patient information materials in general practices and promotion of health literacy: an observational study of their effectiveness,

    J. Protheroe, E. V. Esta cio, and S. Saidy -Khan, “Patient information materials in general practices and promotion of health literacy: an observational study of their effectiveness,” The British Journal of General Practice , vol. 65, no. 632, p. e192, 2015

  14. [14]

    The Effects of Stigma: Older Persons and Medicinal Cannabis,

    S. Dahlke et al., “The Effects of Stigma: Older Persons and Medicinal Cannabis,” Qual Health Res , vol. 34, no. 8 –9, pp. 717 –731, Jul. 2024, doi: 10.1177/10497323241227419

  15. [15]

    Health literacy and older adults: A sy stematic review,

    A. K. Chesser, N. Keene Woods, K. Smothers, and N. Rogers, “Health literacy and older adults: A sy stematic review,” Gerontology and geriatric medicine , vol. 2, p. 2333721416630492, 2016

  16. [16]

    Readability and comprehensibility of over-the-counter medication labels,

    H. Trivedi, A. Trivedi, and M. F. Hannan, “Readability and comprehensibility of over-the-counter medication labels,” Renal Failure , vol. 36, no. 3, pp. 473 –477, 2014

  17. [17]

    Emergent Abilities of Large Language Models

    J. Wei et al. , “Emergent abilities of large language models,” arXiv preprint arXiv:2206.07682, 2022

  18. [18]

    Survey of hallucination in natural language generation,

    Z. Ji et al. , “Survey of hallucination in natural language generation,” ACM computing surveys, vol. 55, no. 12, pp. 1–38, 2023

  19. [19]

    Enhancing health care communication with large language models —the role, challenges, and future directions,

    C. R. S ubramanian, D. A. Yang, and R. Khanna, “Enhancing health care communication with large language models —the role, challenges, and future directions,” JAMA network open, vol. 7, no. 3, pp. e240347–e240347, 2024

  20. [20]

    Large Language Models for Chatbot Health Advice Studies: A Systematic Review,

    B. Huo et al. , “Large Language Models for Chatbot Health Advice Studies: A Systematic Review,” JAMA Netw Open, vol. 8, no. 2, p. e2457879, Feb. 2025, doi: 10.1001/jamanetworkopen.2024.57879

  21. [21]

    Assessment of the Uti lity of Artificial Intelligence -Based Chatbots in Patient Education: A Systematic Review and Meta -Analysis,

    S. H. Emile, N. Horesh, Z. Garoufalia, R. Gefen, M. Boutros, and S. D. Wexner, “Assessment of the Uti lity of Artificial Intelligence -Based Chatbots in Patient Education: A Systematic Review and Meta -Analysis,” The American SurgeonTM, p. 00031348251367031, 2025

  22. [22]

    Retrieval augmented generation for large language models in healthcare: A systematic review,

    L. M. Amugongo, P. Mascheroni, S. Brooks, S. Doering, and J. Seidel, “Retrieval augmented generation for large language models in healthcare: A systematic review,” PLOS Digit Health , vol. 4, no. 6, p. e0000877, Jun. 2025, doi: 10.1371/journal.pdig.0000877

  23. [23]

    Bridging AI and Healthcare: A Scoping Review of Retrieval -Augmented Generation — Ethics, Bias, Transparency, Improvements, and Applications,

    D. J. Bunnell, M. J. Bondy, L. M. Fromtling, E. Ludeman, and K. Gourab, “Bridging AI and Healthcare: A Scoping Review of Retrieval -Augmented Generation — Ethics, Bias, Transparency, Improvements, and Applications,” medRxiv, pp. 2025– 04, 2025

  24. [24]

    Retrieval -augmented generation for knowledge -intensive nlp tasks,

    P. L ewis et al. , “Retrieval -augmented generation for knowledge -intensive nlp tasks,” Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020

  25. [25]

    [Online]

    OpenAI, “GPT-5,” 2025. [Online]. Available: https://openai.com/index/introducing- gpt-5/

  26. [26]

    Gemini 2.5 Pro,

    Google DeepMind, “Gemini 2.5 Pro,” 2025. [Online]. Available: https://deepmind.google/models/gemini/pro/

  27. [27]

    Claude Sonnet 4.5,

    Anthropic, “Claude Sonnet 4.5,” 2025. [Online]. Available: https://www.anthropic.com/news/claude-sonnet-4-5

  28. [28]

    Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness,

    Y. H. Ke et al., “Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness,” npj Digital Medicine, vol. 8, no. 1, p. 187, 2025

  29. [29]

    Conversational agents in healthcare: a systematic review,

    L. Laranjo et al. , “Conversational agents in healthcare: a systematic review,” Journal of the American Medical Informatics Association , vol. 25, no. 9, pp. 1248 – 1258, 2018

  30. [30]

    PharmaLLM: A medicine prescriber chatbot exploiting Open -Source large language models,

    A. Azam, Z. Naz, and M. U. G. Khan, “PharmaLLM: A medicine prescriber chatbot exploiting Open -Source large language models,” Human-Centric Intelligent Systems, vol. 4, no. 4, pp. 527–544, 2024

  31. [31]

    Accuracy of a chatbot in answering questions that patients should ask before taking a new medication,

    B. R. Cornelison, B. L. Erstad, and C. Edwards, “Accuracy of a chatbot in answering questions that patients should ask before taking a new medication,” Journal of the American Pharmacists Association, vol. 64, no. 4, p. 102110, 2024

  32. [32]

    Retrieval -Augmented Generation Meets Local Languages for Improved Drug Information Access and Comprehension.,

    A. I. Ismail, B. O. Ibrahim, O. Adekanmbi, and I. Adebara, “Retrieval -Augmented Generation Meets Local Languages for Improved Drug Information Access and Comprehension.,” in Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), 2025, pp. 108–114

  33. [33]

    Development and evaluation of a lightweight large language model chatbot for medication enquiry,

    K. Elangovan et al., “Development and evaluation of a lightweight large language model chatbot for medication enquiry,” PLOS Digital Health , vol. 4, no. 9, p. e0000961, 2025

  34. [34]

    Mistral 7B

    A. Q. Jiang et al., “Mistral 7B,” arXiv preprint arXiv:2310.06825, 2023

  35. [35]

    Enhanced LLM -supported instructions for medication use through retrieval-augmented generation,

    D. dos R. de Jesus et al., “Enhanced LLM -supported instructions for medication use through retrieval-augmented generation,” Computers in Biology and Medicine, vol. 198, p. 111135, 2025

  36. [36]

    E valuation of a context -aware chatbot using retrieval - augmented generation for answering clinical questions on medication -related osteonecrosis of the jaw,

    D. Steybe et al. , “E valuation of a context -aware chatbot using retrieval - augmented generation for answering clinical questions on medication -related osteonecrosis of the jaw,” Journal of Cranio -Maxillofacial Surgery, vol. 53, no. 4, pp. 355–360, 2025

  37. [37]

    From prompt to platform: an agentic AI workflow for healthcare simulation scenario design,

    F. L. Barra et al., “From prompt to platform: an agentic AI workflow for healthcare simulation scenario design,” Advances in Simulation, vol. 10, no. 1, p. 29, 2025

  38. [38]

    Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation,

    J. Kang, “Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation,” arXiv preprint arXiv:2507.02253, 2025

  39. [39]

    Automated Generation of Test Scenarios for Autonomous Driving Using LLMs,

    A. A. Danso and U. Büker, “Automated Generation of Test Scenarios for Autonomous Driving Using LLMs,” Electronics, vol. 14, no. 16, p. 3177, 2025

  40. [40]

    Windows to their world: the effect of sensory impairments on social engagement and activity time in nursing home residents,

    H. E. Resnick, B. E. Fries, and L. M. Verbrugge, “Windows to their world: the effect of sensory impairments on social engagement and activity time in nursing home residents,” J Gerontol B Psychol Sci Soc Sci , vol. 52, no. 3, pp. S135 -144, May 1997, doi: 10.1093/geronb/52b.3.s135

  41. [41]

    Medical cannabis use among older adults in Canada: self -reported data on types and amount used, and perceived effects,

    S. Tumati, K. L. Lanctôt, R. Wang, A. Li, A. Davis, and N. Herrmann, “Medical cannabis use among older adults in Canada: self -reported data on types and amount used, and perceived effects,” Drugs & aging , vol. 39, no. 2, pp. 153 –163, 2022

  42. [42]

    A conceptual framework for research on subjective cognitive decline in preclinical Alzheimer’s disease,

    F. Jessen et al., “A conceptual framework for research on subjective cognitive decline in preclinical Alzheimer’s disease,” Alzheimer’s & dementia, vol. 10, no. 6, pp. 844–852, 2014

  43. [43]

    Mild cognitive impairment,

    R. C. Petersen, “Mild cognitive impairment,” CONTINUUM: lifelong Learning in Neurology, vol. 22, no. 2, pp. 404–418, 2016

  44. [44]

    Older persons: Definitions and key concepts,

    World Health Organization, “Older persons: Definitions and key concepts,” 2023. [Online]. Available: https://emergency.unhcr.org/protection/persons -risk/older- persons

  45. [45]

    Defining age and older adulthood: NIH Style Guide,

    National Institutes of Health, “Defining age and older adulthood: NIH Style Guide,” 2024. [Online]. Available: https://www.nih.gov/nih-style-guide/age

  46. [46]

    Cannabidiol and liver enzyme level elevations in healthy adults: A randomized clinical trial,

    J. Florian et al., “Cannabidiol and liver enzyme level elevations in healthy adults: A randomized clinical trial,” JAMA Internal Medicine, vol. 185, no. 9, pp. 1070 –1078, 2025

  47. [47]

    High prevalence of comorbidities in older adult patients with type 2 diabetes: a cross -sectional survey,

    R. Hashemi et al., “High prevalence of comorbidities in older adult patients with type 2 diabetes: a cross -sectional survey,” BMC geriatrics , vol. 24, no. 1, p. 873, 2024

  48. [48]

    Exploring The Contours: Navig ating Cannabis Use Among Older Adults,

    Y. M. Shin, M. Moussa, and J. Akwe, “Exploring The Contours: Navig ating Cannabis Use Among Older Adults,” Journal of Brown Hospital Medicine, vol. 3, no. 3, p. 120951, 2024

  49. [49]

    Taking Care of Themselves: Cannabis Use Among Informal Care Partners of Older Adults,

    B. Kaskie et al., “Taking Care of Themselves: Cannabis Use Among Informal Care Partners of Older Adults,” Cannabis and cannabinoid research, 2025

  50. [50]

    Prompt engineering as an important emerging skill for medical professionals: tutorial,

    B. Meskó, “Prompt engineering as an important emerging skill for medical professionals: tutorial,” Journal of medical Internet research , vol. 25, p. e50638, 2023

  51. [51]

    Better zero -shot reasoning with role -play prompting,

    A. Kong et al. , “Better zero -shot reasoning with role -play prompting,” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024, pp. 4099–4113

  52. [52]

    Schema-guided natural language generation,

    Y. Du et al., “Schema-guided natural language generation,” in Proceedings of the 13th International Conference on Natural Language Generation, 2020, pp. 283–295

  53. [53]

    Chain -of-thought prompting elicits reasoning in large language models,

    J. Wei et al. , “Chain -of-thought prompting elicits reasoning in large language models,” Advances in neural information processing systems , vol. 35, pp. 24824– 24837, 2022

  54. [54]

    Rehabilitation Exercise Quality Assessment and Feedback Generation Using Large Language Models with Prompt Engineering,

    J. Tang, A. Abedi, T. J. Colella, and S. S. Khan, “Rehabilitation Exercise Quality Assessment and Feedback Generation Using Large Language Models with Prompt Engineering,” in International Joint Conference on Artificial Intelligence , Springer, 2025, pp. 60–75

  55. [55]

    RAG in health care: a novel framework for improving communication and decision -making by addressing LLM limitations,

    K. K. Y. Ng, I. Matsuba, and P. C. Zhang, “RAG in health care: a novel framework for improving communication and decision -making by addressing LLM limitations,” Nejm Ai, vol. 2, no. 1, p. AIra2400380, 2025

  56. [56]

    Mistral Medium 3,

    Mistral AI , “Mistral Medium 3,” 2025. [Online]. Available: https://mistral.ai/news/mistral-medium-3/

  57. [57]

    [Online]

    xAI, “Grok 4,” 2025. [Online]. Available: https://x.ai/news/grok-4

  58. [58]

    DeepSeek V3.2-Exp,

    DeepSeek, “DeepSeek V3.2-Exp,” 2025. [Online]. Available: https://api-docs.deepseek.com/news/news250929

  59. [59]

    Is temperature the creativity parameter of large language models?,

    M. Peeperkorn, T. Kouwenhoven, D. Brown, and A. Jordanous, “Is temperature the creativity parameter of large language models?,” arXiv preprint arXiv:2405.00492, 2024

  60. [60]

    Challenges and applications of large language models,

    J. Kaddour, J. Harris, M. Mozes, H. Bradley, R. Raileanu, and R. McHardy, “Challenges and applications of large language models,” arXiv preprint arXiv:2307.10169, 2023

  61. [61]

    LangGraph,

    LangChain, “LangGraph,” 2025. [Online]. Available: https://www.langchain.com/langgraph

  62. [62]

    Billion-scale similarity search with GPUs,

    J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with GPUs,” IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2021, doi: 10.1109/TBDATA.2019.2921572

  63. [63]

    A content analysis of internet information sources on medical cannabis,

    D. J. Kruger, I. M. Moffet, L. C. Seluk, and L. A. Zammit, “A content analysis of internet information sources on medical cannabis,” Journal of Cannabis Research, vol. 2, no. 1, p. 29, 2020

  64. [64]

    The information-seeking behavior and unmet knowledge needs of older medicinal cannabis consumers in Canada: A qualitative descriptive study,

    J. I. Butler et al., “The information-seeking behavior and unmet knowledge needs of older medicinal cannabis consumers in Canada: A qualitative descriptive study,” Drugs & Aging, vol. 40, no. 5, pp. 427–438, 2023

  65. [65]

    CAN-STRESS: A Real-World Multimodal Dataset for Understanding Cannabis Use, Stress, and Physiological Responses,

    R. R. Azghan et al., “CAN-STRESS: A Real-World Multimodal Dataset for Understanding Cannabis Use, Stress, and Physiological Responses,” arXiv preprint arXiv:2503.19935, 2025

  66. [66]

    Evaluating large language models and agents in healthcare: key challenges in clinical applications,

    C. Xiaolan, X. Jiayang, L. Shanfu, L. Yexin, H. Mingguang, and S. Danli, “Evaluating large language models and agents in healthcare: key challenges in clinical applications,” Intelligent Medicine, 2025

  67. [67]

    On the influence of an iterative affect annotation approach on inter-observer and self-observer reliability,

    S. K. D’Mello, “On the influence of an iterative affect annotation approach on inter-observer and self-observer reliability,” IEEE Transactions on Affective Computing, vol. 7, no. 2, pp. 136–149, 2015

  68. [68]

    Annotated dataset creation through large language models for non-english medical NLP,

    J. Frei and F. Kramer, “Annotated dataset creation through large language models for non-english medical NLP,” Journal of Biomedical Informatics, vol. 145, p. 104478, 2023

  69. [69]

    Self-instruct: Aligning language models with self-generated instructions,

    Y. Wang et al., “Self-instruct: Aligning language models with self-generated instructions,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 13484–13508

  70. [70]

    Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models,

    T. Wu, M. T. Ribeiro, J. Heer, and D. S. Weld, “Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models,” arXiv preprint arXiv:2101.00288, 2021

  71. [71]

    LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

    P. Shojaee, K. Meidani, S. Gupta, A. B. Farimani, and C. K. Reddy, “LLM-SR: Scientific equation discovery via programming with large language models,” arXiv preprint arXiv:2404.18400, 2024

  72. [72]

    Are large language models good annotators?,

    J. Mohta, K. Ak, Y. Xu, and M. Shen, “Are large language models good annotators?,” in Proceedings on, PMLR, 2023, pp. 38–48

  73. [73]

    Large language models for data annotation and synthesis: A survey,

    Z. Tan et al., “Large language models for data annotation and synthesis: A survey,” arXiv preprint arXiv:2402.13446, 2024

  74. [74]

    Cannabis: an emerging treatment for common symptoms in older adults,

    K. H. Yang et al., “Cannabis: an emerging treatment for common symptoms in older adults,” Journal of the American Geriatrics Society, vol. 69, no. 1, pp. 91–97, 2021

  75. [75]

    Risk factors for cannabis-related mental health harms in older adults: a review,

    A. Hudson and P. Hudson, “Risk factors for cannabis-related mental health harms in older adults: a review,” Clinical Gerontologist, vol. 44, no. 1, pp. 3–15, 2021

  76. [76]

    Evaluating clinical AI summaries with large language models as judges,

    E. Croxford et al., “Evaluating clinical AI summaries with large language models as judges,” npj Digital Medicine, vol. 8, no. 1, p. 640, 2025

  77. [77]

    Alignbench: Benchmarking Chinese alignment of large language models,

    X. Liu et al., “Alignbench: Benchmarking Chinese alignment of large language models,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 11621–11640

  78. [78]

    LiveBench

    “LiveBench.” [Online]. Available: https://livebench.ai/

  79. [79]

    Self-RAG: Learning to retrieve, generate, and critique through self-reflection,

    A. Asai, Z. Wu, Y. Wang, A. Sil, and H. Hajishirzi, “Self-RAG: Learning to retrieve, generate, and critique through self-reflection,” 2024

  80. [80]

    Corrective retrieval augmented generation,

    S.-Q. Yan, J.-C. Gu, Y. Zhu, and Z.-H. Ling, “Corrective retrieval augmented generation,” 2024