
arxiv: 2605.08480 · v1 · submitted 2026-05-08 · 💻 cs.AI

AI-Care: A Conversational Agentic System for Task Coordination in Alzheimer's Disease Care

Pith reviewed 2026-05-12 02:30 UTC · model grok-4.3

classification 💻 cs.AI
keywords system · adrd · individuals · ai-care · alzheimer · agentic · calendar · conversational

The pith

AI-Care is a LangGraph-based conversational agent that helps people with Alzheimer's disease coordinate daily tasks through natural language, with safety controls grounded in caregiver-verified records and multi-turn clarification for incomplete requests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

People with Alzheimer's disease often find multi-step digital tasks difficult because memory and planning changes turn simple actions like adding a calendar event into barriers. AI-Care acts as a voice-first assistant that lets users speak or type naturally to handle reminders and to-do lists. The system processes each request through a fixed sequence: it sanitizes input, classifies intent, loads context from caregiver records, runs safety checks, collects any missing details through questions, executes the task via tools, and composes a response. For high-stakes items like medications and allergies, answers come only from caregiver-verified records rather than model generation. Long replies are broken into short chunks before speech synthesis so playback is not rushed. The system accepts both typed and spoken input and speaks its replies through ElevenLabs text-to-speech. In a small pilot, four people with mild-to-moderate AD/ADRD completed the evaluated coordination tasks through conversation and found the system trustworthy, competent, and likable. The design avoids autonomous medical decisions and handles unclear requests by asking for clarification instead of guessing.
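The fixed sequence described above can be pictured as a chain of functions that each read and update a shared state. The following is a minimal sketch in plain Python, not the paper's LangGraph implementation: the stage names follow the text, but the keyword rules, the `CAREGIVER_RECORDS` dictionary, the reply wording, and the `handle` helper are all assumptions made for illustration.

```python
import re
from typing import Any, Callable, Dict, List

State = Dict[str, Any]

# Hypothetical stand-in for caregiver-verified records (schema is assumed).
CAREGIVER_RECORDS = {"medications": "ExampleMed, one tablet at 8 pm"}


def sanitize(state: State) -> State:
    # Collapse whitespace and lowercase the raw utterance.
    state["text"] = " ".join(str(state["raw"]).split()).lower()
    return state


def classify_intent(state: State) -> State:
    # Keyword rules stand in for whatever classifier the real system uses.
    text = state["text"]
    if "medication" in text or "pill" in text:
        state["intent"] = "medication_query"
    elif "remind" in text:
        state["intent"] = "add_reminder"
    else:
        state["intent"] = "unknown"
    return state


def load_context(state: State) -> State:
    state["records"] = CAREGIVER_RECORDS
    return state


def safety_check(state: State) -> State:
    # Medication questions are flagged so no free-form answer is generated.
    state["safety_critical"] = state["intent"] == "medication_query"
    return state


def collect_slots(state: State) -> State:
    # In this sketch a reminder needs only a time; other intents need no slots.
    missing: List[str] = []
    if state["intent"] == "add_reminder":
        match = re.search(r"\bat (\d{1,2}(?::\d{2})? ?(?:am|pm))", state["text"])
        if match:
            state["time"] = match.group(1)
        else:
            missing.append("time")
    state["missing"] = missing
    return state


def execute_tool(state: State) -> State:
    # Act only when the request is a complete reminder.
    if state["intent"] == "add_reminder" and not state["missing"]:
        state["result"] = f"reminder saved for {state['time']}"
    return state


def compose_response(state: State) -> State:
    if state["safety_critical"]:
        reply = "Your care record says: " + state["records"]["medications"]
    elif state["missing"]:
        reply = "What time should I remind you?"
    elif "result" in state:
        reply = "Done: " + state["result"]
    else:
        reply = "Could you say that another way?"
    state["reply"] = reply
    return state


PIPELINE: List[Callable[[State], State]] = [
    sanitize, classify_intent, load_context, safety_check,
    collect_slots, execute_tool, compose_response,
]


def handle(raw: str) -> str:
    # Run every stage in order; each stage sees the state left by the previous one.
    state: State = {"raw": raw}
    for stage in PIPELINE:
        state = stage(state)
    return state["reply"]
```

Calling `handle("Remind me to call Anna")` in this sketch returns a clarification question rather than a guess, because the time slot is missing when the response is composed.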

Core claim

A preliminary pilot with four individuals with mild-to-moderate AD/ADRD showed that users found the system trustworthy, competent, and likable, and were able to complete the evaluated coordination tasks through conversation.

Load-bearing premise

That qualitative feedback from four pilot users in an unspecified setting is sufficient to indicate the system reduces cognitive load and maintains safety for the broader population of people with AD/ADRD.

Original abstract

Individuals with Alzheimer's disease (AD) and Alzheimer's disease-related dementia (ADRD) experience memory and thinking changes that impact their ability to use digital daily management tools. For example, adding an event to a digital calendar requires multiple steps that may act as barriers to independent use for individuals with AD/ADRD. This paper presents AI-Care, a conversational agentic artificial intelligence (AI) layer built on top of a remote caregiving platform co-designed with people with AD/ADRD. AI-Care is designed to reduce the cognitive load on individuals with AD/ADRD when managing everyday tasks such as setting calendar reminders and organizing to-do lists through natural-language interaction with a voice-first chatbot. The system uses a LangGraph-based stateful orchestration approach in which each request passes through sanitization, intent classification, context loading, safety checks, deterministic slot collection, tool execution, and response composition. Safety-critical responses, particularly around medications and allergies, are grounded in caregiver-verified records rather than free-form model generation. The system does not make autonomous medical or treatment decisions. Incomplete or ambiguous requests are handled through controlled multi-turn clarification rather than silent failure or guessing. The system supports both typed and spoken input, with voice output through ElevenLabs text-to-speech. Longer responses are chunked before synthesis to avoid rushed playback. A preliminary pilot with four individuals with mild-to-moderate AD/ADRD showed that users found the system trustworthy, competent, and likable, and were able to complete the evaluated coordination tasks through conversation. We describe the design goals, system architecture, safety controls, and findings from this formative evaluation.
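The abstract says longer responses are chunked before synthesis but does not say how. One plausible approach, sketched below as an assumption, is to group whole sentences into chunks under a character budget so that each chunk can be synthesized and played separately. The function name `chunk_for_tts` and the `max_chars` default are hypothetical, and the call to the text-to-speech service is deliberately left out.

```python
import re
from typing import List


def chunk_for_tts(text: str, max_chars: int = 120) -> List[str]:
    """Group whole sentences into chunks of at most max_chars where possible."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget, so close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than `max_chars` is kept whole in this sketch instead of being cut mid-sentence; a real system would need to decide how to handle that case.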

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces AI-Care, a LangGraph-based conversational agentic system layered on a remote caregiving platform for individuals with mild-to-moderate Alzheimer's disease and related dementias (AD/ADRD). It details a pipeline for natural-language task coordination (e.g., calendar reminders, to-do lists) that includes sanitization, intent classification, context loading, safety checks grounded in caregiver-verified records, deterministic slot filling, tool execution, and multi-turn clarification. The system supports voice and text input with chunked TTS output and explicitly avoids autonomous medical decisions. A formative pilot with four participants reported that users perceived the system as trustworthy, competent, and likable and successfully completed the evaluated coordination tasks via conversation.
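Deterministic slot filling with multi-turn clarification, as summarized above, can be illustrated with a small state holder that always asks for the first missing field in a fixed order. This is a sketch under stated assumptions: the slot names in `REQUIRED_SLOTS`, the wording in `PROMPTS`, and the `ReminderDraft` and `step` names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slot schema for a calendar reminder.
REQUIRED_SLOTS = ("title", "date", "time")
PROMPTS = {
    "title": "What should I remind you about?",
    "date": "Which day is it on?",
    "time": "What time should I remind you?",
}


@dataclass
class ReminderDraft:
    slots: Dict[str, str] = field(default_factory=dict)

    def next_missing(self) -> Optional[str]:
        # Fixed order makes the next question predictable for the user.
        for name in REQUIRED_SLOTS:
            if name not in self.slots:
                return name
        return None


def step(draft: ReminderDraft, answer: Optional[str] = None) -> str:
    """Store the answer to the pending question, then ask the next one or confirm."""
    pending = draft.next_missing()
    if pending is not None and answer is not None and answer.strip():
        draft.slots[pending] = answer.strip()
        pending = draft.next_missing()
    if pending is not None:
        return PROMPTS[pending]
    return "Saved: {title} on {date} at {time}.".format(**draft.slots)
```

Because the question order is fixed and no slot value is inferred, an incomplete request always leads to a specific follow-up question, and an empty or whitespace-only answer simply repeats the pending question.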

Significance. If the usability and safety claims can be substantiated with quantitative metrics and larger samples, the work would address a genuine barrier in digital self-management tools for AD/ADRD by providing a voice-first, stateful interface with explicit safety grounding. The explicit separation of verified caregiver data from model generation and the controlled clarification strategy are constructive design choices that could inform other assistive agents in high-variability cognitive-impairment domains.
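The separation of verified caregiver data from model generation can be sketched as a lookup that either answers from stored records or declines, and never composes free text about the topic. The record contents, the `SAFETY_TOPICS` keyword map, the reply wording, and the `grounded_reply` name below are illustrative assumptions; the paper's actual record schema and routing logic are not described in this summary.

```python
import re
from typing import Dict, List, Optional

# Hypothetical caregiver-verified store; contents are made up for illustration.
VERIFIED_RECORDS: Dict[str, List[str]] = {
    "allergies": ["penicillin"],
    "medications": [],
}

# Hypothetical keyword map from words in a question to a record topic.
SAFETY_TOPICS = {
    "allergy": "allergies", "allergies": "allergies", "allergic": "allergies",
    "medication": "medications", "medications": "medications",
    "pill": "medications", "pills": "medications",
}


def grounded_reply(question: str) -> Optional[str]:
    """Answer safety-critical questions from verified records only.

    Returns None when the question is not safety-critical, so the caller
    can route it to the ordinary response path.
    """
    words = re.findall(r"[a-z]+", question.lower())
    topic = next((SAFETY_TOPICS[w] for w in words if w in SAFETY_TOPICS), None)
    if topic is None:
        return None
    entries = VERIFIED_RECORDS.get(topic, [])
    if not entries:
        # No verified data: decline instead of generating an answer.
        return f"I do not have verified {topic} on file. Please check with your caregiver."
    return f"Your verified {topic}: " + ", ".join(entries) + "."
```

The design choice illustrated here is that a missing record produces an explicit referral to the caregiver, so the absence of data never falls through to model-generated content.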

major comments (2)
  1. [§5] §5 (Formative Evaluation / Pilot Results): The central claim that the system reduces cognitive load and supports safe task completion rests on qualitative impressions from n=4 participants; no task-completion rates, error counts, cognitive-load instruments, baseline comparisons, or safety-incident logs are reported, so the extrapolation to the broader mild-to-moderate AD/ADRD population lacks empirical warrant.
  2. [§4] §4 (System Architecture and Safety Controls): While the pipeline description states that safety-critical responses are grounded in caregiver-verified records, the manuscript supplies no quantitative assessment of how often safety checks are invoked, their precision/recall in the pilot, or failure modes when intent classification is uncertain, leaving the safety claim untested at the level required for the target population.
minor comments (2)
  1. [Abstract / §3] The abstract and §3 mention ElevenLabs TTS and LangGraph without providing version numbers, configuration parameters, or rationale for these choices relative to alternatives.
  2. [§5] The pilot description does not specify the exact tasks, number of turns allowed, or criteria used to judge “successful completion,” making the reported positive outcomes difficult to replicate or compare.
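The quantities requested in the major comments (invocation counts and precision/recall of the safety checks) could be computed from a per-turn log if one were instrumented. The sketch below assumes a hypothetical log format in which each turn records whether the safety check fired (`flagged`) and whether a reviewer judged the turn safety-critical (`critical`); no such log is reported in the paper, and the function name is an assumption.

```python
from typing import Dict, List


def safety_check_metrics(log: List[Dict[str, bool]]) -> Dict[str, float]:
    """Summarize safety-check behavior from a labeled per-turn log."""
    tp = sum(1 for e in log if e["flagged"] and e["critical"])
    fp = sum(1 for e in log if e["flagged"] and not e["critical"])
    fn = sum(1 for e in log if not e["flagged"] and e["critical"])
    invocations = tp + fp
    # Precision: share of fired checks that were warranted.
    precision = tp / invocations if invocations else 0.0
    # Recall: share of safety-critical turns that the check caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"invocations": float(invocations), "precision": precision, "recall": recall}
```

With a larger sample, the same log could also support task-completion rates and error counts, which the second minor comment notes would require explicit success criteria.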

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed review. We value the recognition of AI-Care's design choices for safety grounding and multi-turn clarification in the AD/ADRD context. The comments correctly identify that the current evaluation is limited in scope and quantitative depth. Below we respond point by point, indicating where we will revise the manuscript to better contextualize the formative pilot while remaining faithful to the data collected.

point-by-point responses
  1. Referee: [§5] §5 (Formative Evaluation / Pilot Results): The central claim that the system reduces cognitive load and supports safe task completion rests on qualitative impressions from n=4 participants; no task-completion rates, error counts, cognitive-load instruments, baseline comparisons, or safety-incident logs are reported, so the extrapolation to the broader mild-to-moderate AD/ADRD population lacks empirical warrant.

    Authors: We agree that the pilot provides only qualitative impressions from n=4 and does not include standardized instruments, baselines, or quantitative performance logs. The manuscript already labels the study as formative and preliminary; its purpose was to assess initial feasibility, user perceptions of trustworthiness and likability, and the viability of the LangGraph orchestration for natural-language task coordination. All four participants successfully completed the evaluated tasks through conversation. In the revised manuscript we will expand §5 with an explicit limitations subsection that states the small sample, qualitative focus, absence of cognitive-load scales or error-rate logging, and the preliminary nature of any claims about load reduction. We will also add more detail on the specific tasks and observed interaction patterns. We cannot add numerical task-completion rates, error counts, or instrument scores because the approved protocol collected only post-session interviews and observations. revision: partial

  2. Referee: [§4] §4 (System Architecture and Safety Controls): While the pipeline description states that safety-critical responses are grounded in caregiver-verified records, the manuscript supplies no quantitative assessment of how often safety checks are invoked, their precision/recall in the pilot, or failure modes when intent classification is uncertain, leaving the safety claim untested at the level required for the target population.

    Authors: The manuscript describes the deterministic grounding of safety-critical actions in caregiver-verified records and the use of multi-turn clarification for uncertain intents, but it does not report invocation counts, precision/recall, or a systematic failure-mode analysis from the pilot. During the four sessions no safety incidents occurred. In the revision we will augment §4 with a clearer enumeration of the safety pipeline steps, examples of clarification triggers, and an explicit statement that quantitative assessment of the safety layer (invocation frequency, classification performance, failure modes) was not instrumented in this formative study and is planned for subsequent work. These points will also be referenced in the new limitations subsection of §5. revision: partial

standing simulated objections not resolved
  • Quantitative safety metrics (invocation frequency, precision/recall) and standardized performance data (task-completion rates, cognitive-load scores, error counts) from the n=4 pilot, because these were not collected under the formative evaluation protocol.

Circularity Check

0 steps flagged

No circularity: system description and pilot report with no derivations or fitted quantities

full rationale

The paper is a descriptive account of an AI system architecture (LangGraph orchestration, safety checks, voice I/O) plus a qualitative pilot with n=4 users. No equations, parameters, predictions, or uniqueness theorems appear. The pilot findings are reported as formative observations rather than extrapolated claims that reduce to prior inputs. No self-citations are load-bearing on any derivation. The content is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an engineering and pilot study paper with no mathematical models, fitted parameters, background axioms, or new postulated entities; all components draw from standard AI tooling and caregiving platform design.

pith-pipeline@v0.9.0 · 5622 in / 1316 out tokens · 58310 ms · 2026-05-12T02:30:28.422152+00:00 · methodology

