pith. machine review for the scientific record.

arxiv: 2605.10531 · v1 · submitted 2026-05-11 · 💻 cs.AI

Recognition: 2 theorem links · Lean Theorem

A Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narratives

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:27 UTC · model grok-4.3

classification 💻 cs.AI
keywords LLM storytelling · older adults · argument mining · argumentation schemes · personalized narratives · knowledge graphs · digital companion · health promotion

The pith

Argument mining serves as a reflective inspection mechanism that compares formal grounding signals with human evaluations in LLM-based personalized storytelling for older adults.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether structured knowledge from graphs and user models, combined with argumentation theory, can make LLM stories for older adults more purposeful and less prone to hallucination. It builds a reflective agent that generates health-promoting narratives and then uses argument mining to inspect them against computed quality indicators. Testing with experts refined the system, after which 55 older adults rated stories on purpose, usefulness, cultural fit, and consistency. The results show participants spotting personally relevant purposes in about two thirds of cases, with argument-based purposes in half of those, and links between higher argument quality and better human ratings of clarity. This matters because it offers a concrete way to make digital companions for aging users more transparent and aligned with real motivations.

Core claim

The central claim is that integrating argumentation schemes and argument mining into an LLM-based storytelling agent allows the system to generate narratives grounded in structured user models of health-promoting activities and motivations, while using computed hallucination-risk and argument-quality indicators to inspect the output. In a two-phase study, participatory design with domain experts refined the approach, and evaluation with 55 older adults found that personally relevant purposes were recognized in roughly two thirds of narratives and argument-based purposes in about half of those; cultural relatability strongly affected willingness to use the tool, and minor inconsistencies were often tolerated when narratives remained understandable and personally relevant.

What carries the argument

The reflective storytelling agent that integrates knowledge graphs, user models, argumentation schemes, and argument mining to guide narrative generation and inspect it for grounding and quality.
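The paper does not publish code, but the reflective-inspection idea can be sketched in a few lines: mine claims out of a generated narrative, check each claim against the structured user model, and report the share of ungrounded claims as a hallucination-risk indicator. Everything below (function names, the sentence-level claim miner, the set-based user model) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch of a reflective inspection step, not the authors' code.
# A generated narrative is mined for claims; each claim is checked against
# the structured user model; the share of ungrounded claims serves as a
# hallucination-risk indicator.

def mine_claims(narrative: str) -> list[str]:
    # Stand-in for a real argument-mining component: one claim per sentence.
    return [s.strip() for s in narrative.split(".") if s.strip()]

def hallucination_risk(narrative: str, user_model: set[str]) -> float:
    """Fraction of mined claims with no grounding in the user model."""
    claims = mine_claims(narrative)
    if not claims:
        return 0.0
    ungrounded = [c for c in claims if not any(fact in c for fact in user_model)]
    return len(ungrounded) / len(claims)

# Toy user model: health-promoting activities the persona actually has.
model = {"walking", "gardening", "meeting friends"}
story = "Anna enjoys walking every morning. She recently took up skydiving."
risk = hallucination_risk(story, model)  # 1 of 2 claims ungrounded -> 0.5
```

A real system would replace the substring check with argument mining over argumentation schemes and a knowledge-graph lookup; the point of the sketch is only the shape of the generate-then-inspect loop.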

If this is right

  • Participants recognized personally relevant purposes in roughly two thirds of the generated narratives.
  • Argument-based purposes were identified in around half of the cases where purposes were recognized.
  • Cultural recognisability strongly influenced willingness to use the storytelling functionality.
  • Narratives with higher hallucination-risk indicators were more often perceived as inconsistent.
  • Higher argument-quality indicators tended to co-occur with higher clarity and meaningfulness ratings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The reflective use of argument mining could be tested in other narrative applications such as education or chronic-disease support where consistency with user goals matters.
  • Longitudinal deployment might reveal whether repeated exposure to these purpose-aligned stories produces measurable shifts in older adults' health-related activities.
  • Combining the computed indicators with live user feedback could enable the agent to revise stories dynamically during an interaction session.

Load-bearing premise

That adding argumentation schemes and argument mining to LLMs will reduce hallucinations and raise perceived purposefulness and consistency in health stories for older adults, as shown by matching participant ratings to the system's computed indicators.

What would settle it

If the system's hallucination-risk indicators show no correlation with older adults' reports of inconsistency, or if argument-quality indicators do not align with higher human ratings of clarity and meaningfulness, the value of argument mining as a reflective inspection tool would be undermined.
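That falsification condition is checkable with a plain rank correlation: if computed argument-quality indicators and human clarity ratings do not co-vary, the correlation should sit near zero. The sketch below uses invented data and a tie-free Spearman formula; it is a minimal check, not the paper's analysis.

```python
# Toy falsification check with invented data: a near-zero rank correlation
# between computed argument-quality indicators and clarity ratings would
# undermine the reflective-inspection claim.

def ranks(xs):
    # Positional ranks, assuming no ties in the toy data.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank_pos, i in enumerate(order, start=1):
        r[i] = float(rank_pos)
    return r

def spearman(xs, ys):
    """Spearman's rho for tie-free data: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

quality = [0.2, 0.5, 0.7, 0.9]   # computed argument-quality indicators (toy)
clarity = [2, 3, 4, 5]           # mean clarity ratings (toy, monotone)
rho = spearman(quality, clarity)  # perfectly concordant ranks -> 1.0
```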

Figures

Figures reproduced from arXiv: 2605.10531 by Helena Lindgren, Jayalakshmi Baskar, Kaan Kilic, Vera C. Kaelin.

Figure 1. Overview of the study flow embedding reflective storytelling system mechanisms illustrated by white boxes, modified.
Figure 2. System architecture of the personalised reflective storytelling framework. The five levels of the system (numbered 1–5).
Figure 3. Mean participant ratings of narrative relevance across dialogue types and creativity levels.
Figure 4. Argument-mining indicators in relation to human judgments and dialogue conditions.
Original abstract

This work investigates whether knowledge-driven large language model (LLM)-based storytelling can support purposeful narrative interaction with a digital companion for older adults. To address known limitations of LLMs, including hallucinations and limited transparency, we present a reflective storytelling agent integrating knowledge graphs, user modelling, argumentation theory, and argument mining to guide and inspect narrative generation. The study consisted of two phases. Phase I employed participatory design involving 11 domain experts in a formative evaluation that informed iterative refinement. The resulting system generates narratives grounded in structured user models representing health-promoting activities and motivations. Phase II involved 55 older adults evaluating persona-based narratives across four prompts and two creativity levels. Participants assessed perceived purpose, usefulness, cultural relatability, and inconsistencies. The system additionally computed hallucination-risk indicators to evaluate generated narratives. Participants recognised personally relevant purposes in roughly two thirds of narratives, while argument-based purposes were identified in around half of these cases. Cultural recognisability strongly influenced willingness to use the functionality, whereas minor inconsistencies were often tolerated when narratives remained understandable and personally relevant. Narratives with higher hallucination-risk indicators were more often perceived as inconsistent, while higher argument-quality indicators tended to co-occur with higher clarity and meaningfulness ratings. Overall, the study positions argument mining as a reflective inspection mechanism for comparing formal grounding signals with human evaluations in health-oriented LLM storytelling for older adults.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a reflective storytelling agent for older adults that integrates knowledge graphs, user modelling, argumentation theory, and argument mining with LLMs to generate personalized, health-promoting narratives. It reports a two-phase study: Phase I uses participatory design with 11 domain experts to iteratively refine the system, while Phase II has 55 older adults evaluate persona-based narratives generated across four prompts and two creativity levels on perceived purpose, usefulness, cultural relatability, and inconsistencies; the system also computes post-hoc hallucination-risk and argument-quality indicators. The work positions argument mining as a reflective inspection mechanism for comparing formal grounding signals with human evaluations in health-oriented LLM storytelling.

Significance. If the integration demonstrably improves grounding and user perceptions, the approach could inform the design of transparent digital companions for older adults, addressing key LLM limitations in personalized health narratives. The participatory design process and dual-phase evaluation involving both experts and target users represent strengths, providing empirical data on how personal relevance influences tolerance for minor inconsistencies and how computed indicators align with subjective ratings.

major comments (2)
  1. [Phase II evaluation] Phase II evaluation (n=55 older adults): The study reports correlations between higher argument-quality indicators and better clarity/meaningfulness ratings, and between hallucination-risk indicators and perceived inconsistencies, but contains no baseline arm using the same user models and prompts without the argumentation schemes and argument mining layer. This design choice makes it impossible to isolate whether the observed benefits in purposefulness (recognized in ~2/3 of narratives) or tolerance of inconsistencies are attributable to the proposed integration rather than the underlying LLM or structured user models, directly undermining the central positioning of argument mining as an effective reflective mechanism for reducing hallucinations.
  2. [Abstract and Phase II results] Abstract and Phase II results: The manuscript lacks details on the statistical methods, exact metrics for linking hallucination indicators to perceptions, controls for prompt variability, or inter-rater reliability for the participant assessments. Without these, the claim that 'higher hallucination-risk indicators were more often perceived as inconsistent' and that argument-quality indicators 'tended to co-occur with higher clarity' cannot be fully evaluated for robustness.
minor comments (2)
  1. [Abstract] The abstract could more explicitly quantify key findings (e.g., exact proportions for purpose recognition and argument-based purposes) and clarify how the four prompts and two creativity levels were distributed across participants.
  2. [System description] Notation for the computed indicators (hallucination-risk and argument-quality) should be defined more clearly when first introduced, including how they are derived from the argument mining component.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which help us improve the clarity and rigor of our work. Below, we provide point-by-point responses to the major comments.

Point-by-point responses
  1. Referee: [Phase II evaluation] Phase II evaluation (n=55 older adults): The study reports correlations between higher argument-quality indicators and better clarity/meaningfulness ratings, and between hallucination-risk indicators and perceived inconsistencies, but contains no baseline arm using the same user models and prompts without the argumentation schemes and argument mining layer. This design choice makes it impossible to isolate whether the observed benefits in purposefulness (recognized in ~2/3 of narratives) or tolerance of inconsistencies are attributable to the proposed integration rather than the underlying LLM or structured user models, directly undermining the central positioning of argument mining as an effective reflective mechanism for reducing hallucinations.

    Authors: We agree that a baseline condition without the argumentation schemes and argument mining layer would allow stronger causal attribution of effects to the proposed integration. Our Phase II study was designed as an initial exploration of the fully integrated reflective storytelling agent with the target user population, following iterative refinement via participatory design in Phase I. The evaluation demonstrates that the complete system produces narratives recognized as purposeful by older adults and that the computed indicators align with subjective perceptions of inconsistency and clarity. We note that the manuscript positions argument mining specifically as a reflective inspection mechanism for comparing formal grounding signals with human evaluations, rather than as a direct reducer of hallucinations. We will revise the Discussion and Limitations sections to explicitly acknowledge the absence of a baseline and to outline plans for future controlled experiments that include such comparisons. revision: partial

  2. Referee: [Abstract and Phase II results] Abstract and Phase II results: The manuscript lacks details on the statistical methods, exact metrics for linking hallucination indicators to perceptions, controls for prompt variability, or inter-rater reliability for the participant assessments. Without these, the claim that 'higher hallucination-risk indicators were more often perceived as inconsistent' and that argument-quality indicators 'tended to co-occur with higher clarity' cannot be fully evaluated for robustness.

    Authors: We agree that additional methodological transparency is required. The reported observations derive from descriptive frequency counts, cross-tabulations of indicator values against participant ratings, and thematic analysis of open responses; no inferential statistical tests were applied given the exploratory character of the study. We will expand the Phase II Results section (and update the abstract accordingly) to specify: the exact descriptive metrics and linking procedures used; how prompt variability was addressed through fixed prompt templates and balanced assignment across creativity levels; and any inter-rater reliability checks performed on the coding of participant assessments. These revisions will enable readers to evaluate the robustness of the alignment findings. revision: yes
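The descriptive linking procedure the rebuttal describes, cross-tabulating indicator values against participant ratings without inferential tests, amounts to a contingency count. The sketch below shows the shape of such a cross-tabulation with invented data; the variable names and values are illustrative assumptions, not figures from the study.

```python
# Hedged sketch of a descriptive cross-tabulation: a binarised
# hallucination-risk indicator against participants' inconsistency
# reports. All data values are invented for illustration.
from collections import Counter

# (risk_indicator, perceived_inconsistent) per narrative -- toy data.
observations = [
    ("high", True), ("high", True), ("high", False),
    ("low", False), ("low", False), ("low", True),
]

crosstab = Counter(observations)
for risk in ("high", "low"):
    inconsistent = crosstab[(risk, True)]
    total = inconsistent + crosstab[(risk, False)]
    print(f"risk={risk}: {inconsistent}/{total} rated inconsistent")
```

With these toy numbers, high-risk narratives are rated inconsistent more often (2/3) than low-risk ones (1/3), which is the kind of co-occurrence pattern the paper reports, stated descriptively rather than as a tested effect.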

Circularity Check

0 steps flagged

No circularity: empirical system evaluation without derivations or self-referential fitting

Full rationale

The paper describes an LLM storytelling system that integrates knowledge graphs, user models, argumentation schemes, and argument mining, then evaluates it via participatory design (Phase I, n=11 experts) and participant ratings (Phase II, n=55 older adults) plus post-hoc computed indicators for hallucination risk and argument quality. No equations, predictions, or first-principles derivations appear in the provided text. Claims rest on observed co-occurrences between human ratings (purposefulness, clarity, inconsistencies) and system indicators, which are independent measurements rather than reductions by construction. No self-citations are invoked to justify uniqueness theorems, ansatzes, or load-bearing premises. The absence of a control arm affects causal attribution but does not create circularity in the derivation chain. This is a standard empirical HCI/AI paper whose central positioning of argument mining as a reflective mechanism is grounded in external participant data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical system development and user study rather than a theoretical derivation; no free parameters, axioms, or invented entities are invoked in the abstract.

pith-pipeline@v0.9.0 · 5565 in / 1216 out tokens · 34399 ms · 2026-05-12T04:27:26.140947+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages

  1. [1]

    Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. InProceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, Ne...

  2. [2]

    Dang Anh-Hoang, Vu Tran, and Le-Minh Nguyen. 2025. Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior.Frontiers in Artificial Intelligence8 (2025), 1–21. doi:10.3389/frai.2025.1622292

  3. [3]

    Elham Asgari, Nina Montaña-Brown, Magda Dubois, Saleh Khalil, Jasmine Balloch, Joshua Au Yeung, and Dominic Pimenta. 2025. A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation.NPJ Digital Medicine8 (2025), 1–15. doi:10.1038/s41746-025- 01670-7

  4. [4]

    Gagan Bansal, Besmira Nushi, Ece Kamar, Eric Horvitz, and Daniel S. Weld. 2021. Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork. InProceedings of the AAAI Conference on Artificial Intelligence. AAAI, Palo Alto, USA, 11405–11414. doi:10.1609/aaai.v35i13.17359

  5. [5]

    Pietro Baroni, Dov Gabbay, Massimilino Giacomin, and Leendert Van der Torre. 2018.Handbook of formal argumentation. College Publications

  6. [6]

    Jayalakshmi Baskar, Rebecka Janols, Esteban Guerrero, Juan Carlos Nieves, and Helena Lindgren. 2017. A multipurpose goal model for personalised digital coaching. InAgents and Multi-Agent Systems for Health Care: 10th International Workshop, A2HC 2017, São Paulo, Brazil, May 8, 2017, and International Workshop, A-HEALTH 2017, Porto, Portugal, June 21, 2017...

  7. [7]

    Jayalakshmi Baskar, Kaan Kilic, Vera C Kaelin, and Helena Lindgren. 2025. Towards collaborative planning for health promotion through person- tailored storytelling and argumentation. In1st Workshop on Human-AI Collaborative Systems co-located with 28th European Conference on Artificial Intelligence (ECAI 2025), Vol. 4072. CEUR-WS, Bologna, Italy, 70–83

  8. [8]

    Jayalakshmi Baskar and Helena Lindgren. 2015. Human-Agent Dialogues on Health Topics - An Evaluation Study. InHighlights of Practical Applications of Agents, Multi-Agent Systems, and Sustainability - The PAAMS Collection, Javier Bajo, Kasper Hallenborg, Pawel Pawlewski, Vicente Botti, Nayat Sánchez-Pi, Nestor Darío Duque Méndez, Fernando Lopes, and Vicent...

  9. [9]

    Tessa Beinema, Harm Op den Akker, Hermie J Hermens, and Lex van Velsen. 2023. What to Discuss?—A Blueprint Topic Model for Health Coaching Dialogues With Conversational Agents.International Journal of Human–Computer Interaction39, 1 (2023), 164–182

  10. [10]

    Trevor JM Bench-Capon. 2003. Persuasion in practical argument using value-based argumentation frameworks.Journal of Logic and Computation 13, 3 (2003), 429–448

  11. [11]

    Tarek R Besold, Sebastian Bader, Howard Bowman, Pedro Domingos, Pascal Hitzler, Kai-Uwe Kühnberger, Luis C Lamb, Priscila Machado Vieira Lima, Leo de Penning, Gadi Pinkas, et al. 2021. Neural-symbolic learning and reasoning: A survey and interpretation 1. InNeuro-symbolic artificial intelligence: The state of the art. IOS press, 1–51

  12. [12]

    Elfia Bezou-Vrakatseli, Oana Cocarascu, and Sanjay Modgil. 2025. Can Large Language Models Understand Argument Schemes?. InFindings of the Association for Computational Linguistics: ACL 2025. ACL, Vienna, Austria, 13666–13681

  13. [13]

    Elizabeth Black and Anthony Hunter. 2009. An inquiry dialogue system.Autonomous Agents and Multi-Agent Systems19, 2 (2009), 173–209

  14. [14]

    Jerome Bruner. 1991. The narrative construction of reality.Critical inquiry18, 1 (1991), 1–21

  15. [15]

    HeeKyung Chang, YoungJoo Do, and JinYeong Ahn. 2023. Digital Storytelling as an Intervention for Older Adults: A Scoping Review.International Journal of Environmental Research and Public Health20, 2 (2023), 1–17. doi:10.3390/ijerph20021344

  16. [16]

    Carlos Chesñevar, Jarred McGinnis, Sanjay Modgil, Iyad Rahwan, Chris Reed, Guillermo Simari, Matthew South, Gerhard Vreeswijk, and Steven Willmott. 2006. Towards an argument interchange format. The Knowledge Engineering Review 21, 4 (2006), 293–316. doi:10.1017/S0269888906001044

  18. [18]

    Victor David and Anthony Hunter. 2025. A logic-based framework for decoding enthymemes in argument maps involving implicitness in premises and claims. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI ’25). ACM, Montreal, Canada, Article 495, 9 pages. doi:10.24963/ijcai.2025/495

  19. [19]

    Martijn H. Demollin, Qurat-ul-ain Shaheen, Katarzyna Budzynska, and Carlos Sierra. 2020. Argumentation Theoretical Frameworks for Explainable Artificial Intelligence. InProceedings of the 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI 2020). ACL, Dublin, Ireland, 44–49

  20. [20]

    Phan Minh Dung. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games.Artificial intelligence77, 2 (1995), 321–357

  21. [21]

    Mark A Finlayson, Whitman Richards, and Patrick H Winston. 2010. Computational models of narrative: review of the workshop.Ai Magazine31, 2 (2010), 97–100

  22. [22]

    Thomas F. Gordon, Horst Friedrich, and Douglas Walton. 2018. Representing argumentation schemes with Constraint Handling Rules (CHR). Argument & Computation9, 2 (2018), 91–119. doi:10.3233/AAC-180039 Publisher: SAGE Publications

  23. [23]

    Thomas R Gruber. 1995. Toward principles for the design of ontologies used for knowledge sharing?International journal of human-computer studies43, 5-6 (1995), 907–928

  24. [24]

    Matteo Guida, Yulia Otmakhova, Eduard Hovy, and Lea Frermann. 2025. LLMs for Argument Mining: Detection, Extraction, and Relationship Classification of pre-defined Arguments in Online Comments. InProceedings of The 23rd Annual Workshop of the Australasian Language Technology Association. ACL, Sydney, Australia, 176–191

  25. [25]

    Michael Townsen Hicks, James Humphries, and Joe Slater. 2024. ChatGPT is bullshit.Ethics and Information Technology26, 2 (2024), 38. doi:10.1007/s10676-024-09775-5

  26. [26]

    Leslie J Hinyard and Matthew W Kreuter. 2007. Using narrative communication as a tool for health behavior change: a conceptual, theoretical, and empirical overview.Health education & behavior34, 5 (2007), 777–792

  27. [27]

    Anthony Hunter. 2024. Dialogical Argumentation for Behaviour Change with Multiple Persuasion Goals. InComputational Models of Argument - Proceedings of COMMA 2024, Hagen, Germany, September 18-20, 2024 (Frontiers in Artificial Intelligence and Applications, Vol. 388), Chris Reed, Matthias Thimm, and Tjitze Rienstra (Eds.). IOS Press, Hagen, Germany, 97–10...

  28. [28]

    Rehan Iftikhar, Yi-Te Chiu, Mohammad Saud Khan, and Catherine Caudwell. 2024. Human–Agent Team Dynamics: A Review and Future Research Opportunities.IEEE Transactions on Engineering Management71 (2024), 10139–10154. doi:10.1109/TEM.2023.3331369

  29. [29]

    Kaan Kilic, Saskia Weck, Timotheus Kampik, and Helena Lindgren. 2023. Argument-based human–AI collaboration for supporting behavior change to improve health.Frontiers in Artificial Intelligence6 (2023), 1069455

  30. [30]

    John Lawrence and Chris Reed. 2015. Combining Argument Mining Techniques. InProceedings of the 2nd Workshop on Argumentation Mining, Claire Cardie (Ed.). Association for Computational Linguistics, Denver, CO, 127–136. doi:10.3115/v1/W15-0516

  31. [31]

    Francesco Leofante, Hamed Ayoobi, Adam Dejl, Gabriel Freedman, Deniz Gorur, Junqi Jiang, Guilherme Paulino-Passos, Antonio Rago, Anna Rapberger, Fabrizio Russo, Xiang Yin, Dekai Zhang, and Francesca Toni. 2024. Contestable AI Needs Computational Argumentation. (2024). doi:10.24963/KR.2024/83

  32. [32]

    Hao Li, Viktor Schlegel, Yizheng Sun, Riza Batista-Navarro, and Goran Nenadic. 2025. Large Language Models in Argument Mining: A Survey. https://arxiv.org/html/2506.16383v4 arXiv:2506.16383v4 [cs.CL]

  33. [33]

    Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, and Yi Yang. 2024. A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges. Vicinagearth1, 1 (2024), 9

  34. [34]

    Helena Lindgren, Vera C Kaelin, Ann-Margreth Ljusbäck, Maitreyee Tewari, Michele Persiani, and Ingeborg Nilsson. 2024. To adapt or not to adapt? older adults enacting agency in dialogues with an unknowledgeable agent. In Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization. ACM, New York, NY, 307–316

  35. [35]

    Helena Lindgren, Timotheus Kampik, Esteban Guerrero Rosero, Madeleine Blusi, and Juan Carlos Nieves. 2021. Argumentation-Based Health Information Systems: A Design Methodology.IEEE Intelligent Systems36, 2 (2021), 72–80. doi:10.1109/MIS.2020.3044944

  36. [36]

    Helena Lindgren and Kaan Kilic. 2025. Participatory Design and Evaluation of Knowledge-Based Personalised Digital Coaching for Improving Health - the Star Multicomponent Lifestyle Intervention.Frontiers in Digital Health7 (2025), 1600535

  37. [37]

    Helena Lindgren, Kristina Lindvall, and Linda Richter-Sundberg. 2025. Responsible design of an AI system for health behavior change—an ethics perspective on the participatory design process of the STAR-C digital coach.Frontiers in Digital HealthVolume 7 - 2025 (2025), 1–15. doi:10.3389/fdgth.2025.1436347

  38. [38]

    Helena Lindgren and Saskia Weck. 2021. Conceptual Model for Behaviour Change Progress - Instrument in Design Processes for Behaviour Change Systems.Studies in Health Technology and Informatics285 (2021), 277–280. doi:10.3233/SHTI210614

  39. [39]

    Helena Lindgren and Saskia Weck. 2022. Contextualising Goal Setting for Behaviour Change – from Baby Steps to Value Directions. InProceedings of the 33rd European Conference on Cognitive Ergonomics (ECCE ’22). Association for Computing Machinery, New York, NY, USA, 1–7. doi:10.1145/ 3552327.3552342

  40. [40]

    Helena Lindgren and Chunli Yan. 2015. ACKTUS: A Platform for Developing Personalized Support Systems in the Health Domain. InProceedings of the 5th International Conference on Digital Health 2015. ACM, Florence Italy, 135–142. doi:10.1145/2750511.2750526

  41. [41]

    Marco Lippi and Paolo Torroni. 2016. Argumentation mining: State of the art and emerging trends.ACM Transactions on Internet Technology (TOIT) 16, 2 (2016), 1–25

  42. [42]

    Jiahong Liu, Zexuan Qiu, Zhongyang Li, Quanyu Dai, Wenhao Yu, Jieming Zhu, Minda Hu, Menglin Yang, Tat-Seng Chua, and Irwin King. 2025. A Survey of Personalized Large Language Models: Progress and Future Directions. doi:10.48550/arXiv.2502.11528 arXiv:2502.11528 [cs]

  43. [43]

    Fabrizio Macagno. 2017. Argumentation schemes.The Routledge Handbook of Argumentation Theory(2017), 169–182

  44. [44]

    Pattie Maes. 1987. Concepts and experiments in computational reflection.ACM Sigplan Notices22, 12 (1987), 147–155

  45. [45]

    Pattie Maes. 1988. Computational reflection.The Knowledge Engineering Review3, 1 (1988), 1–19

  46. [46]

    Lucie Charlotte Magister, Katherine Metcalf, Yizhe Zhang, and Maartje ter Hoeve. 2025. On the Way to LLM Personalization: Learning to Remember User Conversations. InProceedings of the First Workshop on Large Language Model Memorization (L2M2). ACL, Vienna, Austria, 61–77

  47. [47]

    Potsawee Manakul, Adian Liusie, and Mark Gales. 2023. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Singapore, 9004–9017. doi:10.18653/v1/2023.emnlp-main.557

  48. [48]

    Raquel Mochales and Marie-Francine Moens. 2011. Argumentation mining.Artificial intelligence and law19, 1 (2011), 1–22

  49. [49]

    OpenAI. 2025. Introducing GPT-5. https://openai.com/index/introducing-gpt-5/ Accessed: 2025-08-08

  50. [50]

    Adriana Maria Rios Rincon, Antonio Miguel Cruz, Christine Daum, Noelannah Neubauer, Aidan Comeau, and Lili Liu. 2022. Digital Storytelling in Older Adults With Typical Aging, and With Mild Cognitive Impairment or Dementia: A Systematic Literature Review.Journal of Applied Gerontology41, 3 (2022), 867–880. doi:10.1177/07334648211015456

  51. [51]

    Giuseppe Riva, Andrea Gaggioli, Daniela Villani, Pietro Cipresso, Claudia Repetto, Silvia Serino, Stefano Triberti, Eleonora Brivio, Carlo Galimberti, and Guendalina Graffigna. 2014. Positive Technology for Healthy Living and Active Ageing.Studies in Health Technology and Informatics203 (2014), 44–56

  52. [52]

    Ramon Ruiz-Dolz, Stella Heras, and Ana García-Fornes. 2025. An introduction to computational argumentation research from a human argumentation perspective.Autonomous Agents and Multi-Agent Systems39, 1 (2025), 11

  53. [53]

    Federico M. Schmidt, Sebastian Gottifredi, and Alejandro J. García. 2025. High level argumentative errors in Argument Mining: Enhancing argument detection and automatic error prediction.Expert Systems with Applications286 (2025), 1–20. doi:10.1016/j.eswa.2025.127886

  54. [54]

    Constanze Schreiner, Markus Appel, Maj-Britt Isberner, and Tobias Richter. 2018. Argument strength and the persuasiveness of stories.Discourse Processes55, 4 (2018), 371–386

  55. [55]

    Seema Sehrawat, Celeste Jones, Jennifer Orlando, Tucker Bowers, and Alexi Rubins. 2017. Digital storytelling: A tool for social connectedness. Gerontechnology16, 1 (2017), 56–61. doi:10.4017/gt.2017.16.1.006.00

  56. [56]

    Nisha Simon and Christian Muise. 2022. TattleTale - Storytelling with Planning and Large Language Models. InICAPS Workshop on Scheduling and Planning Applications workshop. AAAI, Virtual, 1–9

  57. [57]

    Luc Steels. 2020. Personal dynamic memories are necessary to deal with meaning and understanding in human-centric AI.. InNeHuAI@ ECAI. CEUR-WS. org, CEUR-WS, Barcelona, Spain, 11–16

  58. [58]

    Luc Steels, Lara Verheyen, and Remi van Trijp. 2022. An experiment in measuring understanding. InHHAI2022: Augmenting Human Intellect. IOS Press, Vienna, Austria, 241–242

  59. [59]

    Iris Ten Klooster, Hanneke Kip, Sina L Beyer, Lisette JEWC van Gemert-Pijnen, and Saskia M Kelders. 2024. Clarifying the Concepts of Personalization and Tailoring of eHealth Technologies: Multimethod Qualitative Study.Journal of medical Internet research26 (2024), e50497

  60. [60]

    Bas van Gijzel. 2015.A framework for relating, implementing and verifying argumentation models and their translations. Ph. D. Dissertation. University of Nottingham

  61. [61]

    Lex van Velsen, Marijke Broekhuis, Stephanie Jansen-Kosterink, and Harm op den Akker. 2019. Tailoring Persuasive Electronic Health Strategies for Older Adults on the Basis of Personal Motivation: Web-Based Survey Study.Journal of Medical Internet Research21, 9 (2019), e11759. doi:10.2196/11759

  62. [62]

    Douglas Walton. 2009. Argumentation theory: A very short introduction. In Argumentation in artificial intelligence. Springer, Boston, MA, 1–22

  63. [63]

    Douglas Walton and Erik C. W. Krabbe. 1995. Dialogues: Types, Goals and Shifts. InCommitment in Dialogue: Basic Concepts of Interpersonal Reasoning. SUNY Press, Albany, NY, Chapter 3, 65–117

  64. [64]

    Douglas Walton, Chris Reed, and Fabrizio Macagno. 2008. Argumentation Schemes. Cambridge University Press, Cambridge, UK. https://www.cambridge.org/us/academic/subjects/philosophy/logic/argumentation-schemes

  65. [65]

    Hongru Wang, Rui Wang, Fei Mi, Yang Deng, Zezhong Wang, Bin Liang, Ruifeng Xu, and Kam-Fai Wong. 2023. Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs. InFindings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, Singapore, 12047–12064. doi:10.18653/v1/2023....

  66. [66]

    Chunli Yan, Juan Carlos Nieves, and Helena Lindgren. 2018. A dialogue-based approach for dealing with uncertain and conflicting information in the setting of medical diagnosis.Auton Agent Multi-Agent Syst(2018), 1–25. doi:10.1007/s10458-018-9396-x

  67. [67]

    Luan Zhang, Dandan Song, Zhijing Wu, Yuhang Tian, Changzhi Zhou, Jing Xu, Ziyi Yang, and Shuhao Zhang. 2025. Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25). ACM, Montreal, Canada, 8357–8365