pith. machine review for the scientific record.

arxiv: 2602.14183 · v2 · submitted 2026-02-15 · 💻 cs.HC

Recognition: no theorem link

Exploring a Multimodal Chatbot as a Facilitator in Therapeutic Art Activity

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 22:08 UTC · model grok-4.3

classification 💻 cs.HC
keywords multimodal chatbot · therapeutic art · MLLM · art therapy · reflective conversation · visual analysis · AI facilitation · creative expression

The pith

An MLLM-powered chatbot can support therapeutic art by analyzing visual creations in real time and holding reflective conversations with the maker.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a chatbot built on multimodal large language models that observes someone drawing or painting and responds with questions or observations meant to deepen self-reflection. The authors argue this combination of visual interpretation and dialogue can extend the reach of art therapy by offering immediate, interactive guidance during the creative act itself. They tested the concept through feedback from five experts in art therapy and related areas, who identified both strengths in engagement and specific design needs for safety and personalization. The work positions the chatbot as a new kind of facilitator rather than a replacement for human therapists.

Core claim

An MLLM-powered chatbot that processes real-time visual input from ongoing art-making together with conversational text can act as a facilitator in therapeutic art activities, according to qualitative evaluation with five experts who saw value in its capacity to promote reflective engagement while noting requirements for entry points, risk handling, style alignment, conversation balance, and richer visual features.

What carries the argument

An MLLM-powered chatbot that analyzes the visual creation in real time while engaging its creator in reflective conversation.
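The mechanism the paper leans on is a loop: watch the canvas, interpret what changed, respond with an open question. The paper does not specify its implementation, so the sketch below is illustrative only — `interpret` is a placeholder for whatever MLLM visual-analysis call the authors used, and the utterance template is invented here.

```python
import hashlib
from typing import Callable, Optional

def facilitation_step(
    image_bytes: bytes,
    last_hash: Optional[str],
    interpret: Callable[[bytes], str],
) -> tuple[Optional[str], str]:
    """One turn of a watch-interpret-respond loop.

    Returns (utterance_or_None, new_hash). The chatbot speaks only
    when the canvas has actually changed, so the conversation tracks
    the art-making instead of interrupting it.
    """
    h = hashlib.sha256(image_bytes).hexdigest()
    if h == last_hash:
        return None, h  # no new strokes since last check; stay quiet
    observation = interpret(image_bytes)  # MLLM visual analysis (stubbed)
    utterance = (f"I notice {observation} — "
                 "what were you thinking about as you added that?")
    return utterance, h
```

The change check is the load-bearing design choice: without it, a polling facilitator would comment on an unchanged canvas, which is exactly the conversation-balance failure the experts flag below.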

If this is right

  • Real-time visual analysis combined with dialogue can keep creators engaged in reflective thinking during the art process.
  • Design choices around user entry, safety boundaries, and matching therapy style will determine practical usefulness.
  • Conversations need to balance depth with breadth to avoid overwhelming or superficial exchanges.
  • Adding more ways for the system to reference and interact with the visual output can strengthen the experience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Direct user studies with people receiving therapy could reveal whether the chatbot's interpretations match human perception of the same artwork.
  • The approach might extend to other creative modalities such as music or writing if the same real-time multimodal loop proves reliable.
  • Long-term use could generate data on how reflective patterns evolve across multiple sessions, informing adaptive therapy models.

Load-bearing premise

Expert opinions from five specialists are enough to show the chatbot can facilitate therapeutic engagement even without tests involving actual users or checks on how accurately the model reads the artwork.

What would settle it

A controlled session in which participants make art both with and without the chatbot and show no measurable difference in reported reflection, emotional insight, or sense of therapeutic value.
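If such a with/without session were run, the simplest analysis of its paired ratings would be a sign test. This is a minimal sketch, assuming each participant rates reflection on the same scale in both conditions; the function and argument names are illustrative, not from the paper.

```python
from math import comb

def sign_test_p(with_chatbot: list, without: list) -> float:
    """Two-sided sign test on paired reflection ratings.

    Counts participants who rated reflection higher with the chatbot
    than without; under the null (no effect) that count is
    Binomial(n, 0.5). Ties are dropped, as is standard.
    """
    diffs = [a - b for a, b in zip(with_chatbot, without) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    tail = min(k, n - k)
    # two-sided p-value: probability of a split at least this lopsided
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)
```

One concrete way to see the referee's sample-size objection: even a unanimous preference across five paired participants yields p = 0.0625, so n = 5 cannot reach conventional significance in either direction.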

Figures

Figures reproduced from arXiv: 2602.14183 by Jing Liao, Le Lin, Rainbow Tin Hung Ho, Yuhan Luo, Zihao Zhu.

Figure 1. The overview of the AI-mediated therapeutic art activity platform: (A) Conversation Window where the user engages …
Figure 2. An overview of the system architecture. The image understanding module takes the client's drawing and the designed …
Figure 3. The artworks created by the expert participants using our system during the study.
read the original abstract

Therapeutic art activities, such as expressive drawing and painting, require the synergy between creative visual production and interactive dialogue. Recent advancements in Multimodal Large Language Models (MLLMs) have expanded the capacity of computing systems to interpret both textual and visual data, offering a new frontier for AI-mediated therapeutic support. This work-in-progress paper introduces an MLLM-powered chatbot that analyzes visual creation in real-time while engaging the creator in reflective conversations. We conducted an evaluation with five experts in art therapy and related fields, which demonstrated the chatbot's potential to facilitate therapeutic engagement, and highlighted several areas for future development, including entryways and risk management, bespoke alignment of user profile and therapeutic style, balancing conversational depth and width, and enriching visual interactivity. These themes provide a design roadmap for designing the future AI-mediated creative expression tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This work-in-progress paper introduces an MLLM-powered chatbot for therapeutic art activities that performs real-time visual analysis of user creations while conducting reflective conversations. The authors report an evaluation involving five experts in art therapy and related fields, claiming that the feedback demonstrates the chatbot's potential to facilitate therapeutic engagement, and they outline future development needs including entryways, risk management, bespoke alignment, conversational balance, and visual interactivity.

Significance. If the central claim holds after more rigorous validation, the work could provide a useful design roadmap for multimodal AI tools in creative therapy contexts within HCI. The exploratory nature and identification of specific future challenges (e.g., risk management and alignment) are constructive, but the current evidence base limits immediate impact.

major comments (2)
  1. [Evaluation] Evaluation section: the claim that the n=5 expert evaluation 'demonstrated the chatbot's potential to facilitate therapeutic engagement' (abstract and main text) rests on qualitative feedback alone, with no reported quantitative measures, error bars, details on how visual analysis accuracy was assessed, logged interaction data, or actual user sessions in therapeutic contexts. This is load-bearing for the central claim and leaves the efficacy untested.
  2. [Abstract and System Description] Abstract and System Description: the assertion of 'real-time' visual analysis and reflective conversation facilitation is presented without any metrics, examples of successful interpretations, or validation against ground-truth therapeutic outcomes, making it difficult to assess whether the MLLM component performs as described.
minor comments (2)
  1. [Future Work] The future-work themes (entryways, risk management, etc.) are listed clearly but would benefit from one or two concrete examples or preliminary design sketches to strengthen the roadmap.
  2. [System Description] Clarify the exact prompts or MLLM configuration used for visual analysis to improve reproducibility, even in a work-in-progress format.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work-in-progress paper. We address each major comment below, indicating where revisions will be made to clarify scope and strengthen the manuscript.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the claim that the n=5 expert evaluation 'demonstrated the chatbot's potential to facilitate therapeutic engagement' (abstract and main text) rests on qualitative feedback alone, with no reported quantitative measures, error bars, details on how visual analysis accuracy was assessed, logged interaction data, or actual user sessions in therapeutic contexts. This is load-bearing for the central claim and leaves the efficacy untested.

    Authors: We agree that the evaluation is qualitative and exploratory, consistent with the work-in-progress nature of the paper. The n=5 expert sessions were designed to gather initial design insights through simulated interactions rather than to provide quantitative validation or efficacy testing. In the revision we will (1) add details on expert backgrounds, interview protocol, and how feedback was collected, (2) explicitly state that the results indicate potential but do not constitute a rigorous efficacy study, and (3) tone down the abstract and main-text claims accordingly. Quantitative measures, logged interaction data, and real therapeutic user sessions are outside the current scope and will be pursued in follow-up work. revision: partial

  2. Referee: [Abstract and System Description] Abstract and System Description: the assertion of 'real-time' visual analysis and reflective conversation facilitation is presented without any metrics, examples of successful interpretations, or validation against ground-truth therapeutic outcomes, making it difficult to assess whether the MLLM component performs as described.

    Authors: We will revise the abstract and system description to include concrete examples of visual analysis outputs and sample conversation excerpts drawn from the expert sessions. These additions will illustrate how the MLLM component operated in practice. We note that formal accuracy metrics and ground-truth therapeutic validation are not yet available for this initial prototype; the experts' qualitative judgments of interpretation quality will be reported in greater detail in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity detected in qualitative exploratory study

full rationale

The paper is a work-in-progress introduction of an MLLM-powered chatbot concept for therapeutic art activities, supported solely by qualitative feedback from five experts. No equations, parameters, derivations, predictions, or self-citations appear in the provided text or abstract. The evaluation is presented as preliminary design exploration without any fitted inputs renamed as outputs, self-definitional loops, or load-bearing uniqueness claims. All central statements rest on direct expert commentary rather than any reduction to prior inputs by construction, making the derivation chain self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on the domain assumption that current multimodal LLMs can perform meaningful real-time interpretation of artistic visuals in a therapeutic context and that small-scale expert qualitative feedback provides valid initial validation for design exploration.

axioms (1)
  • domain assumption Multimodal LLMs can meaningfully interpret artistic visual content in a therapeutic context
    Invoked when describing the chatbot's real-time analysis and reflective conversation capabilities.

pith-pipeline@v0.9.0 · 5445 in / 1184 out tokens · 38800 ms · 2026-05-15T22:08:20.877670+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    Sherry L Beaumont. 2012. Art therapy approaches for identity problems during adolescence. Canadian Art Therapy Association Journal 25, 1 (2012)

  2. [2]

    Alan Briks. 2007. Art therapy with adolescents: Vehicle to engagement and transformation. Canadian Art Therapy Association Journal 20, 1 (2007)

  3. [3]

    Tessa Dalley. 2008. Art as therapy: An introduction to the use of art as a therapeutic technique. Routledge

  4. [4]

    Xuejun Du, Pengcheng An, Justin Leung, April Li, Linda E Chapman, and Jian Zhao. 2024. DeepThInk: Designing and probing human-AI co-creation in digital art therapy. International Journal of Human-Computer Studies 181 (2024), 103139

  5. [5]

    Holly Feen-Calligan. 1996. Art therapy as a profession: Implications for the education and training of art therapists. Art Therapy 13, 3 (1996)

  6. [6]

    Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems

  7. [7]

    Yi-Chieh Lee, Yichao Cui, Jack Jamieson, Wayne Fu, and Naomi Yamashita. 2023. Exploring effects of chatbot-based social contact on reducing mental illness stigma. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–16

  8. [8]

    Di Liu, Jingwen Bai, Zhuoyi Zhang, Yilin Zhang, Zhenhao Zhang, Jian Zhao, and Pengcheng An. 2025. TherAIssist: Assisting Art Therapy Homework and Client-Practitioner Collaboration through Human-AI Interaction. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 9, Issue 3 (2025). https://doi.org/10.1145/374949

  9. [9]

    Gary Nash. 2020. Response art in art therapy practice and research with a focus on reflect piece imagery. International Journal of Art Therapy 25, 1 (2020)

  10. [10]

    Shannon Sie Santosa, Qian Wan, Junnan Yu, and Yuhan Luo. 2026. EmoFlow: From Tracking to Sense-Making of Emotions Through Creative Drawing. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems

  11. [11]

    Qwen Team. 2025. Qwen3-vl technical report. arXiv preprint arXiv:2511.21631 (2025)

  12. [12]

    Lesley Uttley, Alison Scope, Matt Stevenson, Andrew Rawdin, Elizabeth Taylor Buck, Anthea Sutton, John Stevens, Eva Kaltenthaler, Kim Dent-Brown, and Chris Wood. 2015. Systematic review and economic modelling of the clinical effectiveness and cost-effectiveness of art therapy among people with non-psychotic mental health disorders. Health Technology Asse...

  13. [13]

    Asnat Weinfeld-Yehoudayan, Johanna Czamanski-Cohen, Miri Cohen, and Karen L Weihs. 2024. A theoretical model of emotional processing in visual artmaking and art therapy. The Arts in Psychotherapy 90 (2024), 102196

  14. [14]

    Migyeong Yang, Chaehee Park, Jiwon Kang, Jiwon Kim, Taeeun Kim, Hayeon Song, and Jinyoung Han. 2025. PracticeDAPR: An AI-based Education-Supported System for Art Therapy. Proceedings of the ACM on Human-Computer Interaction 9, 2 (2025), 1–31

  15. [15]

    Chuyang Zhang, Bin Yu, Yuchao Wang, Mansi Yuan, Wanqi Wang, Seungwoo Je, and Pengcheng An. 2026. ASafePlace: User-Led Personalization of VR Relaxation via an Art Therapy Activity. arXiv preprint arXiv:2602.01579 (2026)

  16. [16]

    Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, and Liteng Gao. 2025. ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–18

  17. [17]

    Xi Zheng, Zhuoyang Li, Xinning Gui, and Yuhan Luo. 2025. Customizing emotional support: How do individuals construct and interact with LLM-powered chatbots. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–20