Exploring a Multimodal Chatbot as a Facilitator in Therapeutic Art Activity
Pith reviewed 2026-05-15 22:08 UTC · model grok-4.3
The pith
An MLLM-powered chatbot can support therapeutic art by analyzing visual creations in real time and holding reflective conversations with the maker.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An MLLM-powered chatbot that processes real-time visual input from ongoing art-making together with conversational text can act as a facilitator in therapeutic art activities. A qualitative evaluation with five experts supports this: they saw value in the chatbot's capacity to promote reflective engagement, while noting requirements for entry points, risk handling, style alignment, conversation balance, and richer visual features.
What carries the argument
The MLLM-powered chatbot that analyzes visual creation in real-time while engaging the creator in reflective conversations.
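The paper does not publish its implementation, but the loop it describes — periodic canvas snapshots interleaved with chat turns, both fed to a multimodal model — can be sketched roughly as follows. Every name here (`ArtSession`, `query` stub, message fields) is a hypothetical placeholder, not the authors' code:

```python
import base64
from dataclasses import dataclass, field


@dataclass
class ArtSession:
    """Hypothetical sketch of the facilitation loop: each turn pairs the
    latest canvas snapshot with the chat history and asks a multimodal
    model for a reflective prompt."""
    history: list = field(default_factory=list)

    def turn(self, canvas_png: bytes, user_text: str, mllm) -> str:
        # Interleave the current state of the artwork with the user's words,
        # so the model can ground its questions in what is on the canvas.
        self.history.append({
            "role": "user",
            "image_b64": base64.b64encode(canvas_png).decode("ascii"),
            "text": user_text,
        })
        reply = mllm(self.history)  # any callable: messages -> reflective text
        self.history.append({"role": "assistant", "text": reply})
        return reply


# Stub standing in for a real vision-language chat call.
def fake_mllm(messages):
    last = messages[-1]["text"]
    return f"What feeling were you exploring when you said '{last}'?"


session = ArtSession()
reply = session.turn(b"\x89PNG...", "I added a dark blue swirl.", fake_mllm)
```

In a real deployment the stub would be replaced by a call to whatever MLLM backend the system uses; the point is only that the facilitator is stateful, and that visual context travels with every conversational turn.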
If this is right
- Real-time visual analysis combined with dialogue can keep creators engaged in reflective thinking during the art process.
- Design choices around user entry, safety boundaries, and matching therapy style will determine practical usefulness.
- Conversations need to balance depth with breadth to avoid overwhelming or superficial exchanges.
- Adding more ways for the system to reference and interact with the visual output can strengthen the experience.
Where Pith is reading between the lines
- Direct user studies with people receiving therapy could reveal whether the chatbot's interpretations match human perception of the same artwork.
- The approach might extend to other creative modalities such as music or writing if the same real-time multimodal loop proves reliable.
- Long-term use could generate data on how reflective patterns evolve across multiple sessions, informing adaptive therapy models.
Load-bearing premise
Expert opinions from five specialists are enough to show the chatbot can facilitate therapeutic engagement even without tests involving actual users or checks on how accurately the model reads the artwork.
What would settle it
A controlled session in which participants make art both with and without the chatbot and show no measurable difference in reported reflection, emotional insight, or sense of therapeutic value.
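"No measurable difference" can be made concrete with a standard significance check on the two conditions. A minimal sketch, using a permutation test on hypothetical Likert reflection ratings (the scores below are illustrative, not data from the paper):

```python
import random


def permutation_test(with_bot, without_bot, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference of mean scores.
    A large p-value would be consistent with 'no measurable difference'
    between the with-chatbot and without-chatbot conditions."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(with_bot) - mean(without_bot))
    pooled = list(with_bot) + list(without_bot)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        a, b = pooled[:len(with_bot)], pooled[len(with_bot):]
        if abs(mean(a) - mean(b)) >= observed:
            hits += 1
    return hits / n_iter


# Hypothetical 1-7 reflection ratings from the two conditions.
p = permutation_test([5, 6, 5, 7, 6], [4, 5, 4, 5, 5])
```

With samples this small the test has little power, which is exactly why such a study would need more participants than the five experts consulted here.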
original abstract
Therapeutic art activities, such as expressive drawing and painting, require the synergy between creative visual production and interactive dialogue. Recent advancements in Multimodal Large Language Models (MLLMs) have expanded the capacity of computing systems to interpret both textual and visual data, offering a new frontier for AI-mediated therapeutic support. This work-in-progress paper introduces an MLLM-powered chatbot that analyzes visual creation in real-time while engaging the creator in reflective conversations. We conducted an evaluation with five experts in art therapy and related fields, which demonstrated the chatbot's potential to facilitate therapeutic engagement, and highlighted several areas for future development, including entryways and risk management, bespoke alignment of user profile and therapeutic style, balancing conversational depth and width, and enriching visual interactivity. These themes provide a design roadmap for designing the future AI-mediated creative expression tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This work-in-progress paper introduces an MLLM-powered chatbot for therapeutic art activities that performs real-time visual analysis of user creations while conducting reflective conversations. The authors report an evaluation involving five experts in art therapy and related fields, claiming that the feedback demonstrates the chatbot's potential to facilitate therapeutic engagement, and they outline future development needs including entryways, risk management, bespoke alignment, conversational balance, and visual interactivity.
Significance. If the central claim holds after more rigorous validation, the work could provide a useful design roadmap for multimodal AI tools in creative therapy contexts within HCI. The exploratory nature and identification of specific future challenges (e.g., risk management and alignment) are constructive, but the current evidence base limits immediate impact.
major comments (2)
- [Evaluation] The claim that the n=5 expert evaluation 'demonstrated the chatbot's potential to facilitate therapeutic engagement' (abstract and main text) rests on qualitative feedback alone, with no reported quantitative measures, error bars, details on how visual-analysis accuracy was assessed, logged interaction data, or actual user sessions in therapeutic contexts. This premise is load-bearing for the central claim and leaves efficacy untested.
- [Abstract and System Description] The assertion of 'real-time' visual analysis and reflective-conversation facilitation is presented without metrics, examples of successful interpretations, or validation against ground-truth therapeutic outcomes, making it difficult to assess whether the MLLM component performs as described.
minor comments (2)
- [Future Work] The future-work themes (entryways, risk management, etc.) are listed clearly but would benefit from one or two concrete examples or preliminary design sketches to strengthen the roadmap.
- [System Description] Clarify the exact prompts or MLLM configuration used for visual analysis to improve reproducibility, even in a work-in-progress format.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work-in-progress paper. We address each major comment below, indicating where revisions will be made to clarify scope and strengthen the manuscript.
point-by-point responses
- Referee: [Evaluation] The claim that the n=5 expert evaluation 'demonstrated the chatbot's potential to facilitate therapeutic engagement' rests on qualitative feedback alone, with no reported quantitative measures, error bars, details on how visual-analysis accuracy was assessed, logged interaction data, or actual user sessions in therapeutic contexts.
  Authors: We agree that the evaluation is qualitative and exploratory, consistent with the work-in-progress nature of the paper. The n=5 expert sessions were designed to gather initial design insights through simulated interactions rather than to provide quantitative validation or efficacy testing. In the revision we will (1) add details on expert backgrounds, the interview protocol, and how feedback was collected, (2) explicitly state that the results indicate potential but do not constitute a rigorous efficacy study, and (3) tone down the abstract and main-text claims accordingly. Quantitative measures, logged interaction data, and real therapeutic user sessions are outside the current scope and will be pursued in follow-up work. (revision: partial)
- Referee: [Abstract and System Description] The assertion of 'real-time' visual analysis and reflective-conversation facilitation is presented without metrics, examples of successful interpretations, or validation against ground-truth therapeutic outcomes, making it difficult to assess whether the MLLM component performs as described.
  Authors: We will revise the abstract and system description to include concrete examples of visual analysis outputs and sample conversation excerpts drawn from the expert sessions. These additions will illustrate how the MLLM component operated in practice. We note that formal accuracy metrics and ground-truth therapeutic validation are not yet available for this initial prototype; the experts' qualitative judgments of interpretation quality will be reported in greater detail in the revised manuscript. (revision: yes)
Circularity Check
No circularity detected in qualitative exploratory study
full rationale
The paper is a work-in-progress introduction of an MLLM-powered chatbot concept for therapeutic art activities, supported solely by qualitative feedback from five experts. No equations, parameters, derivations, predictions, or self-citations appear in the provided text or abstract. The evaluation is presented as preliminary design exploration without any fitted inputs renamed as outputs, self-definitional loops, or load-bearing uniqueness claims. All central statements rest on direct expert commentary rather than any reduction to prior inputs by construction, making the derivation chain self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Multimodal LLMs can meaningfully interpret artistic visual content in a therapeutic context
Reference graph
Works this paper leans on
- [1] Sherry L Beaumont. 2012. Art therapy approaches for identity problems during adolescence. Canadian Art Therapy Association Journal 25, 1 (2012).
- [2] Alan Briks. 2007. Art therapy with adolescents: Vehicle to engagement and transformation. Canadian Art Therapy Association Journal 20, 1 (2007).
- [3] Tessa Dalley. 2008. Art as therapy: An introduction to the use of art as a therapeutic technique. Routledge.
- [4] Xuejun Du, Pengcheng An, Justin Leung, April Li, Linda E Chapman, and Jian Zhao. 2024. DeepThInk: Designing and probing human-AI co-creation in digital art therapy. International Journal of Human-Computer Studies 181 (2024), 103139.
- [5] Holly Feen-Calligan. 1996. Art therapy as a profession: Implications for the education and training of art therapists. Art Therapy 13, 3 (1996).
- [6] Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
- [7] Yi-Chieh Lee, Yichao Cui, Jack Jamieson, Wayne Fu, and Naomi Yamashita. 2023. Exploring effects of chatbot-based social contact on reducing mental illness stigma. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–16.
- [8] Di Liu, Jingwen Bai, Zhuoyi Zhang, Yilin Zhang, Zhenhao Zhang, Jian Zhao, and Pengcheng An. 2025. TherAIssist: Assisting Art Therapy Homework and Client-Practitioner Collaboration through Human-AI Interaction. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 9, 3 (2025). https://doi.org/10.1145/374949
- [9] Gary Nash. 2020. Response art in art therapy practice and research with a focus on reflect piece imagery. International Journal of Art Therapy 25, 1 (2020).
- [10] Shannon Sie Santosa, Qian Wan, Junnan Yu, and Yuhan Luo. 2026. EmoFlow: From Tracking to Sense-Making of Emotions Through Creative Drawing. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems.
- [11] Qwen Team. 2025. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631 (2025).
- [12] Lesley Uttley, Alison Scope, Matt Stevenson, Andrew Rawdin, Elizabeth Taylor Buck, Anthea Sutton, John Stevens, Eva Kaltenthaler, Kim Dent-Brown, and Chris Wood. 2015. Systematic review and economic modelling of the clinical effectiveness and cost-effectiveness of art therapy among people with non-psychotic mental health disorders. Health Technology Assessment.
- [13] Asnat Weinfeld-Yehoudayan, Johanna Czamanski-Cohen, Miri Cohen, and Karen L Weihs. 2024. A theoretical model of emotional processing in visual artmaking and art therapy. The Arts in Psychotherapy 90 (2024), 102196.
- [14] Migyeong Yang, Chaehee Park, Jiwon Kang, Jiwon Kim, Taeeun Kim, Hayeon Song, and Jinyoung Han. 2025. PracticeDAPR: An AI-based Education-Supported System for Art Therapy. Proceedings of the ACM on Human-Computer Interaction 9, 2 (2025), 1–31.
- [15]
- [16] Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, and Liteng Gao. 2025. ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–18.
- [17] Xi Zheng, Zhuoyang Li, Xinning Gui, and Yuhan Luo. 2025. Customizing emotional support: How do individuals construct and interact with LLM-powered chatbots. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–20.