Exploring a Multimodal Chatbot as a Facilitator in Therapeutic Art Activity
Pith reviewed 2026-05-15 22:08 UTC · model grok-4.3
The pith
An MLLM-powered chatbot can support therapeutic art by analyzing visual creations in real time and holding reflective conversations with the maker.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An MLLM-powered chatbot that processes real-time visual input from ongoing art-making together with conversational text can act as a facilitator in therapeutic art activities. A qualitative evaluation with five experts supports this: they saw value in the chatbot's capacity to promote reflective engagement, while noting requirements for entry points, risk handling, style alignment, conversation balance, and richer visual features.
What carries the argument
The MLLM-powered chatbot that analyzes visual creation in real-time while engaging the creator in reflective conversations.
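The paper does not publish its implementation, but the loop it describes — periodic canvas snapshots interleaved with chat turns, both fed to a multimodal model — can be sketched roughly as follows. Every name here (`ArtSession`, `query` stub, message fields) is a hypothetical placeholder, not the authors' code:

```python
import base64
from dataclasses import dataclass, field


@dataclass
class ArtSession:
    """Hypothetical sketch of the facilitation loop: each turn pairs the
    latest canvas snapshot with the chat history and asks a multimodal
    model for a reflective prompt."""
    history: list = field(default_factory=list)

    def turn(self, canvas_png: bytes, user_text: str, mllm) -> str:
        # Interleave the current state of the artwork with the user's words,
        # so the model can ground its questions in what is on the canvas.
        self.history.append({
            "role": "user",
            "image_b64": base64.b64encode(canvas_png).decode("ascii"),
            "text": user_text,
        })
        reply = mllm(self.history)  # any callable: messages -> reflective text
        self.history.append({"role": "assistant", "text": reply})
        return reply


# Stub standing in for a real vision-language chat call.
def fake_mllm(messages):
    last = messages[-1]["text"]
    return f"What feeling were you exploring when you said '{last}'?"


session = ArtSession()
reply = session.turn(b"\x89PNG...", "I added a dark blue swirl.", fake_mllm)
```

In a real deployment the stub would be replaced by a call to whatever MLLM backend the system uses; the point is only that the facilitator is stateful, and that visual context travels with every conversational turn.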
If this is right
- Real-time visual analysis combined with dialogue can keep creators engaged in reflective thinking during the art process.
- Design choices around user entry, safety boundaries, and matching therapy style will determine practical usefulness.
- Conversations need to balance depth with breadth to avoid overwhelming or superficial exchanges.
- Adding more ways for the system to reference and interact with the visual output can strengthen the experience.
Where Pith is reading between the lines
- Direct user studies with people receiving therapy could reveal whether the chatbot's interpretations match human perception of the same artwork.
- The approach might extend to other creative modalities such as music or writing if the same real-time multimodal loop proves reliable.
- Long-term use could generate data on how reflective patterns evolve across multiple sessions, informing adaptive therapy models.
Load-bearing premise
Expert opinions from five specialists are enough to show the chatbot can facilitate therapeutic engagement even without tests involving actual users or checks on how accurately the model reads the artwork.
What would settle it
A controlled session in which participants make art both with and without the chatbot and show no measurable difference in reported reflection, emotional insight, or sense of therapeutic value.
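"No measurable difference" can be made concrete with a standard significance check on the two conditions. A minimal sketch, using a permutation test on hypothetical Likert reflection ratings (the scores below are illustrative, not data from the paper):

```python
import random


def permutation_test(with_bot, without_bot, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference of mean scores.
    A large p-value would be consistent with 'no measurable difference'
    between the with-chatbot and without-chatbot conditions."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(with_bot) - mean(without_bot))
    pooled = list(with_bot) + list(without_bot)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        a, b = pooled[:len(with_bot)], pooled[len(with_bot):]
        if abs(mean(a) - mean(b)) >= observed:
            hits += 1
    return hits / n_iter


# Hypothetical 1-7 reflection ratings from the two conditions.
p = permutation_test([5, 6, 5, 7, 6], [4, 5, 4, 5, 5])
```

With samples this small the test has little power, which is exactly why such a study would need more participants than the five experts consulted here.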
original abstract
Therapeutic art activities, such as expressive drawing and painting, require the synergy between creative visual production and interactive dialogue. Recent advancements in Multimodal Large Language Models (MLLMs) have expanded the capacity of computing systems to interpret both textual and visual data, offering a new frontier for AI-mediated therapeutic support. This work-in-progress paper introduces an MLLM-powered chatbot that analyzes visual creation in real-time while engaging the creator in reflective conversations. We conducted an evaluation with five experts in art therapy and related fields, which demonstrated the chatbot's potential to facilitate therapeutic engagement, and highlighted several areas for future development, including entryways and risk management, bespoke alignment of user profile and therapeutic style, balancing conversational depth and width, and enriching visual interactivity. These themes provide a design roadmap for designing the future AI-mediated creative expression tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This work-in-progress paper introduces an MLLM-powered chatbot for therapeutic art activities that performs real-time visual analysis of user creations while conducting reflective conversations. The authors report an evaluation involving five experts in art therapy and related fields, claiming that the feedback demonstrates the chatbot's potential to facilitate therapeutic engagement, and they outline future development needs including entryways, risk management, bespoke alignment, conversational balance, and visual interactivity.
Significance. If the central claim holds after more rigorous validation, the work could provide a useful design roadmap for multimodal AI tools in creative therapy contexts within HCI. The exploratory nature and identification of specific future challenges (e.g., risk management and alignment) are constructive, but the current evidence base limits immediate impact.
major comments (2)
- [Evaluation] The claim that the n=5 expert evaluation 'demonstrated the chatbot's potential to facilitate therapeutic engagement' (abstract and main text) rests on qualitative feedback alone, with no reported quantitative measures, error bars, details on how visual-analysis accuracy was assessed, logged interaction data, or actual user sessions in therapeutic contexts. This premise is load-bearing for the central claim and leaves efficacy untested.
- [Abstract and System Description] The assertion of 'real-time' visual analysis and reflective-conversation facilitation is presented without metrics, examples of successful interpretations, or validation against ground-truth therapeutic outcomes, making it difficult to assess whether the MLLM component performs as described.
minor comments (2)
- [Future Work] The future-work themes (entryways, risk management, etc.) are listed clearly but would benefit from one or two concrete examples or preliminary design sketches to strengthen the roadmap.
- [System Description] Clarify the exact prompts or MLLM configuration used for visual analysis to improve reproducibility, even in a work-in-progress format.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work-in-progress paper. We address each major comment below, indicating where revisions will be made to clarify scope and strengthen the manuscript.
point-by-point responses
- Referee: [Evaluation] The claim that the n=5 expert evaluation 'demonstrated the chatbot's potential to facilitate therapeutic engagement' rests on qualitative feedback alone, with no reported quantitative measures, error bars, details on how visual-analysis accuracy was assessed, logged interaction data, or actual user sessions in therapeutic contexts.
  Authors: We agree that the evaluation is qualitative and exploratory, consistent with the work-in-progress nature of the paper. The n=5 expert sessions were designed to gather initial design insights through simulated interactions rather than to provide quantitative validation or efficacy testing. In the revision we will (1) add details on expert backgrounds, the interview protocol, and how feedback was collected, (2) explicitly state that the results indicate potential but do not constitute a rigorous efficacy study, and (3) tone down the abstract and main-text claims accordingly. Quantitative measures, logged interaction data, and real therapeutic user sessions are outside the current scope and will be pursued in follow-up work. (revision: partial)
- Referee: [Abstract and System Description] The assertion of 'real-time' visual analysis and reflective-conversation facilitation is presented without metrics, examples of successful interpretations, or validation against ground-truth therapeutic outcomes, making it difficult to assess whether the MLLM component performs as described.
  Authors: We will revise the abstract and system description to include concrete examples of visual analysis outputs and sample conversation excerpts drawn from the expert sessions. These additions will illustrate how the MLLM component operated in practice. We note that formal accuracy metrics and ground-truth therapeutic validation are not yet available for this initial prototype; the experts' qualitative judgments of interpretation quality will be reported in greater detail in the revised manuscript. (revision: yes)
Circularity Check
No circularity detected in qualitative exploratory study
full rationale
The paper is a work-in-progress introduction of an MLLM-powered chatbot concept for therapeutic art activities, supported solely by qualitative feedback from five experts. No equations, parameters, derivations, predictions, or self-citations appear in the provided text or abstract. The evaluation is presented as preliminary design exploration without any fitted inputs renamed as outputs, self-definitional loops, or load-bearing uniqueness claims. All central statements rest on direct expert commentary rather than any reduction to prior inputs by construction, making the derivation chain self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Multimodal LLMs can meaningfully interpret artistic visual content in a therapeutic context
Reference graph
Works this paper leans on
- [1] Sherry L Beaumont. 2012. Art therapy approaches for identity problems during adolescence. Canadian Art Therapy Association Journal 25, 1 (2012).
- [2] Alan Briks. 2007. Art therapy with adolescents: Vehicle to engagement and transformation. Canadian Art Therapy Association Journal 20, 1 (2007).
- [3] Tessa Dalley. 2008. Art as therapy: An introduction to the use of art as a therapeutic technique. Routledge.
- [4] Xuejun Du, Pengcheng An, Justin Leung, April Li, Linda E Chapman, and Jian Zhao. 2024. DeepThInk: Designing and probing human-AI co-creation in digital art therapy. International Journal of Human-Computer Studies 181 (2024), 103139.
- [5] Holly Feen-Calligan. 1996. Art therapy as a profession: Implications for the education and training of art therapists. Art Therapy 13, 3 (1996).
- [6] Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
- [7] Yi-Chieh Lee, Yichao Cui, Jack Jamieson, Wayne Fu, and Naomi Yamashita. 2023. Exploring effects of chatbot-based social contact on reducing mental illness stigma. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–16.
- [8] Di Liu, Jingwen Bai, Zhuoyi Zhang, Yilin Zhang, Zhenhao Zhang, Jian Zhao, and Pengcheng An. 2025. TherAIssist: Assisting Art Therapy Homework and Client-Practitioner Collaboration through Human-AI Interaction. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 9, 3 (2025). https://doi.org/10.1145/374949
- [9] Gary Nash. 2020. Response art in art therapy practice and research with a focus on reflect piece imagery. International Journal of Art Therapy 25, 1 (2020).
- [10] Shannon Sie Santosa, Qian Wan, Junnan Yu, and Yuhan Luo. 2026. EmoFlow: From Tracking to Sense-Making of Emotions Through Creative Drawing. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems.
- [11] Qwen Team. 2025. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631 (2025).
- [12] Lesley Uttley, Alison Scope, Matt Stevenson, Andrew Rawdin, Elizabeth Taylor Buck, Anthea Sutton, John Stevens, Eva Kaltenthaler, Kim Dent-Brown, and Chris Wood. 2015. Systematic review and economic modelling of the clinical effectiveness and cost-effectiveness of art therapy among people with non-psychotic mental health disorders. Health Technology Assessment.
- [13] Asnat Weinfeld-Yehoudayan, Johanna Czamanski-Cohen, Miri Cohen, and Karen L Weihs. 2024. A theoretical model of emotional processing in visual artmaking and art therapy. The Arts in Psychotherapy 90 (2024), 102196.
- [14] Migyeong Yang, Chaehee Park, Jiwon Kang, Jiwon Kim, Taeeun Kim, Hayeon Song, and Jinyoung Han. 2025. PracticeDAPR: An AI-based Education-Supported System for Art Therapy. Proceedings of the ACM on Human-Computer Interaction 9, 2 (2025), 1–31.
- [15]
- [16] Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, and Liteng Gao. 2025. ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–18.
- [17] Xi Zheng, Zhuoyang Li, Xinning Gui, and Yuhan Luo. 2025. Customizing emotional support: How do individuals construct and interact with LLM-powered chatbots. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–20.