Tell Me Why: Designing an Explainable LLM-based Dialogue System for Student Problem Behavior Diagnosis

Deliang Wang; Penghe Chen; Yu Lu; Zhilin Fan

arxiv: 2604.22237 · v1 · submitted 2026-04-24 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-08 11:55 UTC · model grok-4.3

keywords explainable AI · LLM dialogue systems · student behavior diagnosis · educational technology · teacher trust · hierarchical attribution · intervention planning

The pith

An LLM dialogue system with hierarchical explanations for student behavior diagnoses increases reported teacher trust by surfacing dialogue evidence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Teachers must pull together many details to identify student problem behaviors and choose interventions, yet black-box LLM suggestions offer little insight into their logic. This paper builds a fine-tuned LLM dialogue system and adds a hierarchical attribution method drawn from explainable AI to locate the specific conversation evidence behind each recommendation and turn that evidence into plain-language explanations. Technical tests show the method finds supporting evidence more accurately than baseline approaches. A preliminary study with 22 pre-service teachers found that participants given these explanations reported higher trust in the system than those who received only the recommendations.

Core claim

The authors demonstrate that a fine-tuned LLM-based dialogue system for diagnosing student problem behaviors can be augmented with a hierarchical attribution method to identify relevant dialogue evidence for each recommendation and generate natural-language explanations from that evidence. This produces stronger performance on evidence identification tasks than baseline methods and leads to measurably higher trust ratings from pre-service teachers in a small user study.

What carries the argument

Hierarchical attribution method, which traces each system recommendation back to specific parts of the dialogue to produce evidence-based natural-language explanations.
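
The summary does not say how the hierarchical attribution is computed. As a rough illustration only, the sketch below assumes a two-level leave-one-out ablation: first rank dialogue turns by how much the recommendation score drops when a turn is removed, then rank sentences inside the top turns the same way. The names `toy_score` and `hierarchical_attribution`, the cue words, and the example dialogue are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# A dialogue is a list of turns; each turn is a list of sentences.
Dialogue = List[List[str]]


def toy_score(dialogue: Dialogue) -> float:
    """Hypothetical stand-in for the model's support for one recommendation.

    It only counts cue words; a real system would query the fine-tuned LLM.
    """
    cues = ("hits", "shouts", "refuses")
    text = " ".join(s for turn in dialogue for s in turn).lower()
    return float(sum(text.count(c) for c in cues))


def hierarchical_attribution(
    dialogue: Dialogue,
    score: Callable[[Dialogue], float],
    top_k: int = 1,
) -> List[Tuple[int, int, float]]:
    """Return (turn_index, sentence_index, score_drop) for the top_k turns."""
    base = score(dialogue)

    # Level 1: leave one turn out and record the drop in the score.
    turn_drops = []
    for i in range(len(dialogue)):
        ablated = dialogue[:i] + dialogue[i + 1:]
        turn_drops.append((base - score(ablated), i))
    ranked_turns = sorted(turn_drops, key=lambda t: (-t[0], t[1]))
    top_turns = [i for _, i in ranked_turns[:top_k]]

    # Level 2: leave one sentence out inside each selected turn.
    results = []
    for i in top_turns:
        for j in range(len(dialogue[i])):
            turn = dialogue[i][:j] + dialogue[i][j + 1:]
            ablated = dialogue[:i] + [turn] + dialogue[i + 1:]
            results.append((i, j, base - score(ablated)))
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


example = [
    ["He is in third grade.", "He likes drawing."],
    ["He hits classmates at recess.", "He also shouts in class."],
    ["His parents work late."],
]
ranked = hierarchical_attribution(example, toy_score)
```

Scoring whole turns first keeps the number of model calls low on long dialogues, since sentence-level ablation runs only inside the turns that already look influential.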

If this is right

The system can identify behavioral categories and suggest interventions while also showing teachers the exact dialogue turns that support those suggestions.
Teachers who receive the explanations report higher trust, which the authors link to greater potential use of the tool in practice.
The hierarchical method outperforms standard attribution baselines at recovering supporting evidence from multi-turn conversations.
Explanations are generated automatically from the model's outputs without requiring separate training for the explanation component.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar evidence-tracing techniques could be tested in other teacher-facing tools such as lesson planning or progress monitoring where transparency matters for adoption.
If the explanations prove faithful over time, they might serve as training examples that help new teachers internalize diagnostic patterns.
A follow-up study measuring whether higher trust actually changes the interventions teachers choose in live classroom scenarios would test the practical payoff.
The same hierarchical approach might address trust barriers when LLMs are used for other high-stakes professional decisions outside education.

Load-bearing premise

That the hierarchical attribution method produces faithful accounts of the LLM's actual reasoning rather than plausible but invented justifications, and that higher reported trust will produce better real-world teaching decisions.

What would settle it

A study in which the attributions fail to match the model's internal attention patterns on the same inputs, or a larger trial showing no difference in teachers' actual diagnostic accuracy or intervention quality despite the added explanations.

Figures

Figures reproduced from arXiv: 2604.22237 by Deliang Wang, Penghe Chen, Yu Lu, Zhilin Fan.

**Figure 1.** Overview of the explainable diagnostic dialogue system.

**Figure 2.** System interface showing (1) the dialogue history, (2) recommended

Original abstract

Diagnosing student problem behaviors requires teachers to synthesize multifaceted information, identify behavioral categories, and plan intervention strategies. Although fine-tuned large language models (LLMs) can support this process through multi-turn dialogue, they rarely explain why a strategy is recommended, limiting transparency and teachers' trust. To address this issue, we present an explainable dialogue system built on a fine-tuned LLM. The system uses a hierarchical attribution method based on explainable AI (xAI) to identify dialogue evidence for each recommendation and generate a natural-language explanation based on that evidence. In technical evaluation, the method outperformed baseline approaches in identifying supporting evidence. In a preliminary user study with 22 pre-service teachers, participants who received explanations reported higher trust in the system. These findings suggest a promising direction for improving LLM explainability in educational dialogue systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a fine-tuned LLM dialogue system with hierarchical xAI explanations for student behavior diagnosis, but the evaluations stay preliminary and skip standard checks on explanation faithfulness or real decision impact.

The core of this work is a practical system: fine-tune an LLM for multi-turn teacher chats on student problem behaviors, then layer on a hierarchical attribution method to pull out dialogue evidence and turn it into natural-language explanations for each recommendation. Technical tests claim it finds supporting evidence better than baselines, and a small user study with 22 pre-service teachers found higher self-reported trust when explanations appeared. That combination for this specific education task looks like the main new piece, even if the attribution technique itself draws from existing xAI ideas.

The architecture is straightforward and ties explanations directly to the conversation history, which is a reasonable way to add transparency without overhauling the model. The user feedback gives at least an initial signal that teachers notice the difference.

The soft spots sit in the evaluation details. The abstract reports outperformance and trust gains but supplies no numbers, statistical tests, baseline descriptions, or technical sample sizes. The user study is explicitly preliminary, measures only Likert trust, and does not test whether explanations improve diagnostic accuracy, intervention quality, or alignment with expert judgments. There are also no faithfulness checks, such as comparing attributions to model internals or asking humans to rate how well the explanation matches the actual reasoning rather than a plausible story. Those gaps mean the trust increase is hard to interpret as evidence of better transparency or usability in practice.

This paper is aimed at researchers working on applied educational AI or xAI for dialogue systems. Someone looking for concrete examples of how to attach explanations to LLM outputs in a teacher-support setting could find usable ideas here. It is not yet strong enough to guide deployment or to cite as settled evidence.

I would send it to peer review because it has a clear system and some empirical data, even though the current results need more rigorous validation on faithfulness and outcome measures before they can support the broader claims.

Referee Report

3 major / 0 minor

Summary. The manuscript presents an LLM-based multi-turn dialogue system for diagnosing student problem behaviors, augmented by a hierarchical attribution method drawn from xAI to extract dialogue evidence and generate natural-language explanations for each recommendation. It reports that the attribution method outperforms baseline approaches at identifying supporting evidence in technical evaluation, and that a preliminary user study with 22 pre-service teachers found higher self-reported trust when explanations were shown.

Significance. If the reported gains are shown to rest on faithful attributions and to translate into improved diagnostic accuracy or intervention quality, the work would offer a concrete, deployable approach to increasing transparency in educational dialogue systems. The combination of fine-tuned LLMs with hierarchical attribution is a timely application of xAI techniques to a domain where teacher trust is critical; the preliminary user-study evidence of trust gains is a useful starting point, though the absence of outcome measures limits immediate claims of practical impact.

major comments (3)

[Abstract] The technical evaluation asserts outperformance in identifying supporting evidence, yet supplies no quantitative metrics, statistical tests, baseline implementation details, or sample sizes (Abstract). These omissions make it impossible to evaluate whether the superiority claim is robust or merely suggestive.
[User Study] The user study (n=22) measures only Likert-scale trust and does not assess whether participants who received explanations produced more accurate behavior diagnoses or higher-quality intervention plans relative to an expert gold standard. This gap directly weakens the inference that higher trust improves real-world teaching decisions.
No standard faithfulness checks for the hierarchical attribution method—such as correlation of attribution scores with model internals (attention or gradients), ablation of evidence spans, or human ratings of explanation fidelity—are reported. Without these, it remains unclear whether the generated natural-language explanations reflect the LLM’s actual reasoning or constitute post-hoc rationalizations.
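
The evidence-ablation check named in the last comment could look roughly like the sketch below. It computes two quantities commonly used in the xAI literature: comprehensiveness (how much the score drops when the cited evidence is removed) and sufficiency (how much it drops when only the cited evidence is kept). The scoring function, the dialogue turns, and the evidence sets are hypothetical placeholders, and nothing here reflects an evaluation the paper actually ran.

```python
from typing import Callable, List, Set

Scorer = Callable[[List[str]], float]


def comprehensiveness(turns: List[str], evidence: Set[int], score: Scorer) -> float:
    """Score drop after removing the cited turns (higher: the evidence mattered)."""
    kept = [t for i, t in enumerate(turns) if i not in evidence]
    return score(turns) - score(kept)


def sufficiency(turns: List[str], evidence: Set[int], score: Scorer) -> float:
    """Score drop after keeping only the cited turns (lower: the evidence suffices)."""
    kept = [t for i, t in enumerate(turns) if i in evidence]
    return score(turns) - score(kept)


def toy_score(turns: List[str]) -> float:
    """Hypothetical stand-in for the model's support for a recommendation."""
    cues = ("hits", "shouts")
    return float(sum(t.lower().count(c) for t in turns for c in cues))


turns = [
    "He is in third grade.",
    "He hits classmates at recess.",
    "He shouts during lessons.",
    "His parents work late.",
]
plausible_evidence = {1, 2}   # turns an explanation might cite
unrelated_evidence = {0, 3}   # turns that should not drive the score

report = {
    "plausible": (
        comprehensiveness(turns, plausible_evidence, toy_score),
        sufficiency(turns, plausible_evidence, toy_score),
    ),
    "unrelated": (
        comprehensiveness(turns, unrelated_evidence, toy_score),
        sufficiency(turns, unrelated_evidence, toy_score),
    ),
}
```

Under this toy scorer, an explanation that cites the behavior-describing turns gets high comprehensiveness and low sufficiency loss, while one that cites background turns gets the reverse. That contrast is what such a check would look for.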

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below, indicating revisions where appropriate to strengthen the manuscript while maintaining its focus on a preliminary study.

Referee: [Abstract] The technical evaluation asserts outperformance in identifying supporting evidence, yet supplies no quantitative metrics, statistical tests, baseline implementation details, or sample sizes (Abstract). These omissions make it impossible to evaluate whether the superiority claim is robust or merely suggestive.

Authors: We agree that the abstract would benefit from greater specificity. The full manuscript reports quantitative results (including precision, recall, and F1 for evidence identification), baseline details, sample sizes, and statistical comparisons in the technical evaluation section. We will revise the abstract to incorporate key metrics, statistical test outcomes, and implementation details to better substantiate the outperformance claim. revision: yes
Referee: [User Study] The user study (n=22) measures only Likert-scale trust and does not assess whether participants who received explanations produced more accurate behavior diagnoses or higher-quality intervention plans relative to an expert gold standard. This gap directly weakens the inference that higher trust improves real-world teaching decisions.

Authors: We acknowledge this limitation. The study is explicitly framed as preliminary and measures self-reported trust as an initial indicator of explanation utility. It does not include outcome measures such as diagnostic accuracy or intervention quality against a gold standard. We will revise the discussion and limitations sections to clearly state this scope, avoid overclaiming practical impact, and outline plans for future studies that incorporate expert-rated outcomes. revision: partial
Referee: [—] No standard faithfulness checks for the hierarchical attribution method—such as correlation of attribution scores with model internals (attention or gradients), ablation of evidence spans, or human ratings of explanation fidelity—are reported. Without these, it remains unclear whether the generated natural-language explanations reflect the LLM’s actual reasoning or constitute post-hoc rationalizations.

Authors: The technical evaluation provides evidence of the attribution method's utility through superior performance on evidence identification tasks, which serves as a task-aligned proxy for faithfulness. However, we did not include correlations with internal model signals, ablation studies, or dedicated human fidelity ratings. We will add a dedicated subsection discussing the method's grounding in xAI techniques, report any available human ratings of explanation quality from the user study, and explicitly note the absence of certain internal checks as a limitation with suggestions for future validation. revision: partial
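
The simulated rebuttal mentions precision, recall, and F1 for evidence identification. As a minimal sketch of how such scores are usually computed, the function below compares a set of predicted evidence turn indices against an annotated gold set. The function name `evidence_prf` and the example indices are hypothetical, and the paper's actual scoring unit (turn, sentence, or span) is not stated in this summary.

```python
from typing import Set, Tuple


def evidence_prf(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 over evidence indices."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the method cites turns 1, 2, 5; annotators marked 1-4.
precision, recall, f1 = evidence_prf({1, 2, 5}, {1, 2, 3, 4})
```

Here two of the three cited turns are correct (precision 2/3) and two of the four annotated turns are recovered (recall 1/2), giving an F1 of 4/7.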

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external evaluations and user study

The paper describes construction of an LLM-based dialogue system augmented with a hierarchical attribution method for generating explanations. Its central claims (outperformance in identifying supporting evidence; higher trust in user study) are supported by technical comparisons against baselines and a preliminary study with 22 participants measuring self-reported trust. No equations, derivations, or first-principles arguments are present that reduce to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The evaluations are independent measurements against external baselines and human subjects rather than tautological restatements of the system's design choices.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on standard assumptions about LLM fine-tuning and the validity of attribution methods in xAI; no free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption Fine-tuned LLMs can synthesize multifaceted information to diagnose student problem behaviors through multi-turn dialogue
Invoked as the foundation for the dialogue system
domain assumption Hierarchical attribution from xAI can reliably surface dialogue evidence that supports model recommendations
Core premise enabling the generation of natural-language explanations

pith-pipeline@v0.9.0 · 5441 in / 1341 out tokens · 52310 ms · 2026-05-08T11:55:10.538416+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] Chen, P., Fan, Z., Lu, Y.: Knowstu: Diagnosing students’ problem behaviors using fine-tuned llm and rag. IEEE Transactions on Learning Technologies 18, 1–15 (2025)

[2] Chen, P., Fan, Z., Lu, Y., Xu, Q.: Pbchat: Enhance student’s problem behavior diagnosis with large language model. In: International Conference on Artificial Intelligence in Education. pp. 32–45. Springer (2024)

[3] Chuang, Y.S., Cohen-Wang, B., Shen, Z., Wu, Z., Xu, H., Lin, X.V., Glass, J.R., Li, S.W., Yih, W.t.: Selfcite: Self-supervised alignment for context attribution in large language models. In: Forty-second International Conference on Machine Learning

[4] Cohen-Wang, B., Shah, H., Georgiev, K., Madry, A.: Contextcite: Attributing model generation to context. arXiv preprint arXiv:2409.00729 (2024)

[5] Fan, Z., Chen, P., Lu, Y.: Why did the ai suggest that? designing an explainable educational counseling system. In: International Conference on Artificial Intelligence in Education. pp. 321–335. Springer (2025)

[6] Feldman-Maggor, Y., Cukurova, M., Kent, C., Alexandron, G.: The impact of explainable ai on teachers’ trust and acceptance of ai edtech recommendations: The power of domain-specific explanations. International Journal of Artificial Intelligence in Education 35(5), 2889–2922 (2025)

[7] Khosravi, H., Shum, S.B., Chen, G., Conati, C., Tsai, Y.S., Kay, J., Knight, S., Martinez-Maldonado, R., Sadiq, S., Gašević, D.: Explainable artificial intelligence in education. Computers and Education: Artificial Intelligence 3, 100074 (2022)

[8] Merritt, S.M.: Affective processes in human–automation interactions. Human Factors 53(4), 356–370 (2011). https://doi.org/10.1177/0018720811411912

[9] Qian, C., Wang, P., Liu, D., Yang, J., Guo, D., Tang, L., Mei, J., Ren, Q., Shao, S., Liu, Y., Fu, J., Shao, J., Hu, X.: The why behind the action: Unveiling internal drivers via agentic attribution (2026), https://arxiv.org/abs/2601.15075

[10] Sutherland, K., Conroy, M., McLeod, B., Granger, K., Broda, M., Kunemund, R.: Preliminary study of the effects of best in class–elementary on outcomes of elementary students with problem behavior. Journal of Positive Behavior Interventions 22(4), 220–233 (2020)

[11] Turpin, M., Michael, J., Perez, E., Bowman, S.: Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems 36, 74952–74965 (2023)

[12] Wang, D., Bian, C., Chen, G.: Using explainable ai to unravel classroom dialogue analysis: Effects of explanations on teachers’ trust, technology acceptance and cognitive load. British Journal of Educational Technology 55(6), 2530–2556 (2024)

[13] Wang, D., Chen, G.: Making ai accessible for stem teachers: Using explainable ai for unpacking classroom discourse analysis. IEEE Transactions on Education 67(6), 907–918 (2024)
