Tell Me Why: Designing an Explainable LLM-based Dialogue System for Student Problem Behavior Diagnosis
Pith reviewed 2026-05-08 11:55 UTC · model grok-4.3
The pith
An LLM dialogue system with hierarchical explanations for student behavior diagnoses increases reported teacher trust by surfacing dialogue evidence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors demonstrate that a fine-tuned LLM-based dialogue system for diagnosing student problem behaviors can be augmented with a hierarchical attribution method to identify relevant dialogue evidence for each recommendation and generate natural-language explanations from that evidence. This produces stronger performance on evidence identification tasks than baseline methods and leads to higher self-reported trust in a preliminary user study with 22 pre-service teachers.
What carries the argument
Hierarchical attribution method, which traces each system recommendation back to specific parts of the dialogue to produce evidence-based natural-language explanations.
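The source does not spell out how the hierarchical attribution is computed, so the following is only a minimal sketch of one coarse-to-fine reading of the idea: score whole dialogue turns by leave-one-out ablation, then rank sentences inside the highest-scoring turns. Everything here is an assumption made for illustration, including the `hierarchical_attribution` function, the `top_turns` parameter, and the keyword-based `toy_score`, which stands in for the fine-tuned LLM's confidence in a recommendation.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a dialogue is a list of turns,
# and each turn is a list of sentences.
Dialogue = List[List[str]]
Scorer = Callable[[Dialogue], float]


def ablate_turn(dialogue: Dialogue, t: int) -> Dialogue:
    """Return a copy of the dialogue with turn t removed."""
    return [turn for i, turn in enumerate(dialogue) if i != t]


def ablate_sentence(dialogue: Dialogue, t: int, s: int) -> Dialogue:
    """Return a copy of the dialogue with sentence s of turn t removed."""
    return [
        [sent for j, sent in enumerate(turn) if not (i == t and j == s)]
        for i, turn in enumerate(dialogue)
    ]


def hierarchical_attribution(
    dialogue: Dialogue, score: Scorer, top_turns: int = 1
) -> List[Tuple[float, int, int]]:
    """Rank sentences as (score_drop, turn_index, sentence_index).

    Level 1: leave-one-out over turns. Level 2: leave-one-out over the
    sentences of the top_turns turns with the largest score drop.
    """
    base = score(dialogue)
    turn_drops = [
        (base - score(ablate_turn(dialogue, t)), t) for t in range(len(dialogue))
    ]
    turn_drops.sort(key=lambda item: item[0], reverse=True)

    evidence = []
    for _, t in turn_drops[:top_turns]:
        for s in range(len(dialogue[t])):
            drop = base - score(ablate_sentence(dialogue, t, s))
            evidence.append((drop, t, s))
    evidence.sort(key=lambda item: item[0], reverse=True)
    return evidence


def toy_score(dialogue: Dialogue) -> float:
    """Stand-in for model confidence: fraction of cue words still present."""
    text = " ".join(sent for turn in dialogue for sent in turn).lower()
    cues = ("hits", "classmates")
    return sum(cue in text for cue in cues) / len(cues)


# Hypothetical dialogue, invented for this example only.
dialogue = [
    ["He is in third grade.", "He likes drawing."],
    ["He hits classmates during recess.", "It happens most days."],
    ["His parents were informed."],
]
ranked = hierarchical_attribution(dialogue, toy_score, top_turns=1)
print(ranked)
```

In this toy run the top-ranked entry points at the first sentence of the second turn, the only sentence whose removal changes the score. With a real model, `score` would need to query the LLM once per ablation, which is the main cost a two-level search is meant to reduce.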
If this is right
- The system can identify behavioral categories and suggest interventions while also showing teachers the exact dialogue turns that support those suggestions.
- Teachers who receive the explanations report higher trust, which the authors link to greater potential use of the tool in practice.
- The hierarchical method outperforms standard attribution baselines at recovering supporting evidence from multi-turn conversations.
- Explanations are generated automatically from the model's outputs without requiring separate training for the explanation component.
Where Pith is reading between the lines
- Similar evidence-tracing techniques could be tested in other teacher-facing tools such as lesson planning or progress monitoring where transparency matters for adoption.
- If the explanations prove faithful over time, they might serve as training examples that help new teachers internalize diagnostic patterns.
- A follow-up study measuring whether higher trust actually changes the interventions teachers choose in live classroom scenarios would test the practical payoff.
- The same hierarchical approach might address trust barriers when LLMs are used for other high-stakes professional decisions outside education.
Load-bearing premise
That the hierarchical attribution method produces faithful accounts of the LLM's actual reasoning rather than plausible but invented justifications, and that higher reported trust will produce better real-world teaching decisions.
What would settle it
A study in which the attributions fail to match the model's internal attention patterns on the same inputs, or a larger trial showing no difference in teachers' actual diagnostic accuracy or intervention quality despite the added explanations.
Original abstract
Diagnosing student problem behaviors requires teachers to synthesize multifaceted information, identify behavioral categories, and plan intervention strategies. Although fine-tuned large language models (LLMs) can support this process through multi-turn dialogue, they rarely explain why a strategy is recommended, limiting transparency and teachers' trust. To address this issue, we present an explainable dialogue system built on a fine-tuned LLM. The system uses a hierarchical attribution method based on explainable AI (xAI) to identify dialogue evidence for each recommendation and generate a natural-language explanation based on that evidence. In technical evaluation, the method outperformed baseline approaches in identifying supporting evidence. In a preliminary user study with 22 pre-service teachers, participants who received explanations reported higher trust in the system. These findings suggest a promising direction for improving LLM explainability in educational dialogue systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an LLM-based multi-turn dialogue system for diagnosing student problem behaviors, augmented by a hierarchical attribution method drawn from xAI to extract dialogue evidence and generate natural-language explanations for each recommendation. It reports that the attribution method outperforms baseline approaches at identifying supporting evidence in technical evaluation, and that a preliminary user study with 22 pre-service teachers found higher self-reported trust when explanations were shown.
Significance. If the reported gains are shown to rest on faithful attributions and to translate into improved diagnostic accuracy or intervention quality, the work would offer a concrete, deployable approach to increasing transparency in educational dialogue systems. The combination of fine-tuned LLMs with hierarchical attribution is a timely application of xAI techniques to a domain where teacher trust is critical; the preliminary user-study evidence of trust gains is a useful starting point, though the absence of outcome measures limits immediate claims of practical impact.
major comments (3)
- [Abstract] The technical evaluation asserts outperformance in identifying supporting evidence, yet the abstract supplies no quantitative metrics, statistical tests, baseline implementation details, or sample sizes. These omissions make it impossible to evaluate whether the superiority claim is robust or merely suggestive.
- [User Study] The user study (n=22) measures only Likert-scale trust and does not assess whether participants who received explanations produced more accurate behavior diagnoses or higher-quality intervention plans relative to an expert gold standard. This gap directly weakens the inference that higher trust improves real-world teaching decisions.
- No standard faithfulness checks for the hierarchical attribution method (such as correlation of attribution scores with model internals like attention or gradients, ablation of evidence spans, or human ratings of explanation fidelity) are reported. Without these, it remains unclear whether the generated natural-language explanations reflect the LLM's actual reasoning or constitute post-hoc rationalizations.
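The ablation check named in the last comment can be illustrated with two erasure-style scores: how much the model's score drops when the cited evidence is removed, and how much it drops when only the cited evidence is kept. This is a hedged sketch, not the paper's evaluation; the function names, the flat list-of-sentences input, and the keyword-based `toy_score` (a stand-in for the LLM's confidence in a recommendation) are all assumptions.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[List[str]], float]


def comprehensiveness(
    units: Sequence[str], evidence_idx: Sequence[int], score: Scorer
) -> float:
    """Score drop when the cited evidence is removed (higher suggests more faithful)."""
    cited = set(evidence_idx)
    remaining = [u for i, u in enumerate(units) if i not in cited]
    return score(list(units)) - score(remaining)


def sufficiency(
    units: Sequence[str], evidence_idx: Sequence[int], score: Scorer
) -> float:
    """Score drop when only the cited evidence is kept (lower suggests more faithful)."""
    cited = set(evidence_idx)
    only_cited = [u for i, u in enumerate(units) if i in cited]
    return score(list(units)) - score(only_cited)


def toy_score(units: List[str]) -> float:
    """Stand-in for model confidence: fraction of cue words still present."""
    text = " ".join(units).lower()
    cues = ("hits", "classmates")
    return sum(cue in text for cue in cues) / len(cues)


# Hypothetical dialogue sentences, invented for this example only.
units = [
    "He is in third grade.",
    "He hits classmates during recess.",
    "His parents were informed.",
]

# An explanation citing sentence 1 versus one citing sentence 0.
print(comprehensiveness(units, [1], toy_score), sufficiency(units, [1], toy_score))
print(comprehensiveness(units, [0], toy_score), sufficiency(units, [0], toy_score))
```

An explanation that cites sentence 1 removes all of the score when ablated and preserves all of it when kept alone, whereas one citing sentence 0 does the opposite. A plausible-sounding but unfaithful explanation would look like the second case.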
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below, indicating revisions where appropriate to strengthen the manuscript while maintaining its focus on a preliminary study.
Point-by-point responses
- Referee: [Abstract] The technical evaluation asserts outperformance in identifying supporting evidence, yet the abstract supplies no quantitative metrics, statistical tests, baseline implementation details, or sample sizes. These omissions make it impossible to evaluate whether the superiority claim is robust or merely suggestive.
  Authors: We agree that the abstract would benefit from greater specificity. The full manuscript reports quantitative results (including precision, recall, and F1 for evidence identification), baseline details, sample sizes, and statistical comparisons in the technical evaluation section. We will revise the abstract to incorporate key metrics, statistical test outcomes, and implementation details to better substantiate the outperformance claim.
  Revision: yes
- Referee: [User Study] The user study (n=22) measures only Likert-scale trust and does not assess whether participants who received explanations produced more accurate behavior diagnoses or higher-quality intervention plans relative to an expert gold standard. This gap directly weakens the inference that higher trust improves real-world teaching decisions.
  Authors: We acknowledge this limitation. The study is explicitly framed as preliminary and measures self-reported trust as an initial indicator of explanation utility. It does not include outcome measures such as diagnostic accuracy or intervention quality against a gold standard. We will revise the discussion and limitations sections to clearly state this scope, avoid overclaiming practical impact, and outline plans for future studies that incorporate expert-rated outcomes.
  Revision: partial
- Referee: [-] No standard faithfulness checks for the hierarchical attribution method (such as correlation of attribution scores with model internals like attention or gradients, ablation of evidence spans, or human ratings of explanation fidelity) are reported. Without these, it remains unclear whether the generated natural-language explanations reflect the LLM's actual reasoning or constitute post-hoc rationalizations.
  Authors: The technical evaluation provides evidence of the attribution method's utility through superior performance on evidence identification tasks, which serves as a task-aligned proxy for faithfulness. However, we did not include correlations with internal model signals, ablation studies, or dedicated human fidelity ratings. We will add a dedicated subsection discussing the method's grounding in xAI techniques, report any available human ratings of explanation quality from the user study, and explicitly note the absence of certain internal checks as a limitation with suggestions for future validation.
  Revision: partial
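The precision, recall, and F1 figures mentioned in the first response could be computed along the following lines if evidence identification is treated as set overlap between predicted and annotated evidence turns. That framing is an assumption, as are the `evidence_prf` name and the example indices; the source does not say how the authors matched predictions to gold evidence.

```python
from typing import Iterable, Tuple


def evidence_prf(predicted: Iterable[int], gold: Iterable[int]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 over evidence indices."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: the method cites turns 1 and 2,
# while annotators marked turns 1, 2, 3, and 4 as evidence.
precision, recall, f1 = evidence_prf(predicted=[1, 2], gold=[1, 2, 3, 4])
print(precision, recall, round(f1, 3))
```

Reporting all three numbers matters for the referee's concern: a method that cites very few turns can reach high precision while missing most of the supporting evidence, which only recall would reveal.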
Circularity Check
No circularity: empirical claims rest on external evaluations and user study
The paper describes construction of an LLM-based dialogue system augmented with a hierarchical attribution method for generating explanations. Its central claims (outperformance in identifying supporting evidence; higher trust in user study) are supported by technical comparisons against baselines and a preliminary study with 22 participants measuring self-reported trust. No equations, derivations, or first-principles arguments are present that reduce to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The evaluations are independent measurements against external baselines and human subjects rather than tautological restatements of the system's design choices.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Fine-tuned LLMs can synthesize multifaceted information to diagnose student problem behaviors through multi-turn dialogue.
- Domain assumption: Hierarchical attribution from xAI can reliably surface dialogue evidence that supports model recommendations.