A Framework for Human-AI Q-Matrix Refinement: A NeuralCDM Evaluation

arXiv: 2604.16398 · v1 · submitted 2026-03-30 · cs.CY · cs.AI

Ying Zhang, Ningxi Cheng, Yizhu Gao, Hongmei Li, Lehong Shi, Nicholas Young, Geng Yuan, Xiaoming Zhai

keywords: q-matrices, framework, models, assessment, deployed, evaluation, human-ai, llms

Q-matrices are a cornerstone of theory-driven assessment and learning analytics, making item demands and students' underlying knowledge components and misconceptions explicit and actionable. However, Q-matrices are typically crafted by experts, making them time-consuming to build, prone to subjectivity, and difficult to validate empirically. We propose a framework for human-AI Q-matrix refinement in which large language models (LLMs) generate candidate Q-matrices using structured, misconception-aware prompting, and NeuralCDM provides an empirical evaluation layer to compare candidates based on how well they explain student response data. We apply the framework to a thermodynamics assessment dataset and benchmark locally deployed LLMs against cloud-served models. Results show that iteratively refined LLM-generated Q-matrices can exceed expert-baseline model fit (AUC 0.780 vs. 0.717), and that locally deployed models achieve comparable performance to cloud APIs, supporting privacy-preserving deployment.
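The comparison step described in the abstract, ranking candidate Q-matrices by how well a model fitted under each one predicts student responses, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `best_candidate` are hypothetical, the response labels and predicted probabilities are made-up toy values, and the predictions are assumed to come from some cognitive diagnosis model (such as NeuralCDM) trained separately under each candidate Q-matrix.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen correct
    response receives a higher predicted score than a randomly chosen
    incorrect one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both correct and incorrect responses")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_candidate(labels: Sequence[int],
                   predictions: Dict[str, Sequence[float]]) -> str:
    """Return the name of the candidate whose predictions have the highest AUC."""
    return max(predictions, key=lambda name: auc(labels, predictions[name]))


# Toy held-out responses (1 = correct, 0 = incorrect); illustrative only.
labels = [1, 0, 1, 1, 0, 0]

# Hypothetical predicted probabilities from a model fitted under each Q-matrix.
predictions = {
    "expert": [0.6, 0.5, 0.7, 0.4, 0.3, 0.6],
    "llm_refined": [0.8, 0.3, 0.7, 0.6, 0.4, 0.2],
}

for name, scores in predictions.items():
    print(name, round(auc(labels, scores), 3))
print("selected:", best_candidate(labels, predictions))
```

In an iterative refinement loop, a selection like this would presumably be repeated after each round of LLM revision, with the expert Q-matrix kept as the baseline to beat.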
