and Xi, Xiaoming and Breyer, F
3 Pith papers cite this work.
- Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation
  AI models for automated short answer scoring agree substantially less with expert raters on mid-range-quality responses, and this gap narrows with greater task-specific adaptation.
- Detecting Alarming Student Verbal Responses using Text and Audio Classifier
  A hybrid text-plus-audio classifier framework is introduced to identify potentially troubling student responses by analyzing both what is said and how it is said.
- Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering
  LLM graders achieve substantial agreement with human raters on math and science MCAS items but are less consistent on ELA items; they perform best as sources of formative narrative feedback rather than summative numerical scores.