Verbalized confidence from small LMs enables cost-effective cascade routing for automated educational scoring, matching large-model accuracy at 76% lower cost when discrimination is strong.
Innovating As- sessment with Conversational Agents: A Technology- Enhanced Approach to Formative Assessments
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CY 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
LLM graders achieve substantial human agreement on math and science MCAS items but vary on ELA, performing best as sources of formative narrative feedback rather than summative numerical scores.
citing papers explorer
-
Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment
Verbalized confidence from small LMs enables cost-effective cascade routing for automated educational scoring, matching large-model accuracy at 76% lower cost when discrimination is strong.
-
Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering
LLM graders achieve substantial human agreement on math and science MCAS items but vary on ELA, performing best as sources of formative narrative feedback rather than summative numerical scores.