A LLM-powered automatic grading framework with human-level guidelines optimization
4 Pith papers cite this work, all published in 2026. Polarity classification is still indexing, so all four citing papers are currently unverdicted.
Citing papers
- Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization
  An iterative framework lets LLMs learn procedural assessment skills for rubric construction, improving automated scoring on all ten ASAP-SAS items and often exceeding expert rubrics while showing cross-item transfer.
- "**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems
  LLM-based automatic grading systems are highly vulnerable to prompt injection attacks that force high scores regardless of answer quality, and existing defenses fail to mitigate them.
- LLMs as Teaching Assistants for Mathematics Exam Grading: Reliability, and Practical Usability
  Liberal partial-credit prompting reduces question-level grading error for all six tested LLMs, with ChatGPT 5.5 Thinking (LIBERAL) achieving the lowest MAE of 1.87.
- Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering
  LLM graders achieve substantial human agreement on math and science MCAS items but vary on ELA, performing best as sources of formative narrative feedback rather than summative numerical scores.
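The MAE reported for the mathematics-grading study is mean absolute error. It is the average absolute difference between the score a model assigns and a reference score, so lower is better. The following minimal Python sketch assumes per-question point scores stored as plain lists, with a human grader's score as the reference. The function name and the sample scores are hypothetical and do not come from the study:

```python
def mean_absolute_error(predicted, reference):
    """Average absolute gap between model-assigned and reference (e.g. human) scores."""
    if len(predicted) != len(reference):
        raise ValueError("score lists must have the same length")
    if not predicted:
        raise ValueError("need at least one score pair")
    # Sum |model score - reference score| per question, then average over questions.
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical per-question scores in points; NOT data from the cited study.
llm_scores = [8, 5, 10, 3]
human_scores = [10, 5, 7, 4]
print(mean_absolute_error(llm_scores, human_scores))  # 1.5
```

On this scale, an MAE of 1.87 means the model's score differs from the reference by about 1.87 points per question on average, with over-grading and under-grading counted alike.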