Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation
AI short-answer scorers show mid-range quality degradation that lessens with more task-specific adaptation, while human agreement stays stable across the quality spectrum.