Gemini-based LLMs achieved fair agreement with human experts on math competency rubrics while a larger Llama model showed no agreement, indicating that instruction-following architecture matters more than parameter count.
Higher Education Studies 15(4), 333–353 (2025)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Human-in-the-Loop Benchmarking of Heterogeneous LLMs for Automated Competency Assessment in Secondary Level Mathematics