Computers in Human Behavior Reports 14, 100412 (2024)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Human-in-the-Loop Benchmarking of Heterogeneous LLMs for Automated Competency Assessment in Secondary Level Mathematics
Gemini-based LLMs achieved fair agreement with human experts on math competency rubrics while a larger Llama model showed no agreement, indicating that instruction-following architecture matters more than parameter count.