Medical Education Online 30, 2550751

Evaluating large language models as graders of medical short answer questions: a comparative analysis with expert human graders · 2025 · arXiv 2981.2025

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

Gemini 3.0 Pro with rubric prompts reached ICC 0.888 agreement with human graders on low-complexity Linux/bash responses but lower agreement at higher taxonomy levels across 1200 student answers from three expert raters.

citing papers explorer


Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach · cs.AI · 2026-07-02 · unverdicted · none · ref 5
