MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring

Ali Keramati; Mark Warschauer; Sharad Mehrotra; Shiyuan Zhou

arxiv: 2606.06754 · v1 · pith:NMSB67DWnew · submitted 2026-06-04 · 💻 cs.MA · cs.CL



keywords: madrag, scoring, analytic, calibration, debate, essay, evaluation, judge


We present MADRAG, a training-free framework for analytic essay scoring that combines multi-agent reasoning with retrieval-augmented grounding. Unlike standard LLM-as-judge approaches, which are prone to bias and unstable scoring, MADRAG decomposes evaluation into an interactive process: an Advocate identifies strengths, a Skeptic critiques weaknesses, and a Judge aggregates their arguments into a final score. Crucially, the Judge is augmented with rubric-aligned exemplar retrieval, enabling calibration through comparison with scored examples. Our results show that MADRAG significantly outperforms prompt-based baselines while approaching the performance of supervised systems without requiring task-specific training. Ablation studies demonstrate that retrieval drives calibration gains, while debate improves reasoning on higher-level traits. Our findings highlight the complementary roles of structured interaction and external memory in reliable LLM-based evaluation.
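The Advocate, Skeptic, and Judge flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `SCORE: <int>` reply format, the word-overlap retrieval, and every name here (`LLM`, `Exemplar`, `retrieve_exemplars`, `parse_score`, `score_trait`) are hypothetical. The abstract does not say how exemplars are retrieved or whether the Skeptic sees the Advocate's argument; both are assumed here for concreteness.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a model reply.
LLM = Callable[[str], str]


@dataclass
class Exemplar:
    """A previously scored essay for one rubric trait (hypothetical schema)."""
    essay: str
    trait: str
    score: int


def retrieve_exemplars(essay: str, trait: str, pool: List[Exemplar], k: int = 2) -> List[Exemplar]:
    """Toy retrieval: rank same-trait exemplars by Jaccard word overlap.

    Assumption: a real system would likely use embeddings; word overlap
    keeps this sketch dependency-free.
    """
    words = set(essay.lower().split())

    def similarity(ex: Exemplar) -> float:
        other = set(ex.essay.lower().split())
        union = words | other
        return len(words & other) / len(union) if union else 0.0

    candidates = [ex for ex in pool if ex.trait == trait]
    return sorted(candidates, key=similarity, reverse=True)[:k]


def parse_score(verdict: str) -> int:
    """Extract the integer from a 'SCORE: <int>' line (assumed reply format)."""
    match = re.search(r"SCORE:\s*(-?\d+)", verdict)
    if match is None:
        raise ValueError("Judge reply did not contain 'SCORE: <int>'")
    return int(match.group(1))


def score_trait(essay: str, trait: str, rubric: str, pool: List[Exemplar], llm: LLM, k: int = 2) -> int:
    """Run one Advocate -> Skeptic -> Judge round for a single trait."""
    context = f"Trait: {trait}\nRubric: {rubric}\nEssay: {essay}"

    # Advocate argues for the essay's strengths on this trait.
    strengths = llm(f"ROLE: Advocate\n{context}\nList the essay's strengths.")

    # Skeptic critiques weaknesses; showing it the Advocate's points is an assumption.
    weaknesses = llm(
        f"ROLE: Skeptic\n{context}\nAdvocate said: {strengths}\nList the essay's weaknesses."
    )

    # Only the Judge sees retrieved scored exemplars, used as calibration anchors.
    exemplars = retrieve_exemplars(essay, trait, pool, k)
    anchors = "\n".join(f"[score {ex.score}] {ex.essay}" for ex in exemplars)

    verdict = llm(
        f"ROLE: Judge\n{context}\nStrengths: {strengths}\nWeaknesses: {weaknesses}\n"
        f"Scored exemplars:\n{anchors}\nReply with 'SCORE: <int>'."
    )
    return parse_score(verdict)
```

Passing the model in as a plain callable keeps the sketch training-free and provider-agnostic: any client can be wrapped in a function from `str` to `str`, and a stub can stand in for it during testing.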
