LLMs outperform single human raters at spotting relative weaknesses in L2 writing profiles on the ICNALE GRA dataset while humans are better at spotting strengths, using a self-referential intra-learner evaluation method.
Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
LLM representations encode essay quality in a linearly decodable form that emerges across layers and includes identifiable scoring neurons whose distribution shifts with essay length.
MADRAG combines multi-agent debate with retrieval-augmented generation to produce training-free analytic essay scores that outperform prompt baselines and approach supervised systems.
MAPLE uses meta-learning with prototypical networks to learn transferable representations and achieves state-of-the-art cross-prompt essay scoring on ELLIPSE, LAILA, and parts of ASAP datasets.
citing papers explorer
-
Towards Self-Referential Analytic Assessment: A Profile-Based Approach to L2 Writing Evaluation with LLMs
LLMs outperform single human raters at spotting relative weaknesses in L2 writing profiles on the ICNALE GRA dataset while humans are better at spotting strengths, using a self-referential intra-learner evaluation method.
-
From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models
LLM representations encode essay quality in a linearly decodable form that emerges across layers and includes identifiable scoring neurons whose distribution shifts with essay length.
-
MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring
MADRAG combines multi-agent debate with retrieval-augmented generation to produce training-free analytic essay scores that outperform prompt baselines and approach supervised systems.
-
MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring
MAPLE uses meta-learning with prototypical networks to learn transferable representations and achieves state-of-the-art cross-prompt essay scoring on ELLIPSE, LAILA, and parts of ASAP datasets.