Fine-tuned GPT-4o reaches state-of-the-art performance on grammatical error correction, while reference-based metrics underestimate that performance by missing 73.76 percent of valid or superior outputs.
In: Proceedings of the NLP2025 Workshop
1 Pith paper cites this work.
- Multi-Dimensional Evaluation of LLMs for Grammatical Error Correction