arXiv preprint arXiv:1605.02592 (2016), https://arxiv.org/abs/1605.02592

Napoles, C · 2016 · cs.CL · arXiv 1605.02592

2 Pith papers cite this work.

abstract

The GLEU metric was proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015). This paper describes improvements made to the GLEU metric that address problems that arise when using an increasing number of reference sets. Unlike the originally presented metric, the modified metric does not require tuning. We recommend that this version be used instead of the original version.
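The abstract describes GLEU only at a high level: n-gram overlap with reference corrections, with no tuned weights in the revised version. The following is a minimal sketch under stated assumptions, not the paper's reference implementation. It assumes that each n-gram order's precision counts the hypothesis n-grams that match the reference and subtracts those copied from the source that the reference does not contain. It also assumes the precisions are combined BLEU-style, with a geometric mean and a brevity penalty. The function names (`ngrams`, `sentence_stats`, `gleu_from_stats`, `corpus_gleu`) are hypothetical. Averaging over random single-reference draws is our assumed way of handling several reference sets.

```python
import math
import random
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(source, hypothesis, reference, max_n=4):
    """Sufficient statistics for one sentence against ONE reference.

    Layout: [hyp_len, ref_len, num_1, den_1, ..., num_N, den_N].
    Assumed numerator for order n: hypothesis n-grams matched in the reference,
    minus hypothesis n-grams copied from the source that the reference does
    not contain (clipped at zero).
    """
    stats = [len(hypothesis), len(reference)]
    for n in range(1, max_n + 1):
        h, s, r = ngrams(hypothesis, n), ngrams(source, n), ngrams(reference, n)
        # Source n-grams absent from the reference: text that should have changed.
        s_not_r = Counter({g: c for g, c in s.items() if g not in r})
        matched = sum((h & r).values())  # clipped overlap, as in BLEU
        penalty = sum((h & s_not_r).values())
        stats.append(max(matched - penalty, 0))
        stats.append(max(len(hypothesis) + 1 - n, 0))  # hypothesis n-gram count
    return stats


def gleu_from_stats(stats, max_n=4):
    """BLEU-style combination: geometric mean of precisions times brevity penalty."""
    hyp_len, ref_len = stats[0], stats[1]
    log_prec = 0.0
    for i in range(max_n):
        num, den = stats[2 + 2 * i], stats[3 + 2 * i]
        if num == 0 or den == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_prec += math.log(num / den) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def corpus_gleu(sources, hypotheses, references, max_n=4, iterations=100, seed=0):
    """Corpus score averaged over random single-reference draws (an assumption).

    references[i] is a list of alternative corrections for sentence i. Each
    iteration picks one reference per sentence, sums the statistics over the
    corpus, and scores the totals; the result is the mean over iterations.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        totals = [0] * (2 + 2 * max_n)
        for src, hyp, refs in zip(sources, hypotheses, references):
            for k, v in enumerate(sentence_stats(src, hyp, rng.choice(refs), max_n)):
                totals[k] += v
        scores.append(gleu_from_stats(totals, max_n))
    return sum(scores) / len(scores)


src = "he go to school".split()
ref = "he goes to school".split()
print(corpus_gleu([src], [ref], [[ref]]))  # corrected output → 1.0
print(corpus_gleu([src], [src], [[ref]]))  # source copied unchanged → 0.0
```

Under these assumptions, each sentence contributes counts from one sampled reference at a time. Adding further reference sets therefore changes which references are drawn instead of multiplying the chances to match. The fixed `seed` only makes the sketch deterministic.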

representative citing papers

Multi-Dimensional Evaluation of LLMs for Grammatical Error Correction

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · cites this work as ref. 12

Fine-tuned GPT-4o reaches state-of-the-art performance on grammatical error correction, while reference-based metrics underestimate its performance, missing 73.76 percent of valid or superior outputs.

Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction

cs.CL · 2026-05-13 · unverdicted · novelty 5.0 · cites this work as ref. 32

Edit-level majority voting on multiple LLM-generated candidates reduces over-correction in grammatical error correction and outperforms greedy and MBR decoding on nine multilingual benchmarks while remaining stable to prompt variations.