Refining Word-Based Grammatical Error Annotation for L2 Korean

Benjamin Nguyen; Jayoung Song; Jungyeul Park; KyungTae Lim; Mengyang Qiu; Wonjun Oh; Zihao Huang

arxiv: 2605.30545 · v1 · pith:4PLQQVYBnew · submitted 2026-05-28 · 💻 cs.CL

Pith reviewed 2026-06-29 07:22 UTC · model grok-4.3

keywords: Korean grammatical error correction · word-based annotation · m2 format · multi-reference evaluation · NIKL corpus · KoLLA corpus · ERRANT-style scheme · morpheme errors

The pith

Refined word-level m2 annotations and multi-reference targets improve Korean grammatical error correction by matching morphology and reducing single-reference penalties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Korean GEC faces a mismatch because many learner errors sit at the morpheme level while standard evaluation stays at the word level. The paper reconstructs target sentences from the NIKL corpus using morphologically constrained realization rules, converts the morpheme annotations into word-level m2 edits, and defines a Korean ERRANT-style scheme that keeps the MRU core while tagging functional morpheme, spelling, boundary, and order errors. It also adds a second reference to the KoLLA corpus. These steps produce lower-perplexity targets, higher agreement between converted m2 files and source-target edits, stronger KoBART correction results, and lower penalties for valid but divergent corrections under multi-reference scoring. Readers care because evaluation resources directly shape which correction models are judged successful.
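As a rough illustration of what a word-level m2 edit looks like, the sketch below diffs a whitespace-tokenized source against its target and renders the result in the m2 layout used by the M2 scorer and ERRANT (an `S` line followed by `A start end|||type|||correction|||REQUIRED|||-NONE-|||annotator` lines). It is not the paper's conversion pipeline, which starts from NIKL morpheme annotations. The function name `to_m2`, the coarse M/R/U labels, and the example sentence are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def to_m2(source: str, target: str, annotator: int = 0) -> str:
    """Render word-level edits between source and target as one m2 block."""
    src, tgt = source.split(), target.split()
    lines = ["S " + " ".join(src)]
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Coarse operation label only: Missing, Replacement, Unnecessary.
        label = {"insert": "M", "replace": "R", "delete": "U"}[tag]
        correction = " ".join(tgt[j1:j2])  # empty string for deletions
        lines.append(
            f"A {i1} {i2}|||{label}|||{correction}|||REQUIRED|||-NONE-|||{annotator}"
        )
    if len(lines) == 1:
        # Conventional placeholder for a sentence that needs no edit.
        lines.append(f"A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||{annotator}")
    return "\n".join(lines)


# Invented learner sentence: subject particle 가 used where object particle 를 is needed.
print(to_m2("저는 사과가 먹었어요", "저는 사과를 먹었어요"))
```

Because Korean postpositions attach to their host, the particle error surfaces here as a replacement of the whole word 사과가, which is the word-level versus morpheme-level mismatch the paper describes.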

Core claim

Reconstructing NIKL targets under morphologically constrained realization rules, converting morpheme annotations to word-level m2 edits via a Korean ERRANT-style scheme that distinguishes functional morpheme errors, spelling errors, word boundary errors, and word order errors, and augmenting KoLLA with a second reference yields lower perplexity, higher edit agreement, improved KoBART performance under fixed model settings, and reduced penalties for valid corrections that differ from a single reference.

What carries the argument

The Korean ERRANT-style annotation scheme that preserves the MRU core while distinguishing functional morpheme errors, spelling errors, word boundary errors, and word order errors, together with the morphologically constrained reconstruction of NIKL targets.
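A toy version of such a classifier might look like the following. It assumes that MRU stands for the Missing / Replacement / Unnecessary operations familiar from ERRANT, and it checks functional morpheme differences first, as the paper's description of its classification algorithm indicates. Everything else is hypothetical: the label strings, the tiny `PARTICLES` list, the suffix-stripping heuristic (a real system would need a morphological analyzer), and the similarity threshold for spelling.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny postposition inventory (longest suffix first).
PARTICLES = ("에서", "은", "는", "이", "가", "을", "를", "에")


def split_particle(word):
    """Split a word into (stem, particle) using the toy suffix list."""
    for particle in PARTICLES:
        if word.endswith(particle) and len(word) > len(particle):
            return word[: -len(particle)], particle
    return word, ""


def classify(src, tgt):
    """Label an edit from source tokens `src` to correction tokens `tgt`."""
    if src == tgt:
        return "noop"
    if not src:
        return "M"  # material missing from the source
    if not tgt:
        return "U"  # unnecessary material in the source
    single = len(src) == 1 and len(tgt) == 1
    # 1. Functional morpheme: same stem, different attached particle.
    if single:
        s_stem, s_part = split_particle(src[0])
        t_stem, t_part = split_particle(tgt[0])
        if s_stem == t_stem and s_part != t_part:
            return "R:FUNC"
    # 2. Word boundary: identical characters, different spacing.
    if "".join(src) == "".join(tgt):
        return "R:WB"
    # 3. Word order: same tokens, different sequence.
    if sorted(src) == sorted(tgt):
        return "R:WO"
    # 4. Spelling: one word changed into a similar-looking word.
    if single and SequenceMatcher(None, src[0], tgt[0]).ratio() >= 0.5:
        return "R:SPELL"
    return "R:OTHER"


print(classify(["사과가"], ["사과를"]))                  # particle substitution
print(classify(["먹을", "수있어요"], ["먹을", "수", "있어요"]))  # spacing
```

The ordering of the checks matters: a particle substitution also changes the surface string slightly, so testing for spelling first would absorb many functional morpheme errors into the spelling category.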

If this is right

Refined NIKL targets produce lower perplexity than the original corpus targets.
The converted m2 files achieve higher agreement with direct source-target edit representations.
The refined resources raise KoBART-based correction performance under identical model settings.
Multi-reference evaluation on the augmented KoLLA corpus lowers the penalty assigned to valid corrections that diverge from a single reference.
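The last point can be made concrete with a minimal sketch of edit-based scoring against several references. It assumes edits are represented as `(start, end, correction)` tuples and that the multi-reference score for a sentence is the best F0.5 over the available references; standard scorers pick the best reference while accumulating corpus-level counts, so this per-sentence maximum is a simplification. The function names and example edits are hypothetical.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from edit counts; empty denominators count as perfect."""
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score(hyp_edits, ref_edits):
    """F0.5 of hypothesis edits against a single reference's edits."""
    hyp, ref = set(hyp_edits), set(ref_edits)
    return f_beta(len(hyp & ref), len(hyp - ref), len(ref - hyp))


def multi_ref_score(hyp_edits, references):
    """Score against each reference and keep the most favourable one."""
    return max(score(hyp_edits, ref) for ref in references)


# Invented example: the system picks 를, reference 1 picks 는, reference 2 picks 를.
hyp = [(1, 2, "사과를")]
ref1 = [(1, 2, "사과는")]
ref2 = [(1, 2, "사과를")]
print(score(hyp, ref1))                    # single-reference score
print(multi_ref_score(hyp, [ref1, ref2]))  # multi-reference score
```

With only the first reference, the valid correction is counted as both a false positive and a false negative; adding the second reference removes that penalty.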

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same reconstruction-plus-conversion pipeline could reduce evaluation noise in other languages where grammatical morphemes attach to lexical hosts.
Multi-reference scoring may become standard for GEC tasks that exhibit high correction variability across native speakers.
Explicit tagging of word-boundary and spacing errors may help models handle spacing-sensitive phenomena that current word-level tokenizers treat as secondary.

Load-bearing premise

The morphologically constrained realization rules used to reconstruct target sentences from the NIKL corpus accurately represent the intended corrections without introducing new errors or biases.

What would settle it

A side-by-side check that shows the reconstructed NIKL targets contain more unintended changes than the original annotations, or that KoBART performance stays the same or drops when trained on the converted m2 files.
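One simple form such a check could take is to apply each m2 block's edits to its source sentence and count how often the result reproduces the stored target. The sketch below assumes one annotator per block, whitespace tokenization, and the standard m2 field layout; `apply_m2` and `agreement` are hypothetical helper names, not part of the paper's released tooling.

```python
def apply_m2(block: str) -> str:
    """Apply the edits of one single-annotator m2 block to its source line."""
    lines = block.strip().split("\n")
    tokens = lines[0][2:].split()  # drop the leading "S "
    edits = []
    for line in lines[1:]:
        if not line.startswith("A "):
            continue
        fields = line[2:].split("|||")
        start, end = map(int, fields[0].split())
        if start < 0:  # "noop" placeholder: nothing to apply
            continue
        edits.append((start, end, fields[2].split()))
    # Apply right to left so that earlier token offsets stay valid.
    for start, end, replacement in sorted(edits, key=lambda e: e[:2], reverse=True):
        tokens[start:end] = replacement
    return " ".join(tokens)


def agreement(pairs):
    """Share of (m2_block, target) pairs where the edits reproduce the target."""
    hits = sum(apply_m2(block) == target for block, target in pairs)
    return hits / len(pairs)


block = "S 저는 사과가 먹었어요\nA 1 2|||R|||사과를|||REQUIRED|||-NONE-|||0"
print(apply_m2(block))
print(agreement([(block, "저는 사과를 먹었어요"), (block, "저는 사과는 먹었어요")]))
```

Running the same check on the original and the reconstructed resources would show whether the refinement reduces or introduces mismatches between the edits and the targets.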

Figures

Figures reproduced from arXiv: 2605.30545 by Benjamin Nguyen, Jayoung Song, Jungyeul Park, KyungTae Lim, Mengyang Qiu, Wonjun Oh, Zihao Huang.

**Figure 2.** Example entry from the NIKL L2 K-GEC dataset. Each sentence is annotated at the …

**Figure 3.** Comparison of prior and refined word-based …

**Figure 4.** Schematic examples of generalized Replacement error types.

Adjacent text from Section 4.4 (Generalized classification algorithm): Algorithm 2 formalizes the annotation procedure. Given a source span S and a correction span T, the classifier returns an edit label from the MRU inventory and the Korean-specific replacement categories. Functional morpheme differences are checked first because they are grammatically central and can be obs…

**Figure 5.** Examples of divergent annotation strategies in the Korean …

**Figure 6.** Instruction prompt used for generative grammatical error correction.

Original abstract

Korean grammatical error correction (K-GEC) presents a structural mismatch between word-based evaluation and the morpheme-level locus of many learner errors. Postpositions and verbal endings are bound to lexical hosts, but they encode grammatical relations that must be represented in correction and evaluation. This paper refines word-based grammatical error annotation for L2 Korean by addressing three connected problems in existing resources: surface target realization, Korean-specific edit annotation, and single-reference evaluation. We reconstruct target sentences from the National Institute of Korean Language (NIKL) L2 corpus under morphologically constrained realization rules and convert its morpheme-level annotations into word-level m2 edits. We then define a Korean ERRANT-style annotation scheme that preserves the MRU core while distinguishing functional morpheme errors, spelling errors, word boundary errors, and word order errors. We also augment the KoLLA corpus with an additional reference correction, yielding a multi-reference evaluation setting for Korean GEC. Empirical validation shows that the refined NIKL targets yield lower perplexity, the converted m2 files achieve higher agreement with source-target edit representations, and the refined resources improve KoBART-based correction under the same model setting. Multi-reference KoLLA evaluation further reduces the penalty imposed on valid corrections that diverge from a single reference, especially for neural and prompted GEC systems. These results show that Korean GEC evaluation depends not only on correction models, but also on reference data and edit annotations that reflect Korean morphology, spacing, and correction variability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper refines Korean GEC resources by adapting ERRANT-style edits to handle morphemes and adding a second reference to KoLLA, with claims of better perplexity and agreement that rest on unvalidated reconstruction rules.

The core contribution is turning two existing Korean learner corpora into more usable evaluation resources. They reconstruct word-level targets from the NIKL morpheme annotations using constrained rules, map those into m2 format, define a Korean ERRANT variant that separates functional morpheme errors from spelling and boundary issues, and add a second reference to KoLLA so multi-reference scoring can be tried.

This is useful because Korean errors frequently sit on bound morphemes that standard word-based schemes mishandle. The multi-reference move directly tackles the problem that single references penalize valid but different corrections, which matters for neural and prompted systems.

The paper does the resource work cleanly, at least as described: it names the three problems (target realization, edit scheme, single reference) and gives concrete fixes rather than vague suggestions. The abstract reports that the refined targets lower perplexity, the m2 files raise agreement, the changes help a KoBART model, and the extra reference reduces penalties.

The soft spot is the reconstruction step. The claims depend on those morphologically constrained rules faithfully recovering the intended corrections, yet the abstract supplies no coverage numbers, fidelity checks against original annotators, or examples of edge cases like spacing and verbal endings. If the rules systematically alter targets, the reported gains could be partly artifacts. Without the actual tables or rule details, it is hard to tell how large or robust the improvements are.

This is for people already working on Korean GEC or evaluation resources for morphologically complex languages. It will not move the broader field but gives better tools for this slice of the problem. The thinking is straightforward and the gaps are acknowledged rather than hidden.

I would send it to peer review. The topic is narrow but the changes address a documented mismatch, and having the refined resources available matters for anyone evaluating Korean correction models.

Referee Report

2 major / 1 minor

Summary. The manuscript refines word-based grammatical error annotation for L2 Korean to address mismatches with morpheme-level errors. It reconstructs target sentences from the NIKL L2 corpus using morphologically constrained realization rules, converts morpheme annotations to word-level m2 edits via a Korean ERRANT-style scheme that distinguishes functional morpheme, spelling, boundary, and order errors, and augments the KoLLA corpus with a second reference for multi-reference evaluation. Empirical claims include lower perplexity on refined NIKL targets, higher agreement of converted m2 files with source-target edits, improved KoBART correction performance, and reduced penalties for valid but divergent corrections under multi-reference evaluation.

Significance. If the central claims hold, the work provides useful language-specific resources and evaluation practices for Korean GEC, where morphology, spacing, and postpositions create distinct challenges not well served by direct transfer of English-centric tools. The multi-reference augmentation is a clear strength, as it directly mitigates single-reference bias. The paper earns credit for grounding refinements in Korean linguistic properties rather than generic annotation conversion.

major comments (2)

[Abstract] Abstract and reconstruction paragraph: the central empirical claims (lower perplexity, higher m2 agreement, improved KoBART results) rest on the morphologically constrained realization rules converting NIKL morpheme annotations into word-level targets. No coverage statistics, derivation of the rules, or fidelity check against original annotator intent is supplied; systematic bias in spacing, postposition attachment, or verbal endings would artifactually inflate the reported gains.
[Results] Results section: the abstract states that refined targets yield lower perplexity and higher agreement, yet supplies no numerical values, baselines, error bars, or full experimental setup details. Without these, the magnitude and reliability of the improvements cannot be assessed.

minor comments (1)

[Section 3] The description of the Korean ERRANT-style scheme would benefit from an explicit table contrasting the new categories (functional morpheme errors, word boundary errors) with standard ERRANT labels.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater transparency in our methodology and results. We agree that the current presentation lacks sufficient detail on the realization rules and experimental outcomes, and we will revise the manuscript to address these gaps directly.

Referee: [Abstract] Abstract and reconstruction paragraph: the central empirical claims (lower perplexity, higher m2 agreement, improved KoBART results) rest on the morphologically constrained realization rules converting NIKL morpheme annotations into word-level targets. No coverage statistics, derivation of the rules, or fidelity check against original annotator intent is supplied; systematic bias in spacing, postposition attachment, or verbal endings would artifactually inflate the reported gains.

Authors: We agree that the abstract and reconstruction paragraph do not supply coverage statistics, a derivation of the rules, or a fidelity check. In the revised manuscript we will add an expanded methods subsection that (1) lists the full set of morphologically constrained realization rules with their linguistic motivation drawn from Korean grammar, (2) reports coverage statistics on the NIKL corpus (percentage of annotations converted without manual intervention), and (3) presents a fidelity analysis comparing reconstructed targets against the original annotator intent to demonstrate absence of systematic bias in spacing, postposition attachment, or verbal endings. revision: yes
Referee: [Results] Results section: the abstract states that refined targets yield lower perplexity and higher agreement, yet supplies no numerical values, baselines, error bars, or full experimental setup details. Without these, the magnitude and reliability of the improvements cannot be assessed.

Authors: We acknowledge that the results section currently describes improvements only qualitatively and omits numerical values, baselines, error bars, and full experimental details. In the revision we will insert the concrete metrics (perplexity scores with baselines, m2 agreement percentages, KoBART F1/accuracy figures), include error bars or confidence intervals where appropriate, and provide the complete experimental setup (hyperparameters, data splits, evaluation scripts) so that the magnitude and reliability of the reported gains can be assessed. revision: yes

Circularity Check

0 steps flagged

No significant circularity; annotation conversion and empirical validation are self-contained

The paper performs corpus refinement by converting morpheme-level NIKL annotations to word-level m2 edits via externally defined realization rules, augments KoLLA with an additional reference, and reports empirical metrics (perplexity, agreement, model performance) on the resulting resources. No equations, fitted parameters, or predictions appear; the central claims rest on external corpora and standard evaluation protocols rather than any self-definitional loop, fitted-input renaming, or load-bearing self-citation chain. The reconstruction rules are presented as a methodological choice without internal derivation that reduces to the reported outcomes by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities. Relies on existing public corpora and standard NLP annotation practices.

Reference graph

Works this paper leans on

17 extracted references · 7 canonical work pages · 1 internal anchor

[1] Automatic annotation of error types for grammatical error correction
Christopher Bryant. Automatic annotation of error types for grammatical error correction. PhD thesis, University of Cambridge, Churchill College, Cambridge, UK, 2019. URL https://doi.org/10.17863/CAM.40832
doi:10.17863/cam.40832 · 2019

[2] Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction
Christopher Bryant, Mariano Felice, and Ted Briscoe. Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction. In Regina Barzilay and Min-Yen Kan, editors, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 793--805, Vancouver, Canada, 7 2017. Association for C...
doi:10.18653/v1/p17-1074 · 2017

[3] Better Evaluation for Grammatical Error Correction
Daniel Dahlmeier and Hwee Tou Ng. Better Evaluation for Grammatical Error Correction. In Eric Fosler-Lussier, Ellen Riloff, and Srinivas Bangalore, editors, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 568--572, Montréal, Canada, 6 2012. Associat...
2012

[4] Helping Our Own: The HOO 2011 Pilot Shared Task
Robert Dale and Adam Kilgarriff. Helping Our Own: The HOO 2011 Pilot Shared Task. In Claire Gardent and Kristina Striegnitz, editors, Proceedings of the 13th European Workshop on Natural Language Generation, pages 242--249, Nancy, France, 9 2011. Association for Computational Linguistics. URL https://aclanthology.org/W11-2838/
2011

[5] Building a Korean Web Corpus for Analyzing Learner Language
Markus Dickinson, Ross Israel, and Sun-Hee Lee. Building a Korean Web Corpus for Analyzing Learner Language. In Proceedings of the 6th Workshop on the Web as Corpus (WAC-6), pages 8--16, Los Angeles, 2010. URL http://jones.ling.indiana.edu/~mdickinson/papers/dickinson-israel-lee10.html
2010

[6] Towards a standard evaluation method for grammatical error detection and correction
Mariano Felice and Ted Briscoe. Towards a standard evaluation method for grammatical error detection and correction. In Rada Mihalcea, Joyce Chai, and Anoop Sarkar, editors, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 578--587, Denver, Colorado, 2015...
doi:10.3115/v1/n15-1060 · 2015

[7] Improving Automatic Grammatical Error Annotation for Chinese Through Linguistically-Informed Error Typology
Yang Gu, Zihao Huang, Min Zeng, Mengyang Qiu, and Jungyeul Park. Improving Automatic Grammatical Error Annotation for Chinese Through Linguistically-Informed Error Typology. In Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, and Steven Schockaert, editors, Proceedings of the 31st International Conference on Computationa...
2025

[8] Developing Learner Corpus Annotation for Korean Particle Errors
Sun-Hee Lee, Markus Dickinson, and Ross Israel. Developing Learner Corpus Annotation for Korean Particle Errors. In Proceedings of the Sixth Linguistic Annotation Workshop, pages 129--133, Jeju, Republic of Korea, 2012. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/W12-3617
2012

[9] The MultiGEC-2025 Shared Task on Multilingual Grammatical Error Correction at NLP4CALL
Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, and Robert Östling. The MultiGEC-2025 Shared Task on Multilingual Grammatical Error Correction at NLP4CALL. In Ricardo Muñoz Sánchez, David Alfter, Elena Volodina, and Jelena Kallas, editors, Proceedings ...
2025

[10] Ground Truth for Grammatical Error Correction Metrics
Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault. Ground Truth for Grammatical Error Correction Metrics. In Chengqing Zong and Michael Strube, editors, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)...
doi:10.3115/v1/p15-2097 · 2015

[11] GLEU Without Tuning
Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault. GLEU Without Tuning, 2016. URL https://arxiv.org/abs/1605.02592
2016

[12] KLUE: Korean Language Understanding Evaluation
Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Ji Yoon Han, Jangwon Park, Chisung Song, Junseong Kim, Youngsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seong...
2021

[13] Multilingual Grammatical Error Annotation: Combining Language-Agnostic Framework with Language-Specific Flexibility
Mengyang Qiu, Tran Minh Nguyen, Zihao Huang, Zelong Li, Yang Gu, Qingyu Gao, Siliang Liu, and Jungyeul Park. Multilingual Grammatical Error Annotation: Combining Language-Agnostic Framework with Language-Specific Flexibility. In Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yane...
2025

[14] Enriching the Korean learner corpus for grammatical error correction and writing assessment
Jayoung Song, KyungTae Lim, and Jungyeul Park. Enriching the Korean learner corpus for grammatical error correction and writing assessment. Language Resources and Evaluation, 60(1): 1--19, 2026. ISSN 1574-0218. doi:10.1007/s10579-025-09882-9. URL https://doi.org/10.1007/s10579-025-09882-9
doi:10.1007/s10579-025-09882-9 · 2026

[15] Joint Evaluation of Morphological Segmentation and Syntactic Parsing
Reut Tsarfaty, Joakim Nivre, and Evelina Andersson. Joint Evaluation of Morphological Segmentation and Syntactic Parsing. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 6--10, Jeju Island, Korea, 7 2012. Association for Computational Linguistics. URL http://www.aclweb.org/antholo...
2012

[16] Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation
Soyoung Yoon, Sungjoon Park, Gyuwan Kim, Junhee Cho, Kihyo Park, Gyu Tae Kim, Minjoon Seo, and Alice Oh. Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6713--6742, Toronto, Canada, 7 2023. Associat...
2023

[17] MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction
Yue Zhang, Zhenghua Li, Zuyi Bao, Jiacheng Li, Bo Zhang, Chen Li, Fei Huang, and Min Zhang. MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3118--3130,...
doi:10.18653/v1/2022.naacl-main.227 · 2022
