Is ChatGPT a highly fluent grammatical error correction system? A comprehensive evaluation

Fang, Tao, Shu Yang, Kaixin Lan, Derek F · 2023 · arXiv 2304.01746

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · support 1

representative citing papers

Prompt Framing Distorts Count-Based Evaluation of LLM Error Detection: Evidence from Numeric Anchoring

cs.CL · 2026-05-03 · unverdicted · novelty 7.0

Anchored prompts inflate count-based F1 by up to 0.79 in LLM error detection while raising span-aware ERRANT F0.5 by only 0.04 on average.

MC-PDD: Masked Corpus-Level Pretraining Data Detection for Black-Box Large Language Models

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

A masked-token hit-rate comparison method detects pretraining data membership in black-box LLMs with performance comparable to white-box approaches.

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

cs.AI · 2023-08-10 · accept · novelty 5.0

Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.

The Rise and Potential of Large Language Model Based Agents: A Survey

cs.AI · 2023-09-14 · accept · novelty 4.0

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Prompt Framing Distorts Count-Based Evaluation of LLM Error Detection: Evidence from Numeric Anchoring

cs.CL · 2026-05-03 · unverdicted · none · ref 8

Anchored prompts inflate count-based F1 by up to 0.79 in LLM error detection while raising span-aware ERRANT F0.5 by only 0.04 on average.

MC-PDD: Masked Corpus-Level Pretraining Data Detection for Black-Box Large Language Models

cs.CL · 2026-06-06 · unverdicted · none · ref 6

A masked-token hit-rate comparison method detects pretraining data membership in black-box LLMs with performance comparable to white-box approaches.
