Anchored prompts inflate count-based F1 by up to 0.79 in LLM error detection while raising span-aware ERRANT F0.5 by only 0.04 on average.
and Zhang, Xiangliang , title =
1 Pith paper cites this work.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Prompt Framing Distorts Count-Based Evaluation of LLM Error Detection: Evidence from Numeric Anchoring