Effort and ability appraisals match or beat confidence in predicting LLM failures, with effort giving less overoptimistic and more stable signals across model sizes and task types.
Ground truth in the form of multiple choices, also provided within the prompts
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs
Effort and ability appraisals match or beat confidence in predicting LLM failures, with effort giving less overoptimistic and more stable signals across model sizes and task types.