Ground truth is available in the form of Yes/No options, allowing direct string match evaluation to produce accuracy scores.
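
The string-match evaluation described above can be sketched as follows. This is a minimal illustration, not the citing paper's code: the function names `normalize` and `string_match_accuracy` are hypothetical, and the normalization step (trimming whitespace and surrounding punctuation, lowercasing) is an assumption, since the source only says that predictions are compared with Yes/No ground truth by direct string match.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Trim surrounding whitespace and punctuation, then lowercase.

    Assumption: the source does not say how answers are normalized
    before comparison; this is one plausible, lenient choice.
    """
    return answer.strip(" \t\n.!?\"'").lower()


def string_match_accuracy(
    predictions: Sequence[str], references: Sequence[str]
) -> float:
    """Fraction of predictions equal to the Yes/No reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Hypothetical model outputs scored against hypothetical Yes/No labels.
preds = ["Yes", "no.", " YES ", "No"]
labels = ["Yes", "No", "No", "No"]
accuracy = string_match_accuracy(preds, labels)
```

A stricter variant would drop `normalize` and compare the raw strings, which penalizes any formatting deviation in the model output.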

ETHICS [Hendrycks et al., 2021]

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Effort and ability appraisals match or beat confidence in predicting LLM failures, with effort giving less overoptimistic and more stable signals across model sizes and task types.

citing papers explorer

Showing 1 of 1 citing paper.

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

cs.CL · 2026-05-08 · unverdicted · none · ref 30
