Ground truth in the form of multiple choices, also provided within the prompts

CausalProbe [Chi et al. · 2024]

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Effort and ability appraisals match or beat confidence in predicting LLM failures, with effort giving less overoptimistic and more stable signals across model sizes and task types.

citing papers explorer

Showing 1 of 1 citing paper.

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

cs.CL · 2026-05-08 · unverdicted · none · ref 29

Effort and ability appraisals match or beat confidence in predicting LLM failures, with effort giving less overoptimistic and more stable signals across model sizes and task types.
