Evaluation uses the BLEU score, computed with the sacrebleu package in Python.

Text Simplification: 50 items, ground truth available in the form of direct open text

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Effort and ability appraisals match or beat confidence in predicting LLM failures, with effort giving less overoptimistic and more stable signals across model sizes and task types.

