MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.
Nelson and Louis Narens
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition
MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.