After answering, place a bet on your answer: 1–10 points. If correct, you gain the points. If wrong, you lose them

URL https://arxiv · 2007 · arXiv 2602.07905

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

MIRROR benchmark shows LLMs universally fail at compositional self-prediction and cannot translate partial self-knowledge into better agentic actions, with external metacognitive control reducing confident failures by ~70-76%.

citing papers explorer

MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models cs.AI · 2026-04-15 · unverdicted · none · ref 2 · internal anchor
