3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
citing papers explorer
- Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report
Validity indices adapted from clinical assessment classify four frontier LLMs as construct-level invalid on metacognitive probes, with valid models showing positive item-sensitive confidence (r=.18) while invalid ones show the opposite (r=-.20).
- Learning from Self-Debate: Preparing Reasoning Models for Multi-Agent Debate
SDRL trains LLMs via self-generated multi-path debates and joint optimization of standalone plus debate-conditioned responses to boost both single-model reasoning and multi-agent debate performance.
- Causal Evidence that Language Models use Confidence to Drive Behavior
Language models deploy multidimensional internal confidence representations and threshold-based policies to control abstention behavior, with causal support from activation steering experiments.