7 Pith papers cite this work.
Citing papers by year: 2026 (7).
Citing papers
-
Do Coding Agents Understand Least-Privilege Authorization?
Coding agents struggle to infer least-privilege file permissions: they omit needed accesses while granting unused or sensitive ones. Sufficiency-Tightness Decomposition improves success on sensitive tasks by up to 15.8% and reduces attacks.
-
Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report
Validity indices adapted from clinical assessment classify four frontier LLMs as construct-level invalid on metacognitive probes, with valid models showing positive item-sensitive confidence (r=.18) while invalid ones show the opposite (r=-.20).
-
Hypothesis generation and updating in large language models
LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
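The summary's "strong-sampling bias" refers, we assume, to the standard strong-versus-weak sampling distinction from Bayesian concept learning. The sketch below is illustrative only and is not the paper's model. Under strong sampling, examples are drawn uniformly from the true concept, so each example has likelihood 1/|h| and smaller consistent hypotheses win (the "size principle"). Under weak sampling, every consistent hypothesis has likelihood 1. The function name `posterior`, the two hypotheses, and the data are hypothetical.

```python
from fractions import Fraction

def posterior(hypotheses, prior, examples, strong=True):
    """Posterior over set-valued hypotheses given observed examples.

    strong=True: examples assumed drawn uniformly from the true set, so each
    example contributes likelihood 1/|h| (the size principle).
    strong=False (weak sampling): any consistent hypothesis has likelihood 1.
    Inconsistent hypotheses always get likelihood 0.
    """
    unnorm = {}
    for name, members in hypotheses.items():
        if all(x in members for x in examples):
            like = Fraction(1, len(members)) ** len(examples) if strong else Fraction(1)
        else:
            like = Fraction(0)
        unnorm[name] = prior[name] * like
    total = sum(unnorm.values())  # assumes at least one consistent hypothesis
    return {name: p / total for name, p in unnorm.items()}

# Hypothetical "number game" setup: two concepts with equal prior.
hyps = {
    "powers_of_2": {1, 2, 4, 8, 16, 32, 64},
    "even": set(range(2, 101, 2)),
}
prior = {"powers_of_2": Fraction(1, 2), "even": Fraction(1, 2)}
data = [2, 8, 16]

p_strong = posterior(hyps, prior, data, strong=True)
p_weak = posterior(hyps, prior, data, strong=False)
print(p_strong)
print(p_weak)
```

Both hypotheses are consistent with the data, so weak sampling leaves them tied at 1/2, while strong sampling concentrates almost all mass on the smaller set of powers of 2.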
-
Verbal Confidence Saturation in 3-9B Open-Weight Instruction-Tuned LLMs: A Pre-Registered Psychometric Validity Screen
Seven 3-9B instruction-tuned LLMs produce verbal confidence that saturates at high values and fails psychometric validity criteria for Type-2 discrimination under minimal elicitation.
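Type-2 discrimination asks whether a model's confidence separates its correct answers from its incorrect ones. A common nonparametric summary is the Type-2 AUROC: the probability that a randomly chosen correct trial receives higher confidence than a randomly chosen incorrect one, with ties counting as half. The minimal sketch below shows why saturated verbal confidence yields chance-level discrimination. The function name `type2_auroc` and the toy ratings are hypothetical, and this is not the paper's pre-registered scoring code.

```python
from itertools import product

def type2_auroc(confidences, correct):
    """Type-2 AUROC: P(conf on a correct trial > conf on an incorrect trial).

    Ties count as 0.5, so identical confidence everywhere gives 0.5 (chance).
    """
    hits = [c for c, ok in zip(confidences, correct) if ok]
    misses = [c for c, ok in zip(confidences, correct) if not ok]
    if not hits or not misses:
        raise ValueError("need at least one correct and one incorrect trial")
    score = 0.0
    for h, m in product(hits, misses):  # every correct/incorrect pair
        if h > m:
            score += 1.0
        elif h == m:
            score += 0.5
    return score / (len(hits) * len(misses))

# Saturated verbal confidence: every answer reported at 95.
saturated = type2_auroc([95, 95, 95, 95], [True, False, True, False])
# Informative confidence: correct answers get higher ratings.
informative = type2_auroc([90, 40, 80, 60], [True, False, True, False])
print(saturated, informative)  # 0.5 1.0
```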
-
MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition
MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.
-
K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks
K-way energy probes in discriminative PCNs reduce to a monotone function of the log-softmax margin plus an untrained residual and empirically track below softmax on CIFAR-10.
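For reference, the log-softmax margin the summary mentions can be computed from K logits. The sketch below assumes the margin is taken between the top (predicted) class and the runner-up. That choice, and the function names, are illustrative assumptions, and the sketch does not reproduce the paper's energy probe or its reduction. Because every log-softmax term shares the same log-sum-exp normalizer, the margin equals the raw logit gap.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: z_k - logsumexp(z)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def log_softmax_margin(logits):
    """Top log-probability minus runner-up log-probability (needs K >= 2).

    Assumption: margin is measured for the predicted (arg-max) class.
    """
    lp = sorted(log_softmax(logits), reverse=True)
    return lp[0] - lp[1]

logits = [2.0, 0.5, -1.0]
margin = log_softmax_margin(logits)
# The log-sum-exp term cancels, so the margin equals the logit gap 2.0 - 0.5.
print(round(margin, 6))  # 1.5
```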
-
Quantisation Reshapes the Metacognitive Geometry of Language Models
Quantization restructures domain-level M-ratio metacognitive profiles in LLMs while leaving Type-2 AUROC profiles unchanged.