Title resolution pending

Jon-Paul Cacioli · 2026 · arXiv 2603.25112

7 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

Coding agents struggle to infer least-privilege file permissions: they omit needed accesses while granting unused or sensitive ones. Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.

Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report

cs.CL · 2026-04-20 · conditional · novelty 7.0

Validity indices adapted from clinical assessment classify four frontier LLMs as construct-level invalid on metacognitive probes, with valid models showing positive item-sensitive confidence (r=.18) while invalid ones show the opposite (r=-.20).

Hypothesis generation and updating in large language models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.

Verbal Confidence Saturation in 3-9B Open-Weight Instruction-Tuned LLMs: A Pre-Registered Psychometric Validity Screen

cs.CL · 2026-04-24 · conditional · novelty 6.0

Seven 3-9B instruction-tuned LLMs produce verbal confidence that saturates at high values and fails psychometric validity criteria for Type-2 discrimination under minimal elicitation.

MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition

cs.AI · 2026-04-17 · unverdicted · novelty 6.0

MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.

K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

K-way energy probes in discriminative PCNs reduce to a monotone function of the log-softmax margin plus an untrained residual and empirically track below softmax on CIFAR-10.

Quantisation Reshapes the Metacognitive Geometry of Language Models

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

Quantization restructures domain-level M-ratio metacognitive profiles in LLMs while leaving Type-2 AUROC profiles unchanged.

citing papers explorer

Showing 7 of 7 citing papers.

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · none · ref 71

Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report

cs.CL · 2026-04-20 · conditional · none · ref 3

Hypothesis generation and updating in large language models

cs.LG · 2026-05-07 · unverdicted · none · ref 27

Verbal Confidence Saturation in 3-9B Open-Weight Instruction-Tuned LLMs: A Pre-Registered Psychometric Validity Screen

cs.CL · 2026-04-24 · conditional · none · ref 3

MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition

cs.AI · 2026-04-17 · unverdicted · none · ref 1

K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks

cs.LG · 2026-04-13 · unverdicted · none · ref 1

Quantisation Reshapes the Metacognitive Geometry of Language Models

cs.CL · 2026-04-10 · unverdicted · none · ref 1
