pith. machine review for the scientific record.

arxiv: 2604.08976 · v1 · submitted 2026-04-10 · 💻 cs.CL

Recognition: unknown

Quantisation Reshapes the Metacognitive Geometry of Language Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 17:25 UTC · model grok-4.3

classification 💻 cs.CL
keywords model quantization · metacognition · large language models · M-ratio · Type-2 AUROC · domain knowledge · confidence calibration · quantized inference

The pith

Model quantisation restructures domain-level metacognitive efficiency in LLMs rather than degrading it uniformly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that reducing numerical precision through quantisation does not simply weaken a language model's ability to know what it knows uniformly across all areas. Instead, the same set of questions evaluated at full precision and at reduced precision produced completely uncorrelated rankings of domains by metacognitive efficiency. Domains that were poorly monitored at one precision became the best monitored at the other, while the confidence signal that separates the model's correct answers from its incorrect ones (its Type-2 discrimination) stayed the same. This discovery came from a failed pre-registered effort to fix weak domains with targeted training, which succeeded in changing confidence patterns but could not improve metacognitive efficiency because the set of weak domains shifted with the inference format. The work therefore identifies an unexamined dependency: any use of domain-specific metacognitive profiles must specify the exact inference precision.

Core claim

Model quantisation restructures domain-level metacognitive efficiency in LLMs rather than degrading it uniformly. When Llama-3-8B-Instruct answered the same 3,000 questions at Q5_K_M and f16 precision, M-ratio profiles across four knowledge domains showed zero correlation (Spearman rho = 0.00), with Arts & Literature moving from worst-monitored (0.606) to best-monitored (1.542) and Geography moving in the opposite direction. Type-2 AUROC profiles remained identical across formats (rho = 1.00), localising the change to the normalisation step inside the M-ratio. A pre-registered domain-conditional training intervention produced null results on meta-d' because the diagnostic profile did not replicate across inference formats.

What carries the argument

The M-ratio, a normalised measure of metacognitive efficiency obtained by dividing metacognitive sensitivity (meta-d′) by first-order discrimination ability (d′); comparing its domain profile against the unnormalised Type-2 AUROC isolates the domain-specific effects of quantisation on self-monitoring.
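For reference, the normalisation at issue, as stated in the Figure 2 caption (the paper's exact fitting procedure is not restated here):

```latex
\text{M-ratio} = \frac{\text{meta-}d'}{d'}
```

Because the denominator d′ is a Type-1 quantity, any non-proportional shift in d′ across formats moves the ratio even when the underlying Type-2 discrimination signal is unchanged.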

Load-bearing premise

M-ratio is a format-independent measure of metacognitive efficiency whose domain profile should be stable or transferable rather than an artifact of numerical precision.

What would settle it

Re-running the identical 3,000-question evaluation on the same model and domains at both precisions and obtaining a high Spearman correlation between the two M-ratio domain profiles would falsify the restructuring claim.
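A minimal sketch of that check, not the authors' code: it assumes per-domain M-ratios have already been recomputed at both precisions from the released trial-level data. Only the Arts & Literature and Geography values are quoted in this review; "Science" is named in the abstract, and the fourth domain plus the two unquoted values are invented placeholders, so the printed rho is illustrative only.

```python
# Hedged sketch of the proposed falsification check; placeholder values are
# marked below and are not from the paper.
from scipy.stats import spearmanr

m_ratio_q5 = {"Arts & Literature": 0.606, "Geography": 1.210,
              "Science": 1.05, "Domain 4": 0.90}       # last two invented
m_ratio_f16 = {"Arts & Literature": 1.542, "Geography": 0.798,
               "Science": 0.95, "Domain 4": 1.10}      # last two invented

domains = list(m_ratio_q5)
rho, _ = spearmanr([m_ratio_q5[d] for d in domains],
                   [m_ratio_f16[d] for d in domains])
print(f"Cross-format Spearman rho over {len(domains)} domains: {rho:.2f}")
# The paper reports rho = 0.00 on the real profiles; a high rho on a genuine
# re-run would falsify the restructuring claim.
```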

Figures

Figures reproduced from arXiv: 2604.08976 by Jon-Paul Cacioli.

Figure 1. M-ratio profiles by domain at Q5_K_M and f16 on the same 3,000 questions. Arts & Literature moves from rank 4 to rank 1; Geography moves from well-monitored (1.210, rank 2) to under-monitored (0.798, rank 3). The Spearman rank correlation between the Q5_K_M and f16 M-ratio profiles is ρ = 0.00. The d′ values shift non-proportionally across domains and formats; Arts d′ drops from 0.891 (Q5_K_M) to 0.559 (f16). view at source ↗
Figure 2. AUROC2 profiles by domain at Q5_K_M and f16 on the same 3,000 questions. Rank ordering is perfectly preserved (Spearman ρ = 1.00). This dissociation is the central empirical result: the M-ratio restructures (ρ = 0.00) while AUROC2 is stable (ρ = 1.00). The raw Type-2 discrimination signal is format-invariant; what changes is the M-ratio, which divides meta-d′ by d′, because quantisation shifts d′ non-proportionally across domains. view at source ↗
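A hypothetical illustration of that mechanism (not the paper's numbers, apart from the Arts d′ values of 0.891 and 0.559 quoted in the Figure 1 caption): dividing by a shifting d′ moves the M-ratio even if meta-d′ were held fixed.

```python
# Hypothetical illustration: the normalisation step alone can move the ratio.
def m_ratio(meta_d_prime: float, d_prime: float) -> float:
    return meta_d_prime / d_prime

meta_d_arts = 0.54                      # hypothetical, held constant
print(m_ratio(meta_d_arts, 0.891))      # ~0.61 at Q5_K_M
print(m_ratio(meta_d_arts, 0.559))      # ~0.97 at f16
# The paper's actual f16 M-ratio for Arts is 1.542, so meta-d' moved as well;
# the point is only that the division by d' can reorder domains on its own.
```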
Original abstract

We report that model quantisation restructures domain-level metacognitive efficiency in LLMs rather than degrading it uniformly. Evaluating Llama-3-8B-Instruct on the same 3,000 questions at Q5_K_M and f16 precision, we find that M-ratio profiles across four knowledge domains are uncorrelated between formats (Spearman rho = 0.00). Arts & Literature moves from worst-monitored (M-ratio = 0.606 at Q5_K_M) to best-monitored (1.542 at f16). Geography moves from well-monitored (1.210) to under-monitored (0.798). However, Type-2 AUROC profiles are perfectly stable across formats (rho = 1.00), localising the restructuring to the M-ratio normalisation rather than the underlying discrimination signal. This finding emerged from a pre-registered attempt to improve metacognition through domain-conditional training. We prescribed confidence-amplification SFT for the diagnosed weak domain, with matched-budget agnostic and wrong-prescription controls. All four confirmatory hypotheses were null (10,000 bootstrap resamples, seed = 42). The training successfully reshaped confidence distributions, doubling the NLP gap in Science from 0.076 to 0.152, but did not improve meta-d' because the diagnostic profile did not transfer across formats. Any system relying on domain-level M-ratio profiles has an unexamined dependency on inference format. Systems using AUROC_2 are safer. We release all code, pre-registrations, and trial-level data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper claims that quantisation of Llama-3-8B-Instruct from f16 to Q5_K_M restructures domain-level metacognitive efficiency rather than degrading it uniformly. M-ratio profiles across four knowledge domains are uncorrelated between formats (Spearman rho = 0.00), with crossed orderings (e.g., Arts & Literature from 0.606 to 1.542; Geography from 1.210 to 0.798), while Type-2 AUROC profiles remain perfectly stable (rho = 1.00). This observation emerged from a pre-registered domain-conditional SFT experiment on confidence amplification that returned null results on all four confirmatory hypotheses (10,000 bootstrap resamples) despite successfully reshaping confidence distributions.

Significance. If the restructuring claim holds, it indicates that M-ratio-based metacognitive assessments carry an unexamined dependency on inference precision, while Type-2 AUROC appears more robust; this has practical implications for LLM deployment in domains requiring reliable self-monitoring. Strengths include the pre-registered design, explicit null results on training hypotheses, 10,000-bootstrap procedure, and public release of code plus trial-level data, all of which support reproducibility of the empirical components.

major comments (1)
  1. [Results (M-ratio and Type-2 AUROC profiles)] The claim that quantisation restructures domain metacognitive efficiency rests on the M-ratio profiles being uncorrelated (Spearman rho = 0.00) with crossed orderings across only four domains. No bootstrap, permutation test, or per-domain confidence intervals are reported for the M-ratio values or their differences, in contrast to the 10,000-bootstrap procedure used for the training hypotheses. With n = 4, rho = 0.00 is compatible with many permutations and does not by itself establish a reliable restructuring once finite-sample variability in the 3,000-question set and the signal-detection fits is accounted for (a permutation sketch illustrating this point follows the minor comments).
minor comments (3)
  1. [Abstract] Abstract and Results: The four knowledge domains are referenced but not named in the abstract; explicitly listing them (e.g., Arts & Literature, Geography, etc.) would improve immediate clarity.
  2. [Methods] Methods: Clarify whether the quantisation comparison and M-ratio profile analysis were part of the pre-registration or emerged post-hoc, to distinguish confirmatory from exploratory elements.
  3. [Figures and Tables] Figures/Tables: Ensure any tables or figures reporting M-ratios include the exact normalisation formula and note that the observed profile changes are localised to this step rather than the underlying Type-2 discrimination.
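To make the referee's n = 4 point concrete, a small sketch (unrelated to the paper's data) enumerates the permutation distribution of Spearman rho over four domains; with only 4! = 24 possible orderings the distribution is coarse, so a single observed rho of 0.00 carries limited evidential weight on its own.

```python
# Permutation distribution of Spearman rho with four items; no paper data used.
from collections import Counter
from itertools import permutations

from scipy.stats import spearmanr

reference_ranks = [1, 2, 3, 4]
rho_counts = Counter(
    round(spearmanr(reference_ranks, perm)[0], 2)
    for perm in permutations(reference_ranks)
)
print(sorted(rho_counts.items()))   # rho can take only a handful of values
```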

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and constructive review. We agree that the statistical robustness of the reported M-ratio profile differences merits additional support and will strengthen the manuscript accordingly.

Point-by-point responses
  1. Referee: The claim that quantisation restructures domain metacognitive efficiency rests on the M-ratio profiles being uncorrelated (Spearman rho = 0.00) with crossed orderings across only four domains. No bootstrap, permutation test, or per-domain confidence intervals are reported for the M-ratio values or their differences, in contrast to the 10,000-bootstrap procedure used for the training hypotheses. With n=4, rho = 0.00 is compatible with many permutations and does not by itself establish a reliable restructuring once finite-sample variability in the 3,000-question set and signal-detection fits is accounted for.

    Authors: We acknowledge the validity of this concern. The crossed orderings (Arts & Literature rising from 0.606 to 1.542; Geography falling from 1.210 to 0.798) and the zero correlation are striking, yet with only four domains the Spearman coefficient alone cannot rule out sampling variability from the 3,000-question set or the signal-detection parameter estimates. In the revised version we will add a bootstrap procedure that resamples the full trial-level data with replacement, refits the per-domain, per-format metacognitive models, recomputes the four M-ratios, and derives both (i) 95 % confidence intervals for each M-ratio and their differences and (ii) the distribution of the Spearman correlation across 10,000 resamples. This analysis will be reported in parallel with the existing 10,000-bootstrap results for the training hypotheses, directly addressing the finite-sample objection while preserving the pre-registered structure of the study. revision: yes
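A minimal sketch of the bootstrap the authors describe, assuming a trial-level table with domain, format, correctness, and confidence fields; `fit_m_ratio` is a hypothetical stand-in for a meta-d′ fitting routine (for instance via the metadpy package listed in the references) and is not implemented here.

```python
# Sketch of the proposed bootstrap, not the authors' released code.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def bootstrap_profiles(trials: pd.DataFrame, fit_m_ratio,
                       n_boot: int = 10_000, seed: int = 42):
    rng = np.random.default_rng(seed)
    domains = sorted(trials["domain"].unique())
    rhos = np.empty(n_boot)
    for i in range(n_boot):
        # Resample the full trial-level table with replacement.
        boot = trials.sample(frac=1.0, replace=True,
                             random_state=int(rng.integers(2**32)))
        # Refit the per-domain, per-format M-ratios on the resample.
        profiles = {
            fmt: [fit_m_ratio(boot[(boot["domain"] == d) & (boot["format"] == fmt)])
                  for d in domains]
            for fmt in ("Q5_K_M", "f16")
        }
        rhos[i] = spearmanr(profiles["Q5_K_M"], profiles["f16"])[0]
    # 95% bootstrap interval for the cross-format M-ratio correlation;
    # per-domain M-ratio intervals would follow the same pattern.
    return rhos, np.percentile(rhos, [2.5, 97.5])
```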

Circularity Check

0 steps flagged

No circularity: direct empirical measurements of observed profiles

full rationale

The paper reports direct empirical comparisons of M-ratio and Type-2 AUROC domain profiles computed from the same 3,000-question set evaluated at two precisions. The central claims rest on Spearman rank correlations and observed value shifts (e.g., domain order reversals) without any derivation, parameter fitting, or self-referential definition that reduces the reported statistics to the inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps; the pre-registered training hypotheses use explicit 10,000-bootstrap resampling, and the format-comparison results are presented as raw measurements. The work is therefore self-contained against external benchmarks and exhibits no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on standard signal-detection-theory assumptions that M-ratio and Type-2 AUROC validly capture metacognitive efficiency and discrimination in LLMs; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption — M-ratio validly measures domain-level metacognitive efficiency
    Central metric whose domain profiles are compared across formats.
  • domain assumption — The four knowledge domains are fixed and comparable across model formats
    Used to interpret profile shifts as restructuring rather than domain redefinition.

pith-pipeline@v0.9.0 · 5578 in / 1376 out tokens · 53617 ms · 2026-05-10T17:25:07.501635+00:00 · methodology

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Verbal Confidence Saturation in 3-9B Open-Weight Instruction-Tuned LLMs: A Pre-Registered Psychometric Validity Screen

    cs.CL 2026-04 conditional novelty 6.0

    Seven 3-9B instruction-tuned LLMs produce verbal confidence that saturates at high values and fails psychometric validity criteria for Type-2 discrimination under minimal elicitation.

  2. Distilling Self-Consistency into Verbal Confidence: A Pre-Registered Negative Result and Post-Hoc Rescue on Gemma 3 4B

    cs.CL 2026-04 conditional novelty 5.0

    Fine-tuning Gemma 3 4B on unfiltered self-consistency targets produces a binary verbal correctness discriminator with AUROC 0.774 on TriviaQA, outperforming logit entropy after a modal-filtered pre-registration failed.

Reference graph

Works this paper leans on

3 extracted references · 2 canonical work pages · cited by 2 Pith papers

  1. [1]

Cacioli, J.-P. (2026a). Domain-specific metacognitive efficiency in large language models: A Type-2 signal detection theory analysis. arXiv:2603.25112. · Cacioli, J.-P. (2026d). Before you interpret the profile: Validity scaling for LLM metacognitive self-report. Manuscript in preparation. · Cacioli, J.-P. (2026e). Scalar variability in transformer language mo...

  2. [2]

GPTQ: Accurate post-training quantization for generative pre-trained transformers. ICLR.

Frantar, E., Ashkboos, S., Hoefler, T., & Alistarh, D. (2023). GPTQ: Accurate post-training quantization for generative pre-trained transformers. ICLR. · Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. ICML. · Legrand, N. (2022). metadpy: Metacognitive efficiency modelling in Python. https://github.com/LegrandNico...

  3. [3]

Maniscalco, B., & Lau, H. (2012). A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Consciousness and Cognition, 21(1), 422–430. · Parikh, N., Sai, A., Shivaswamy, P., Panchal, K., & Lan, A. (2026). CATTO: Balancing preferences and confidence in language models. arXiv:2601.23096. · Proskurina, I., Brun, L., ...