Spectral Perturbation of the Empirical Fisher Information Matrix under Weight Quantization

Hikmat Karimov; Rahid Zahid Alekberli

arXiv: 2606.28432 · v1 · pith:LE2STEL2new · submitted 2026-06-26 · stat.ML · cs.AI · cs.LG

Pith reviewed 2026-06-30 01:35 UTC · model grok-4.3

classification stat.ML · cs.AI · cs.LG

keywords: empirical Fisher information matrix, weight quantization, spectral perturbation, eigenvalue bounds, Weyl inequality, language model monitoring, quantization noise, curvature monotonicity

The pith

Quantization of model weights strictly increases the dominant eigenvalue of the empirical Fisher information matrix at leading order under a mild genericity condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that input departure from a reference distribution elevates the largest eigenvalue lambda_max of the empirical FIM when a local curvature-monotonicity condition holds on that eigenvalue, and that finite-precision weight quantization does the same under a separate genericity condition. The principal result applies Weyl's inequality to obtain a directional bound: lambda_max under quantization noise is at least as large as the unperturbed value up to a third-order term, and exceeds it strictly at leading order when the genericity condition is met. These spectral facts supply a candidate mechanism for an observed calibration gap in which the threshold for the normalized monitoring statistic sigma_t = lambda_max(F_t)/lambda_base on a 4-bit quantized model was roughly 244 times larger than a preliminary full-precision estimate. Supporting measurements across twelve models and 1,080 trajectories are reported as broadly consistent with the predicted elevation.

Core claim

Via Weyl's inequality the largest eigenvalue lambda_max of the empirical Fisher information matrix satisfies a lower bound under additive quantization noise that equals its unperturbed value up to a third-order remainder; under a mild genericity condition the inequality is strict at leading order, so lambda_max(F_quantized) > lambda_max(F_full) at the first non-vanishing term. A parallel elevation result holds for input departure from a reference ensemble once local curvature monotonicity of lambda_max is assumed. Two tractable approximations to lambda_max are derived together with a completeness statement for a threshold-based partition of an augmented state space.
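
The two approximations to lambda_max are not reproduced on this page. Purely as a generic point of reference, and not as the paper's method, the sketch below uses standard power iteration to estimate lambda_max of an empirical FIM written as F = G^T G / n, where the rows of G are per-sample gradients. The function name, the iteration count, and the gradient-matrix layout are assumptions made for illustration. The Rayleigh quotient it returns never exceeds the true lambda_max of a positive semidefinite matrix, so on its own it gives only a one-sided estimate.

```python
import numpy as np

def power_iteration_lambda_max(grads, num_iters=200, seed=0):
    """Estimate lambda_max of F = G^T G / n without forming F explicitly.

    grads: array of shape (n, d), one per-sample gradient per row (assumed layout).
    """
    n, d = grads.shape
    rng = np.random.default_rng(seed)
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(num_iters):
        # F @ v computed as two matrix-vector products.
        fv = grads.T @ (grads @ v) / n
        # Rayleigh quotient of the current unit vector: a lower estimate of lambda_max.
        lam = float(v @ fv)
        norm = np.linalg.norm(fv)
        if norm == 0.0:
            # F maps v to zero; with all-zero gradients lambda_max is zero.
            return 0.0
        v = fv / norm
    return lam
```

Avoiding the explicit d-by-d matrix matters when d is the parameter count of a language model; whether the paper's approximations take this route is not stated here.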

What carries the argument

Weyl's inequality applied to the quantization-noise perturbation of the empirical FIM, yielding the directional bound on lambda_max.
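
For reference, the standard form of Weyl's inequality for real symmetric matrices is written out below. The symbols F (the unperturbed empirical FIM) and E (a symmetric perturbation) are notation chosen here for illustration; this page does not reproduce the paper's perturbation expansion, its third-order remainder, or the genericity condition.

```latex
\lambda_{\max}(F) + \lambda_{\min}(E)
  \;\le\; \lambda_{\max}(F + E)
  \;\le\; \lambda_{\max}(F) + \lambda_{\max}(E)
```

In particular, when E is positive semidefinite, \lambda_{\min}(E) \ge 0 and the left inequality alone gives \lambda_{\max}(F + E) \ge \lambda_{\max}(F), which is the directional shape of the bound described above.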

If this is right

The normalized statistic sigma_t = lambda_max(F_t)/lambda_base supplies a runtime monitor whose calibration threshold must be raised for quantized models.
Two explicit approximations to lambda_max become available for real-time use, one heuristic and one with a rigorous two-sided guarantee.
A threshold-based partition of the augmented state space is exhaustive once the completeness result holds.
The observed 244-fold inflation of the calibration threshold on a 4-bit model gains a candidate mechanism in the quantization-induced spectral shift, although the bound does not predict its magnitude.
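
The monitoring statistic in the list above can be sketched in a few lines. This is a minimal illustration, assuming the empirical FIM is formed as F_t = G^T G / n from a matrix G of per-sample gradients; the function names, the `threshold` parameter, and the exact eigenvalue solve are hypothetical choices, not the paper's implementation.

```python
import numpy as np

def lambda_max_empirical_fim(grads):
    """Largest eigenvalue of F = G^T G / n, with one per-sample gradient per row of G."""
    n, d = grads.shape
    # G G^T / n (n x n) and G^T G / n (d x d) share their nonzero eigenvalues,
    # so solve the smaller of the two symmetric problems.
    m = grads @ grads.T / n if n <= d else grads.T @ grads / n
    return float(np.linalg.eigvalsh(m)[-1])  # eigvalsh returns ascending eigenvalues

def sigma_t(grads, lambda_base):
    """Normalized statistic sigma_t = lambda_max(F_t) / lambda_base."""
    if lambda_base <= 0:
        raise ValueError("lambda_base must be positive")
    return lambda_max_empirical_fim(grads) / lambda_base

def exceeds_threshold(grads, lambda_base, threshold):
    """Flag a step whose statistic is above a calibrated threshold (hypothetical rule)."""
    return sigma_t(grads, lambda_base) > threshold
```

Under this reading, the calibration issue raised above amounts to choosing `threshold` (and possibly `lambda_base`) separately for a quantized model rather than reusing the full-precision value.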

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Weyl argument could be applied to other structured perturbations such as pruning masks or low-rank adapters to predict their effect on lambda_max.
If the genericity condition fails on some layers, the strict elevation may reduce to a weak inequality, suggesting a diagnostic that checks the leading-order term on real weight tensors.
The monitoring statistic could be tested for early-warning power on out-of-distribution inputs whose departure is known to violate curvature monotonicity.
Closed-form prediction of the exact inflation magnitude remains open and could be pursued by refining the third-order remainder term.

Load-bearing premise

The local curvature-monotonicity hypothesis on lambda_max is required for the input-departure elevation and the mild genericity condition is required for the strict leading-order increase under quantization.

What would settle it

Compute lambda_max of the empirical FIM on the same trajectories before and after 4-bit quantization of a small transformer; the measured values either confirm or refute the strict elevation predicted by Theorem 4.3.
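
The shape of such a check can be sketched on a toy problem. The example below substitutes a logistic-regression model for the small transformer and a simple symmetric uniform quantizer for a real 4-bit scheme; both substitutions, along with every name and constant in the code, are assumptions made for illustration. It only prints the two eigenvalues and their ratio, and makes no claim about which direction the toy result will go.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantizer (hypothetical stand-in for a real 4-bit scheme)."""
    levels = 2 ** (bits - 1) - 1           # 7 positive levels for 4 bits
    scale = np.max(np.abs(w)) / levels
    if scale == 0:
        return w.copy()
    return np.round(w / scale) * scale     # snap each weight to the nearest grid point

def per_sample_grads(w, X, y):
    """Per-sample gradients of the logistic log-loss: (sigmoid(x . w) - y) * x."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (p - y)[:, None] * X

def lambda_max_fim(grads):
    """Largest eigenvalue of the empirical FIM F = G^T G / n."""
    F = grads.T @ grads / grads.shape[0]
    return float(np.linalg.eigvalsh(F)[-1])

rng = np.random.default_rng(0)
d, n = 16, 512
w_full = rng.normal(size=d)                # toy "full-precision" weights
X = rng.normal(size=(n, d))                # the same inputs are used for both models
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ w_full)))).astype(float)

w_q = quantize_uniform(w_full, bits=4)
lam_full = lambda_max_fim(per_sample_grads(w_full, X, y))
lam_q = lambda_max_fim(per_sample_grads(w_q, X, y))

print(f"lambda_max, full precision: {lam_full:.6f}")
print(f"lambda_max, 4-bit weights:  {lam_q:.6f}")
print(f"ratio quantized / full:     {lam_q / lam_full:.4f}")
```

Holding the inputs and labels fixed across the two evaluations mirrors the "same trajectories" requirement, so any difference in lambda_max comes from the weight perturbation alone.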

Original abstract

We study the spectral perturbation of the empirical Fisher Information Matrix (FIM) of a parametric statistical model under two structured perturbations: departure of the input from a reference (in-distribution) ensemble, and finite-precision (quantized) perturbation of the model's parameters. For the first, under an explicit local curvature-monotonicity hypothesis on the dominant eigenvalue lambda_max of the FIM, we show departure from a reference manifold provably elevates lambda_max relative to a calibration baseline (Proposition 3.2), and discuss why this hypothesis is required, since curvature need not increase monotonically under every perturbation. Our principal result is a directional eigenvalue perturbation bound, via Weyl's inequality, showing lambda_max under a quantization noise perturbation is lower bounded by its unperturbed value up to a third-order remainder, and, under a mild genericity condition, strictly exceeds it at leading order (Theorem 4.3). We give two tractable approximations to lambda_max -- one heuristic, one with a rigorous two-sided bound -- and a completeness result for a threshold-based partition of an augmented state space. These results motivate using sigma_t = lambda_max(F_t)/lambda_base as a runtime monitoring statistic for deployed language models: the quantization result offers a mechanism for an empirical observation of our own, where a calibration threshold for this statistic was approximately 244 times larger than a preliminary full-precision estimate on a 4-bit quantized model, a single measurement rather than a value derived in closed form. We report supporting measurements (twelve models, n=1,080 trajectories) broadly consistent with our predictions, discuss the scope and limitations of every result, and state as an open problem the closed-form prediction of the quantization inflation magnitude our bound does not supply.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper's main new piece is a Weyl-based directional bound showing that quantization noise raises lambda_max of the empirical FIM at leading order under a genericity condition, plus a proposed runtime monitor; the input-shift claim rests on a restrictive monotonicity hypothesis.

The central new result is Theorem 4.3, which applies Weyl's inequality to the quantization perturbation of the empirical FIM and obtains a lower bound on lambda_max with a third-order remainder; under a mild genericity condition the inequality is strict at leading order. The authors also give Proposition 3.2, in which input departure elevates lambda_max when a local curvature-monotonicity condition holds on the dominant eigenvalue. Both are presented with explicit statements of the needed assumptions. The work supplies two approximations to lambda_max and reports measurements across twelve models and 1,080 trajectories that line up with the directional prediction, including one case where the observed inflation reached roughly 244 times a full-precision baseline.

The application of Weyl to this structured perturbation is not routine, and the authors are direct about the limits: the monotonicity hypothesis is not universal, and they leave the exact size of the quantization inflation as an open problem rather than claiming a closed-form prediction from the bound. The 244x figure is treated as an empirical observation, not something derived from the inequality. The remainder term means the bound is not yet tight enough for quantitative forecasting.

This is narrow but cleanly executed work on a specific monitoring statistic for quantized models. It will mainly interest people already working on FIM-based diagnostics or post-training quantization checks. The formal steps look standard once the assumptions are granted, and the experiments are consistent with the claims without overclaiming generality.

I would send it to referees. The theorems are stated clearly enough that a reviewer can check the derivations and the scope of the genericity condition, and the data provide a starting point for further tests even if the effect size remains open.

Referee Report

2 major / 2 minor

Summary. The manuscript studies spectral perturbations of the empirical Fisher Information Matrix (FIM) under two structured changes: departure of inputs from a reference ensemble, and finite-precision quantization of model weights. Under an explicit local curvature-monotonicity hypothesis on the dominant eigenvalue lambda_max, input departure is shown to elevate lambda_max relative to a calibration baseline (Proposition 3.2). The principal result applies Weyl's inequality to obtain a directional bound showing that quantization noise lower-bounds lambda_max up to a third-order remainder and, under a mild genericity condition, strictly exceeds the unperturbed value at leading order (Theorem 4.3). The authors propose the ratio sigma_t = lambda_max(F_t)/lambda_base as a runtime monitoring statistic for quantized language models, motivated by an empirical 244x inflation observation on a 4-bit model, and report supporting measurements across twelve models and 1,080 trajectories that are broadly consistent with the predictions.

Significance. If the central bounds hold, the work supplies a rigorous perturbation analysis, grounded in Weyl's inequality, for the dominant eigenvalue of the empirical FIM under quantization, an operationally relevant setting for deployed models. The explicit statement and discussion of the curvature-monotonicity hypothesis, together with the acknowledgment that the exact inflation magnitude remains an open problem, add transparency. The empirical consistency across multiple models provides practical corroboration for the proposed monitoring statistic, though the result is primarily theoretical.

major comments (2)

[Proposition 3.2] The elevation claim for input departure rests entirely on the local curvature-monotonicity hypothesis; while the manuscript correctly notes that the hypothesis is not universal, the scope of perturbations and model classes for which it holds is not further delimited, leaving the practical reach of the input-departure result dependent on an assumption whose verification is external to the derivation.
[Theorem 4.3] The third-order remainder in the Weyl-based bound prevents any closed-form prediction of the observed inflation magnitude (explicitly stated as an open problem in the abstract); because the 244x factor is presented only as a measurement rather than a consequence of the leading-order term, the theorem establishes the directional inequality but does not quantitatively account for the scale of the empirical effect that motivates the monitoring statistic.

minor comments (2)

The experimental section (referenced via the 1,080 trajectories) does not appear to tabulate the precise model architectures, quantization bit-widths, or task distributions; adding a compact table or explicit cross-reference would allow readers to assess the breadth of the consistency claim.
Notation for the monitoring statistic sigma_t and the baseline lambda_base is introduced late; defining these quantities with a forward reference in the introduction would improve readability for readers primarily interested in the application.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the recommendation of minor revision. We address the two major comments point by point below.

Referee: [Proposition 3.2] The elevation claim for input departure rests entirely on the local curvature-monotonicity hypothesis; while the manuscript correctly notes that the hypothesis is not universal, the scope of perturbations and model classes for which it holds is not further delimited, leaving the practical reach of the input-departure result dependent on an assumption whose verification is external to the derivation.

Authors: We agree that the local curvature-monotonicity hypothesis is essential to Proposition 3.2 and that its domain of applicability is not exhaustively mapped. In the revised manuscript we will insert a short additional paragraph after the proposition that supplies concrete examples (linear models, small-norm input shifts in overparameterized networks) where the hypothesis can be verified directly, while reiterating that verification remains external for arbitrary models.
revision: partial

Referee: [Theorem 4.3] The third-order remainder in the Weyl-based bound prevents any closed-form prediction of the observed inflation magnitude (explicitly stated as an open problem in the abstract); because the 244x factor is presented only as a measurement rather than a consequence of the leading-order term, the theorem establishes the directional inequality but does not quantitatively account for the scale of the empirical effect that motivates the monitoring statistic.

Authors: The observation is correct: Theorem 4.3 supplies only a directional bound whose remainder precludes a closed-form magnitude, and the 244x factor is presented strictly as an empirical observation. The manuscript already states the magnitude prediction as an open problem. No revision is required.
revision: no

Circularity Check

0 steps flagged

No significant circularity

The central bound (Theorem 4.3) is obtained by direct application of Weyl's inequality to the quantization perturbation of the empirical FIM, producing a lower bound on lambda_max with an explicit third-order remainder and a strict inequality under a stated mild genericity condition. Proposition 3.2 rests on an explicitly declared local curvature-monotonicity hypothesis whose necessity is discussed in the text. The 244x factor is presented as an empirical measurement on a quantized model, not as a closed-form prediction derived from the bound. No self-definitional reductions, fitted inputs renamed as predictions, load-bearing self-citations, or ansatzes smuggled via prior work appear in the derivation chain. The argument is self-contained against the external Weyl inequality and the paper's own stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on two domain-specific hypotheses whose validity is not derived from first principles and on the standard Weyl inequality; no new entities are introduced.

axioms (2)

Domain assumption: local curvature-monotonicity hypothesis on lambda_max of the FIM. Invoked to guarantee that input departure elevates lambda_max (Proposition 3.2).
Domain assumption: mild genericity condition. Required for the strict leading-order increase under quantization (Theorem 4.3).

Reference graph

Works this paper leans on

8 extracted references · 2 canonical work pages · 1 internal anchor

[1] S. Amari, Information Geometry and Its Applications. Tokyo: Springer, 2016.
[2] J. Martens, “New insights and perspectives on the natural gradient method,” Journal of Machine Learning Research, vol. 21, no. 146, pp. 1–76, 2020.
[3] D. J. C. MacKay, “A practical Bayesian framework for backpropagation networks,” Neural Computation, vol. 4, no. 3, pp. 448–472, 1992.
[4] J. Pennington and P. Worah, “The spectrum of the Fisher information matrix of a single-hidden-layer neural network,” in Advances in Neural Information Processing Systems, vol. 31, 2018, pp. 5410–5419.
[5] C. Louart, Z. Liao, and R. Couillet, “A random matrix approach to neural networks,” Annals of Applied Probability, vol. 28, no. 2, pp. 1190–1248, 2018.
[6] L. Sagun, L. Bottou, and Y. LeCun, “Eigenvalues of the Hessian in deep learning: Singularity and beyond,” arXiv preprint arXiv:1611.07476, 2016.
[7] L. Benigni and S. Péché, “Eigenvalue distribution of some nonlinear models of random matrices,” Electronic Journal of Probability, vol. 26, paper no. 150, 2021.
[8] L. Benigni and S. Péché, “Largest eigenvalues of the conjugate kernel of single-layered neural networks,” arXiv preprint arXiv:2201.04753, 2022.
