arxiv: 2606.04767 · v1 · pith:UD72MFPTnew · submitted 2026-06-03 · 💻 cs.LG · cs.CV

Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms

Pith reviewed 2026-06-28 07:22 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords Fisher Information Matrix · robustness metric · spectral norm · adversarial vulnerability · deep neural networks · theoretical bounds · Jacobian variance · attack-agnostic evaluation

The pith

The spectral norm of the Fisher Information Matrix quantifies a neural network's worst-case sensitivity to input perturbations and correlates with adversarial vulnerability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the spectral norm of the Fisher Information Matrix as an attack-independent metric for assessing the robustness of deep neural networks. It shows that this norm captures the maximum sensitivity of the output distribution to perturbations in the input. The authors derive closed-form expressions for this norm in standard architectures such as VGG, ResNet, DenseNet, and Transformer, enabling theoretical comparisons. Experiments demonstrate that the metric aligns closely with how vulnerable models are to adversarial attacks across image datasets and medical images. This approach offers a way to understand and compare model sensitivities without depending on specific attack strategies.

Core claim

The paper makes four linked claims. First, the Fisher Information Matrix equals the variance of the input Jacobian. Second, its spectral norm quantifies the worst-case sensitivity of the output distribution to input perturbations. Third, closed-form spectral bounds exist for VGG, ResNet, DenseNet and Transformer, which the authors present as the first theoretical robustness ranking. Fourth, scalable algorithms based on power iteration and Hutchinson estimation make the metric computable, and experiments on CIFAR, ImageNet and medical images show a strong correlation with adversarial vulnerability.

What carries the argument

The spectral norm of the Fisher Information Matrix, which equals the variance of the input Jacobian and bounds the worst-case sensitivity of output probabilities to input perturbations.
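
For orientation, the following is one standard way to write the objects behind this claim. It assumes a classifier whose output distribution p(y | x) over discrete labels is differentiable in the input x. The symbols F(x), δ and ε are our notation, and reading "variance of the input Jacobian" as the covariance of the input score is our interpretation; the paper's exact definitions may differ.

```latex
\begin{align}
F(x) &= \mathbb{E}_{y \sim p(\cdot \mid x)}\!\left[ \nabla_x \log p(y \mid x)\, \nabla_x \log p(y \mid x)^{\top} \right], \\
\mathbb{E}_{y \sim p(\cdot \mid x)}\!\left[ \nabla_x \log p(y \mid x) \right] &= \sum_y \nabla_x p(y \mid x) = \nabla_x 1 = 0
\;\;\Longrightarrow\;\; F(x) = \operatorname{Cov}_{y \sim p(\cdot \mid x)}\!\left[ \nabla_x \log p(y \mid x) \right], \\
\mathrm{KL}\!\left( p(\cdot \mid x) \,\|\, p(\cdot \mid x + \delta) \right) &= \tfrac{1}{2}\, \delta^{\top} F(x)\, \delta + O(\|\delta\|^3), \\
\max_{\|\delta\|_2 \le \varepsilon} \tfrac{1}{2}\, \delta^{\top} F(x)\, \delta &= \tfrac{1}{2}\, \varepsilon^2\, \lambda_{\max}\!\left(F(x)\right) = \tfrac{1}{2}\, \varepsilon^2\, \|F(x)\|_2 .
\end{align}
```

Under these assumptions the spectral norm controls the largest second-order change in the output distribution over an ℓ2 ball of radius ε. This is the sense in which it can be read as a worst-case sensitivity.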

If this is right

  • Lower spectral norm of the FIM indicates greater robustness to adversarial perturbations across architectures.
  • Closed-form bounds enable direct theoretical ranking of VGG, ResNet, DenseNet and Transformer without attack-based testing.
  • Power iteration and Hutchinson-based algorithms support efficient white-box and black-box evaluation at scale.
  • The metric acts as a diagnostic tool that highlights architectural sources of sensitivity and guides robust design.
  • Empirical correlation holds across CIFAR, ImageNet and medical image datasets for multiple model families.
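
To make the two named estimators concrete, here is a minimal NumPy sketch under stated assumptions. It uses a toy linear softmax classifier (logits z = Wx) as a stand-in for a network. For a softmax output the input-space FIM can be written as Jᵀ(diag(p) − ppᵀ)J, where J is the logit Jacobian; here J = W. The helper names (`fim_matvec`, `power_iteration`, `hutchinson_trace`) are hypothetical and are not taken from the paper or its code release. Hutchinson estimation as written yields the trace of the FIM, and how the paper combines it with the spectral norm is not visible from this page.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fim_matvec(W, p, v):
    """Return F v for F = W^T (diag(p) - p p^T) W without forming F."""
    u = W @ v                    # Jacobian-vector product, shape (K,)
    s = p * u - p * (p @ u)      # apply diag(p) - p p^T to u
    return W.T @ s               # transpose-Jacobian product, shape (d,)

def power_iteration(matvec, dim, iters=500, seed=0):
    """Estimate the largest eigenvalue of a symmetric PSD operator."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = matvec(v)
        lam = float(v @ w)       # Rayleigh quotient at the current unit vector
        norm = np.linalg.norm(w)
        if norm == 0.0:          # operator annihilates v; nothing to iterate
            break
        v = w / norm
    return lam

def hutchinson_trace(matvec, dim, samples=4000, seed=0):
    """Estimate trace(F) as the mean of v^T F v over Rademacher probes v."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += float(v @ matvec(v))
    return total / samples

# Toy setup (assumption): K = 4 classes, d = 6 input dimensions.
rng = np.random.default_rng(42)
K, d = 4, 6
W = rng.standard_normal((K, d))
x = rng.standard_normal(d)
p = softmax(W @ x)

mv = lambda v: fim_matvec(W, p, v)
spec = power_iteration(mv, d)    # estimate of the FIM spectral norm at x
tr = hutchinson_trace(mv, d)     # estimate of the FIM trace at x
print(spec, tr)
```

Both estimators only need matrix-vector products with the FIM, so for a real network the explicit `W` would be replaced by automatic-differentiation Jacobian-vector products. That substitution is not shown here.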

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The metric could be applied during architecture search to favor designs with lower predicted sensitivity before full training.
  • It might serve as a proxy for robustness under distribution shifts other than adversarial attacks, such as common corruptions.
  • Training objectives that penalize high FIM spectral norm could be tested as a direct route to improved robustness.
  • The Jacobian-variance view might extend the analysis to sequence models or reinforcement learning policies.

Load-bearing premise

The equality between the Fisher Information Matrix and the variance of the input Jacobian, along with the closed-form spectral bounds, translates into a reliable ranking of real-world robustness without hidden assumptions on data distribution or perturbation model.

What would settle it

An experiment on one of the tested datasets showing a model with low FIM spectral norm that remains highly vulnerable to standard adversarial attacks, or a high-norm model that exhibits unusually high robustness.

Figures

Figures reproduced from arXiv: 2606.04767 by Chong Zhang, Jia Wang, Qiufeng Wang, Xiang Li, Xiaobo Jin.

Figure 1
Figure 1: Overview of our work: motivation and contributions. … sensitive, and unable to reveal intrinsic model properties. Theoretically-derived bounds (e.g., Lipschitz constants (Shi et al., 2022) and CLEVER scores (Weng et al., 2018)), while attack-agnostic, often lack probabilistic interpretation and are difficult to scale to modern architectures like Transformers (Vaswani et al., 2023). Crucially, neither paradi…

Original abstract

The robustness of deep neural networks is crucial for safety-critical deployments, yet existing evaluation methods are often attack-dependent and lack interpretability. We propose a principled, attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies the worst-case sensitivity of the model's output distribution to input perturbations. Theoretically, we establish that the FIM equals the variance of the input Jacobian and derive closed-form spectral bounds for common architectures, including VGG, ResNet, DenseNet, and Transformer, providing the first theoretical robustness ranking. To enable scalable evaluation, we develop efficient algorithms, including power iteration and Hutchinson-based estimation, that support both white-box and black-box settings. Extensive experiments across multiple datasets, including CIFAR, ImageNet, and medical images, and across multiple architectures show a strong correlation between our metric and adversarial vulnerability. Our framework serves as an interpretable diagnostic tool that complements attack-based evaluations, offering insights into architectural sensitivity and guiding the design of more robust models. Code is available at: https://github.com/franz-chang/SRP/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes the spectral norm of the Fisher Information Matrix (FIM) as an attack-agnostic robustness metric for deep neural networks. It claims that the FIM equals the variance of the input Jacobian, derives closed-form spectral bounds for VGG, ResNet, DenseNet, and Transformer architectures, develops efficient estimation algorithms (power iteration and Hutchinson-based) for white- and black-box settings, and reports strong empirical correlation between the metric and adversarial vulnerability across CIFAR, ImageNet, and medical imaging datasets.

Significance. If the central equality and bounds hold, the work supplies a principled, interpretable diagnostic that complements attack-based evaluations and could guide architecture design. Explicit credit is due for the closed-form bounds on standard architectures, the provision of public code, and the attack-agnostic framing. The significance is tempered by the need for the metric to produce reliable rankings under the perturbation model without additional modeling assumptions on the data distribution.

minor comments (3)
  1. [§3] The abstract states that the FIM 'equals the variance of the input Jacobian' but the manuscript should clarify in §3 whether this equality is exact or holds in expectation, and whether it requires any distributional assumptions on the inputs.
  2. [Experiments section] Table 2 (or equivalent experimental table) reports correlation coefficients; adding the number of models and datasets per correlation would help readers assess the strength of the ranking claim.
  3. [Algorithm section] The black-box estimation procedure is described at a high level; a short pseudocode block or explicit complexity statement would improve reproducibility.
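
As an illustration of what such a pseudocode block and complexity statement could look like, here is a hedged sketch of one possible query-only estimator. It is not the paper's black-box procedure, which is not visible from the abstract. It relies on the second-order expansion KL(p(·|x) ‖ p(·|x + hv)) ≈ ½h² vᵀF(x)v, so each probe direction costs one extra model query. The function names, the step size `h` and the toy `predict` model are all assumptions made for this example.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two strictly positive probability vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def blackbox_quadratic_form(predict, x, v, h=1e-3):
    """Estimate v^T F(x) v from two queries via KL ~ 0.5 * h^2 * v^T F v."""
    return 2.0 * kl(predict(x), predict(x + h * v)) / h**2

def blackbox_trace(predict, x, samples=4000, h=1e-3, seed=0):
    """Hutchinson-style trace estimate using only output probabilities.

    Cost: samples + 1 model queries, with no gradients required.
    """
    rng = np.random.default_rng(seed)
    p0 = predict(x)                          # one query at the clean input
    total = 0.0
    for _ in range(samples):
        v = rng.choice([-1.0, 1.0], size=x.shape[0])   # Rademacher probe
        total += 2.0 * kl(p0, predict(x + h * v)) / h**2
    return total / samples

# Toy black box (assumption): a linear softmax model hidden behind predict().
rng = np.random.default_rng(7)
W = 0.5 * rng.standard_normal((4, 6))

def predict(x):
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

x0 = rng.standard_normal(6)
trace_est = blackbox_trace(predict, x0)
print(trace_est)
```

The estimate carries an O(h) bias from the truncated expansion and Monte Carlo variance that shrinks as 1/samples. A reproducible description would need to state both, along with the query budget.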

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work, the recognition of the closed-form bounds, public code, and attack-agnostic framing, and the recommendation for minor revision. No specific major comments were enumerated in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation chain begins with the definition of the spectral norm of the FIM as the robustness metric and proceeds via a stated theoretical equality to the variance of the input Jacobian (presented as derived, not assumed by construction) followed by closed-form spectral bounds for specific architectures. These steps are followed by algorithm development and empirical correlation checks across datasets and models. No load-bearing step reduces by the paper's own equations to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled via prior work. The central claim remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on standard properties of the Fisher Information Matrix and the assumption that its spectral norm captures worst-case input sensitivity.

pith-pipeline@v0.9.1-grok · 5733 in / 1119 out tokens · 18684 ms · 2026-06-28T07:22:02.948759+00:00 · methodology

Reference graph

Works this paper leans on

128 extracted references · 21 canonical work pages · 14 internal anchors

  1. [1] Efficiently Computing Local Lipschitz Constants of Neural Networks via Bound Propagation. 2022.
  2. [2] Attention Is All You Need. 2023.
  3. [3] Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. 2020.
  4. [4] Shi-Garrier, Loïc and Bouaynaya, Nidhal and Delahaye, Daniel. Adversarial Robustness with Partial Isometry. Entropy.
  5. [5] Defending Against Adversarial Attacks by Suppressing the Largest Eigenvalue of Fisher Information Matrix. 2019.
  6. [6] Croce, Francesco and Hein, Matthias. Proceedings of the 37th International Conference on Machine Learning. 2020.
  7. [7] Certified Adversarial Robustness via Randomized Smoothing. 2019.
  8. [8] Inspecting adversarial examples using the Fisher information. 2019.
  9. [9] The Adversarial Attack and Detection under the Fisher Information Metric. 2019.
  10. [10] Fawole, Oluwajuwon and Rawat, Danda. ACM Comput. Surv. 2025.
  11. [11] Spectrally-normalized margin bounds for neural networks. 2017.
  12. [12] Golub, Gene H. and Loan, Charles F. Van. Matrix
  13. [13] Estimating Example Difficulty Using Variance of Gradients. 2022.
  14. [14] Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks. 2020.
  15. [15] Madry, Aleksander and Makelov, Aleksandar and Schmidt, Ludwig and Tsipras, Dimitris and Vladu, Adrian. Towards
  16. [16] Carlini, Nicholas and Wagner, David. Towards
  17. [17] Szegedy, Christian and Zaremba, Wojciech and Sutskever, Ilya and Bruna, Joan and Erhan, Dumitru and Goodfellow, Ian and Fergus, Rob. Intriguing properties of neural networks.
  18. [18] Robust. arXiv.org.
  19. [19] Deep Variational Information Bottleneck. 2019.
  20. [20] Natural. Neural Computation. 1998.
  21. [21] Martens, James and Grosse, Roger. Optimizing
  22. [22] Pope, Phillip and Zhu, Chen and Abdelkader, Ahmed and Goldblum, Micah and Goldstein, Tom. The
  23. [23] Miyato, Takeru and Kataoka, Toshiki and Koyama, Masanori and Yoshida, Yuichi. Spectral
  24. [24] A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines. Communications in Statistics - Simulation and Computation. 1990.
  25. [25] Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. Digital Medicine. 2021.
  26. [26] Milman, Vitali D. and Schechtman, Gideon. Asymptotic
  27. [27] Grenander, Ulf and Szego, Gabor. Toeplitz
  28. [28] Davis, P. J. Circulant matrices.
  29. [29] Bai, Zhidong and Silverstein, Jack W. Spectral
  30. [30] Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey. 2023.
  31. [31] Ford, William. Basic Iterative Methods.
  32. [32] The Limitations of Deep Learning in Adversarial Settings. 2015.
  33. [33] DeepFool: a simple and accurate method to fool deep neural networks. 2016.
  34. [34] Towards Evaluating the Robustness of Neural Networks. 2017.
  35. [35] Xiao, Chaowei and Li, Bo and Zhu, Jun-Yan and He, Warren and Liu, Mingyan and Song, Dawn. 2018.
  36. [36] Li, Qian and Qi, Yong and Hu, Qingyuan and Qi, Saiyu and Lin, Yun and Dong, Jin Song. Adversarial Adaptive Neighborhood With Feature Importance-Aware Convex Interpolation.
  37. [37] Self-Ensemble Adversarial Training for Improved Robustness. ICLR 2022.
  38. [38] Improving Adversarial Robustness Requires Revisiting Misclassified Examples. ICLR.
  39. [39] Curriculum Adversarial Training. IJCAI. 2018.
  40. [40] Attacks Which Do Not Kill Training Make Adversarial Learning Stronger. ICML. 2020.
  41. [41] Yao Li, Minhao Cheng, Cho-Jui Hsieh and Thomas C. M. Lee. The American Statistician.
  42. [42] A survey of robust adversarial training in pattern recognition: Fundamental, theory, and methodologies. 2022.
  43. [43] Adversarial Attacks on Neural Network Policies. 2017.
  44. [44] Chakraborty, Anirban and Alam, Manaar and Dey, Vishal and Chattopadhyay, Anupam and Mukhopadhyay, Debdeep. CAAI Transactions on Intelligence Technology. 2021.
  45. [45] Towards autonomous driving model resistant to adversarial attack. Applied Artificial Intelligence. 2023.
  46. [46] Adversarial training of LSTM-ED based anomaly detection for complex time-series in cyber-physical-social systems. 2022.
  47. [47] Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition.
  48. [48] Densely connected convolutional networks. Proceedings of the IEEE conference on computer vision and pattern recognition.
  49. [49] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929.
  50. [50] Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998.
  51. [51] Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv preprint arXiv:1708.07747.
  52. [52] Learning multiple layers of features from tiny images. 2009.
  53. [53] Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer vision and pattern recognition workshop. 2004.
  54. [54] Caltech-256 object category dataset. 2007.
  55. [55] Microsoft coco: Common objects in context. European conference on computer vision. 2014.
  56. [56] Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems.
  57. [57] Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.
  58. [58] Going deeper with convolutions. Proceedings of the IEEE conference on computer vision and pattern recognition.
  59. [59] Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition.
  60. [60] Fast r-cnn. Proceedings of the IEEE international conference on computer vision.
  61. [61] Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems.
  62. [62] You only look once: Unified, real-time object detection. Proceedings of the IEEE conference on computer vision and pattern recognition.
  63. [63] Ssd: Single shot multibox detector. European conference on computer vision. 2016.
  64. [64] Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  65. [65] End-to-end object detection with transformers. European conference on computer vision. 2020.
  66. [66] Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
  67. [67] Universal adversarial perturbations. Proceedings of the IEEE conference on computer vision and pattern recognition.
  68. [68] Synthesizing Robust Adversarial Examples. arXiv preprint arXiv:1707.07397.
  69. [69] Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. arXiv preprint arXiv:1605.07277.
  70. [70] Explaining and Harnessing Adversarial Examples. arXiv preprint arXiv:1412.6572.
  71. [71] Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. arXiv preprint arXiv:1704.01155.
  72. [72] Distillation as a defense to adversarial perturbations against deep neural networks. 2016 IEEE Symposium on Security and Privacy (SP). 2016.
  73. [73] Defense against adversarial attacks using high-level representation guided denoiser. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  74. [74] Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv preprint arXiv:1706.06083.
  75. [75] Adversarial training for free! Advances in neural information processing systems.
  76. [76] You only propagate once: Accelerating adversarial training via maximal principle. Advances in neural information processing systems.
  77. [77] Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204.
  78. [78] Defense against adversarial attacks using feature scattering-based adversarial training. Advances in neural information processing systems.
  79. [79] Adversarial training and provable defenses: Bridging the gap. 8th International Conference on Learning Representations (ICLR 2020) (virtual).
  80. [80] Fast certified robust training with short warmup. Advances in Neural Information Processing Systems.

Showing first 80 references.