
arxiv: 1906.09023 · v2 · pith:5YREE2MWnew · submitted 2019-06-21 · 💻 cs.LG · cs.CV · stat.ML

Backpropagation-Friendly Eigendecomposition

Pith reviewed 2026-05-25 19:12 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords eigendecomposition · backpropagation · deep networks · ZCA whitening · PCA denoising · numerical stability · normalization · covariance matrix

The pith

A new eigendecomposition approach makes backpropagation stable for large matrices in deep networks without splitting the data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard eigendecomposition and its power-iteration approximation produce unstable gradients when used inside neural networks, particularly with large matrices. Partitioning matrices into small arbitrary groups avoids the instability but has no theoretical justification and prevents full exploitation of eigendecomposition. The paper presents a differentiable formulation that computes stable gradients for eigenvectors and eigenvalues directly on large matrices. The method is demonstrated on ZCA whitening as a batch-normalization alternative and on a new PCA-based feature denoising step. If the approach works as claimed, networks can incorporate full eigendecomposition operations during training without custom partitioning or gradient clipping.
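
As a hedged illustration of where the instability can come from (this is background on the usual analytic baseline, not the paper's own derivation): the standard backward pass through a symmetric eigendecomposition weights the eigenvector term by coefficients of the form 1/(λi − λj), which grow without bound as two eigenvalues approach each other. The sketch below only tabulates those coefficients; `gap_coefficients` and `largest_coefficient` are hypothetical helper names, and nothing here implements the paper's stabilized method.

```python
def gap_coefficients(eigenvalues):
    """Return K with K[i][j] = 1 / (lam_i - lam_j) for i != j, 0 on the diagonal.

    Coefficients of this form appear in the usual analytic gradient of a
    symmetric eigendecomposition (an assumption about the baseline, not the
    paper's formulation). Exactly equal eigenvalues raise ZeroDivisionError.
    """
    n = len(eigenvalues)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i][j] = 1.0 / (eigenvalues[i] - eigenvalues[j])
    return K


def largest_coefficient(eigenvalues):
    """Largest absolute entry of the gap-coefficient matrix."""
    return max(abs(v) for row in gap_coefficients(eigenvalues) for v in row)


# Well-separated spectrum: the coefficients stay moderate.
well_separated = largest_coefficient([4.0, 2.0, 1.0])

# Two nearly equal eigenvalues: the corresponding coefficient explodes.
nearly_equal = largest_coefficient([4.0, 2.0, 2.0 - 1e-8])
```

Covariance matrices with many features tend to contain clusters of close eigenvalues, which is one plausible reason the problem is described as worse for large matrices.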

Core claim

The authors introduce a numerically stable and differentiable eigendecomposition procedure that operates on large matrices without partitioning and yields reliable gradients for backpropagation, outperforming both direct eigendecomposition and power iteration on ZCA whitening and the newly introduced PCA denoising normalization.

What carries the argument

Numerically stable differentiable eigendecomposition that computes eigenvector gradients without matrix partitioning.

If this is right

  • Networks can apply eigendecomposition directly to full-size feature covariance matrices rather than arbitrary partitions.
  • ZCA whitening layers become more robust to numerical issues than those using standard ED or power iteration.
  • PCA denoising can serve as a new feature normalization layer that further reduces noise beyond batch normalization.
  • The full expressive power of eigenvector-based operations becomes available inside differentiable pipelines.
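
For readers unfamiliar with ZCA whitening, the forward computation on a full covariance matrix can be sketched as below. This is a generic NumPy version under stated assumptions: `zca_whiten` and the damping term `eps` are illustrative choices, not values or names from the paper, and the sketch covers only the forward pass, not the stable gradient the paper is about.

```python
import numpy as np


def zca_whiten(X, eps=1e-5):
    """Whiten the rows of X (n_samples x n_features) with ZCA.

    eps is a hypothetical damping term added to the eigenvalues to avoid
    dividing by very small numbers; it is not taken from the paper.
    """
    Xc = X - X.mean(axis=0, keepdims=True)      # center each feature
    cov = Xc.T @ Xc / X.shape[0]                # full d x d covariance, no partitioning
    eigvals, eigvecs = np.linalg.eigh(cov)      # cov = V diag(lam) V^T
    inv_sqrt = 1.0 / np.sqrt(eigvals + eps)
    W = (eigvecs * inv_sqrt) @ eigvecs.T        # ZCA matrix V diag(lam^-1/2) V^T
    return Xc @ W


rng = np.random.default_rng(0)
mixing = np.eye(8) + 0.5 * np.ones((8, 8))      # deterministic, well-conditioned mixing
X = rng.normal(size=(4096, 8)) @ mixing         # correlated features
Xw = zca_whiten(X)
cov_w = Xw.T @ Xw / Xw.shape[0]                 # close to the identity after whitening
```

Unlike PCA whitening, the final multiplication by V^T rotates the data back into the original feature axes, which is why ZCA is a natural drop-in for a normalization layer.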

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same stability technique could be reused for other spectral layers that rely on eigenvectors, such as certain graph convolutions or whitening-based regularizers.
  • Training dynamics might change in models that currently avoid eigendecomposition because of gradient instability.
  • The method might allow direct differentiation through operations like orthogonalization or principal-component projection without auxiliary losses.

Load-bearing premise

The proposed eigendecomposition remains numerically stable and produces accurate gradients when applied to the matrix sizes that arise during end-to-end training of deep networks.

What would settle it

Applying the method to a covariance matrix whose size exceeds the largest matrix tested in the ZCA or PCA experiments and observing NaN values or exploding gradients during backpropagation would falsify the stability claim.

Figures

Figures reproduced from arXiv: 1906.09023 by Mathieu Salzmann, Pascal Fua, Wei Wang, Yinlin Hu, Zheng Dang.

Figure 1
Figure 1. (a) shows how the value of (λk/λ1)^k changes w.r.t. the eigenvalue ratio λk/λ1 and the iteration number k. (b) shows the contours of the curved surface in (a).

Accompanying table, with k = ⌈ln(0.01)/ln(λi/λ1)⌉:

λi/λ1  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.85  0.9  0.95  0.99  0.995  0.999
k      2    2    3    4    5    6    9    14   19    29   59    299   598    2995
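
The iteration counts in this caption follow from requiring the residual (λi/λ1)^k to fall below a chosen threshold, which gives k ≥ ln(threshold)/ln(λi/λ1). A small sketch of that arithmetic follows; `iterations_needed` is a hypothetical helper, and the threshold is treated as a free parameter. As an observation from recomputation rather than a statement from the paper, the listed counts are reproduced with a threshold of 0.05, whereas 0.01 yields larger counts for most ratios, so the constant in the printed formula may have been altered somewhere along the way.

```python
import math


def iterations_needed(ratio, threshold):
    """ceil(ln(threshold) / ln(ratio)) for 0 < ratio < 1 and 0 < threshold < 1.

    ratio stands for lambda_i / lambda_1; threshold is the residual
    (lambda_i / lambda_1)**k one is willing to tolerate.
    """
    return math.ceil(math.log(threshold) / math.log(ratio))


ratios = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
          0.85, 0.9, 0.95, 0.99, 0.995, 0.999]

# 0.05 is an example threshold chosen for this sketch, not a value quoted from the paper.
k_values = [iterations_needed(r, 0.05) for r in ratios]
```

The steep growth near a ratio of 1 is the practical point: when the two leading eigenvalues are close, a fixed small number of power iterations cannot separate them.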
Figure 3
Figure 3. Performance of our PCA denoising layer as a function of (a) the percentage of preserved information, (b) the number e of eigenvectors retained, (c) the number of power iterations.

Text captured alongside the figure: For our algorithm, we never saw a training failure, independently of the matrix dimension, which ranged from d = 4 to d = 64.
Figure 4
Figure 4. Training loss as a function of the number of epochs.

Method     Min error  Mean error
BN         4.66       4.81±0.19
PCA(PI)    5.05       5.35±0.25
PCA(SVD)   NaN        NaN
PCA(Ours)  4.58       4.63±0.11
Original abstract

Eigendecomposition (ED) is widely used in deep networks. However, the backpropagation of its results tends to be numerically unstable, whether using ED directly or approximating it with the Power Iteration (PI) method, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small and arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the power of ED to the full. In this paper, we introduce a numerically stable and differentiable approach to leveraging eigenvectors in deep networks. It can handle large matrices without requiring them to be split. We demonstrate the better robustness of our approach over standard ED and PI for ZCA whitening, an alternative to batch normalization, and for PCA denoising, which we introduce as a new normalization strategy for deep networks, aiming to further denoise the network's features.
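
For context on the Power Iteration baseline named in the abstract, here is a minimal textbook sketch in NumPy. It is not the paper's procedure: `power_iteration` and `alignment_error` are hypothetical helpers, and the two diagonal test matrices are made up to contrast a well-separated spectrum with a nearly degenerate one.

```python
import numpy as np


def power_iteration(M, num_iters, seed=0):
    """Approximate the dominant eigenpair of a symmetric PSD matrix M."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = M @ v                       # amplify the dominant direction
        v /= np.linalg.norm(v)          # renormalize to avoid overflow
    return v @ M @ v, v                 # Rayleigh quotient, unit vector estimate


def alignment_error(M, num_iters):
    """1 - |cosine| between the PI estimate and the exact top eigenvector."""
    _, v = power_iteration(M, num_iters)
    exact = np.linalg.eigh(M)[1][:, -1]  # eigh returns eigenvalues in ascending order
    return 1.0 - abs(v @ exact)


easy = np.diag([1.0, 0.5, 0.1])         # lambda_2 / lambda_1 = 0.5
hard = np.diag([1.0, 0.99, 0.1])        # lambda_2 / lambda_1 = 0.99

err_easy = alignment_error(easy, num_iters=20)
err_hard = alignment_error(hard, num_iters=20)
```

With the same iteration budget, the estimate for the nearly degenerate matrix is still far from the true eigenvector, which is the regime where a pure PI forward pass becomes inaccurate.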

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims to introduce a numerically stable, differentiable eigendecomposition procedure suitable for backpropagation through large matrices in deep networks, without requiring arbitrary partitioning. It applies the method to ZCA whitening (as an alternative to batch normalization) and introduces PCA denoising as a new normalization layer, reporting improved robustness over direct ED and Power Iteration in these settings.

Significance. If the stability and differentiability claims hold for large matrices, the work would enable fuller use of spectral methods inside end-to-end training without ad-hoc splits, which could strengthen normalization and feature-processing modules that rely on eigenvectors.

minor comments (2)
  1. [Abstract] The claim of 'better robustness' is stated without any quantitative metrics, dataset names, or matrix sizes; moving even one concrete result into the abstract would strengthen the summary.
  2. The manuscript should clarify whether the new PCA-denoising normalization requires any additional hyperparameters beyond the standard PCA rank or eigenvalue threshold.
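
To make the second comment concrete, a generic PCA denoising step can be sketched as follows. The parameters `rank` and `energy` are hypothetical names for the two usual ways of choosing how many eigenvectors to keep (the Figure 3 caption mentions both the percentage of preserved information and the number of eigenvectors retained). The sketch uses NumPy's standard `eigh` for the forward pass only and should not be read as the paper's layer.

```python
import numpy as np


def pca_denoise(X, rank=None, energy=None):
    """Project centered features onto the leading principal directions and back.

    Exactly one of the two hypothetical hyperparameters must be given:
      rank   - number of eigenvectors to keep, or
      energy - fraction of total variance (0 < energy <= 1) to preserve.
    Returns the reconstructed features and the rank actually used.
    """
    if (rank is None) == (energy is None):
        raise ValueError("specify exactly one of rank or energy")
    mean = X.mean(axis=0, keepdims=True)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # sort descending
    if rank is None:
        cumulative = np.cumsum(eigvals) / np.sum(eigvals)
        rank = int(np.searchsorted(cumulative, energy)) + 1
        rank = min(rank, X.shape[1])                      # guard against round-off
    V = eigvecs[:, :rank]
    return Xc @ V @ V.T + mean, rank


# Synthetic check: a rank-2 signal in 16 dimensions plus small isotropic noise.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.normal(size=(16, 2)))[0]         # 2 orthonormal directions
coeffs = rng.normal(size=(2000, 2)) * np.array([5.0, 3.0])
clean = coeffs @ basis.T
noisy = clean + 0.1 * rng.normal(size=clean.shape)

denoised, k_fixed = pca_denoise(noisy, rank=2)
_, k_energy = pca_denoise(noisy, energy=0.99)
err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

Either choice introduces one scalar hyperparameter, which is the kind of detail the comment asks the authors to state explicitly.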

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the work and the recommendation for minor revision. We are glad that the potential to enable fuller use of spectral methods in end-to-end training is recognized.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents a new differentiable eigendecomposition procedure claimed to be numerically stable for large matrices. No load-bearing step reduces by construction to a fitted input, self-definition, or self-citation chain. The derivation of the method (whatever its explicit form) is offered as independent content rather than a renaming or tautological restatement of its inputs. The empirical demonstrations on ZCA and PCA denoising are presented as validation, not as the source of the claimed properties. This is the common case of a self-contained technical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities identifiable from the abstract alone.


