
arxiv: 2606.02490 · v1 · pith:4ECQC54Mnew · submitted 2026-06-01 · 💻 cs.LG

Expressivity of congruence-based architectures for DNNs on positive-definite matrices

Pith reviewed 2026-06-28 15:25 UTC · model grok-4.3

classification 💻 cs.LG
keywords: congruence layers · positive definite matrices · neural network expressivity · semi-orthogonality · Poincaré separation theorem · SPD classification · Riemannian classifiers

The pith

The semi-orthogonality constraint on weights in congruence-like layers for positive-definite matrices limits expressivity, collapsing stacked layers to one-hidden-layer equivalents for some activations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates neural network architectures that use congruence-like layers to process symmetric positive-definite matrices for classification tasks. It establishes that requiring the weight matrix W to be semi-orthogonal restricts what these layers can do. Specifically, for particular activation functions, multiple such layers end up providing no more power than a single hidden layer. The reason is a reduction in the variety of eigenvalues produced, which follows directly from Poincaré's separation theorem. The authors also compare several Riemannian-based classifiers for the output stage and note how they align with the features coming from these layers.

Core claim

Congruence-like layers multiply the input positive-definite matrix on both sides by a weight matrix W and its transpose. When W is constrained to be semi-orthogonal, Poincaré's separation theorem implies that the eigenvalues of the output interlace with those of the input, so they stay within the input's spectral range. Combined with specific activations, this causes any number of stacked layers to produce outputs equivalent in expressivity to those of a single hidden layer.
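The spectral constraint can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the helper names `random_spd`, `random_semi_orthogonal` and `congruence` are hypothetical, and W is assumed to be a k x n matrix with orthonormal rows, so the layer reads W X W^T.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n, rng):
    """Random symmetric positive-definite matrix (illustrative helper)."""
    B = rng.standard_normal((n, n))
    return B @ B.T + 1e-3 * np.eye(n)

def random_semi_orthogonal(k, n, rng):
    """k x n matrix with orthonormal rows (W @ W.T = I_k), built via QR."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))  # Q: n x k, orthonormal columns
    return Q.T

def congruence(W, X):
    """Congruence-like layer X -> W X W^T (assumed convention)."""
    return W @ X @ W.T

n, k = 6, 3
X = random_spd(n, rng)
W = random_semi_orthogonal(k, n, rng)
Y = congruence(W, X)

lam = np.sort(np.linalg.eigvalsh(X))[::-1]  # input eigenvalues, descending
mu = np.sort(np.linalg.eigvalsh(Y))[::-1]   # output eigenvalues, descending

# Poincaré separation: lam[i] >= mu[i] >= lam[n - k + i] for i = 0..k-1
tol = 1e-9
interlaced = all(lam[i] + tol >= mu[i] >= lam[n - k + i] - tol for i in range(k))
print("interlacing holds:", interlaced)
```

The check holds for any symmetric input and any W with orthonormal rows, which is why the constraint on W, rather than the data, drives the argument.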

What carries the argument

The congruence-like layer, which transforms a positive-definite matrix X into W X W^T, with the semi-orthogonality constraint on W that triggers the spectral collapse via Poincaré's separation theorem.

If this is right

  • For the affected activations, an architecture with multiple congruence layers is equivalent to a one-hidden-layer network.
  • Stacked layers fail to gain additional expressivity from depth due to repeated loss of spectral diversity.
  • Only activations that preserve the necessary spectral properties avoid the collapse.
  • The final classifier choice must account for the limited feature variety produced by the layers.
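One way such a collapse can arise is sketched below. This is an illustration under stated assumptions, not the paper's proof: the summary does not name the affected activations, so the ReEig rectification from SPDNet (eigenvalues clamped from below at a threshold `eps`) is used as an assumed example, and the input is assumed to have all eigenvalues above `eps`. The helpers `semi_orthogonal` and `reeig` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(1)

def semi_orthogonal(k, n, rng):
    """k x n matrix with orthonormal rows, built via QR."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return Q.T

def reeig(X, eps):
    """Assumed activation: clamp the eigenvalues of X from below at eps."""
    w, U = np.linalg.eigh(X)
    return (U * np.maximum(w, eps)) @ U.T  # U diag(max(w, eps)) U^T

eps = 1e-2
n, k1, k2 = 8, 5, 3
B = rng.standard_normal((n, n))
X = B @ B.T + np.eye(n)  # smallest eigenvalue >= 1 > eps (assumption on the input)

W1 = semi_orthogonal(k1, n, rng)
W2 = semi_orthogonal(k2, k1, rng)

# Two stacked congruence layers, each followed by the rectification.
deep = reeig(W2 @ reeig(W1 @ X @ W1.T, eps) @ W2.T, eps)

# A single congruence with the product W = W2 W1, which again has orthonormal rows.
W = W2 @ W1
shallow = W @ X @ W.T

collapsed = np.allclose(deep, shallow)
print("two layers match one congruence:", collapsed)
```

Under these assumptions, interlacing keeps every intermediate eigenvalue above `eps`, so the rectification never activates and the depth adds nothing. Inputs with eigenvalues below `eps` fall outside this sketch.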

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Removing or relaxing the semi-orthogonality constraint could allow deeper networks to achieve greater expressivity on positive-definite data.
  • Similar spectral limitations might appear in other manifold-based neural architectures that impose orthogonality.
  • Empirical tests on classification accuracy could reveal whether the theoretical collapse translates to performance plateaus in practice.

Load-bearing premise

Poincaré's separation theorem applies directly to imply loss of spectral diversity in the stacked congruence-like layers under the chosen activation functions, without further restrictions on the input matrices.
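For reference, the standard statement of the theorem is given below, written in the W A W^T convention with W having orthonormal rows. This convention is an assumption made here for consistency with the core claim; the paper's own notation may differ.

```latex
Let $A \in \mathbb{R}^{n \times n}$ be symmetric with eigenvalues
$\lambda_1 \ge \dots \ge \lambda_n$, and let $W \in \mathbb{R}^{k \times n}$
satisfy $W W^{\top} = I_k$ with $k \le n$. If $\mu_1 \ge \dots \ge \mu_k$
are the eigenvalues of $W A W^{\top}$, then
\[
  \lambda_i \;\ge\; \mu_i \;\ge\; \lambda_{n-k+i}, \qquad i = 1, \dots, k .
\]
In particular, $\lambda_n \le \mu_i \le \lambda_1$ for every $i$.
```

The theorem itself places no restriction on A beyond symmetry. Whether the bound is enough to force a collapse depends on how the activation acts on the spectrum.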

What would settle it

A concrete counterexample would be a multi-layer congruence network with semi-orthogonal weights and the specified activations that produces output distributions or decision boundaries distinct from and more powerful than a single-layer version on the same positive-definite matrix inputs.

Figures

Figures reproduced from arXiv: 2606.02490 by Antonin Oswald, Estelle Massart.

Figure 1: Illustration of the architecture (3) of depth ...

Abstract

This work studies neural architectures for classifying symmetric positive-definite matrices, focusing on congruence-like layers, in which the input matrix is multiplied on the left and right by a (possibly rectangular) weight matrix $W$ and its transpose. Such layers lie at the core of the celebrated SPDNet and have also been employed independently for dimensionality reduction on positive-definite data. We show that the (semi)-orthogonality constraint commonly imposed on $W$ limits the expressivity of these layers: for certain activation functions, the resulting architecture collapses to a one-hidden-layer equivalent. This lack of expressivity follows from a loss of spectral diversity in congruence-like layers for semi-orthogonal $W$ and is a direct consequence of Poincaré's separation theorem. We then examine the choice of the final classifier, comparing several Riemannian classifiers and discussing their compatibility with the feature maps produced by congruence-like layers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies congruence-like layers (W A W^T) for DNNs on symmetric positive-definite matrices, as used in SPDNet. It claims that the common semi-orthogonality constraint on W causes loss of spectral diversity by Poincaré's separation theorem, so that for certain activation functions any number of such layers is equivalent to a single hidden layer. The work also compares several Riemannian classifiers for the final layer and their compatibility with the resulting feature maps.

Significance. If the central claim is established with precise conditions, the result would be significant for manifold-valued deep learning: it identifies a structural limitation in a widely adopted architecture and supplies a theorem-based explanation rather than an empirical observation. The reliance on an external result (Poincaré separation) rather than fitted parameters is a methodological strength.

major comments (2)
  1. [Abstract / main expressivity theorem] The argument that stacked congruence layers remain equivalent to a single layer requires an explicit hypothesis on the activation functions (e.g., eigenvalue-wise monotonicity or a similar property) that prevents recovery of the eigenvalues discarded by Poincaré separation. The abstract states only “certain activation functions” without listing the class or verifying that the composition across layers cannot restore the lost spectral information; this step is load-bearing for the multi-layer collapse claim.
  2. [Expressivity analysis (section containing the proof of collapse)] Poincaré separation (or Cauchy interlacing) bounds the eigenvalues of W^T A W when W has orthonormal columns, but the manuscript must still demonstrate that the subsequent nonlinear map, when iterated, cannot propagate information from the discarded eigenvalues. Without this propagation argument, the reduction to a one-hidden-layer equivalent does not automatically follow from the single-layer spectral loss.
minor comments (2)
  1. [Introduction / Methods] Notation for the congruence operation and the precise definition of “semi-orthogonal” (rectangular vs. square) should be stated once at the beginning of the methods section for clarity.
  2. [Classifier comparison section] The comparison of Riemannian classifiers would benefit from a short table summarizing which classifiers are compatible with the spectral features produced by the congruence layers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on the expressivity analysis. We address the two major comments point by point below and will revise the manuscript to improve clarity on the activation hypotheses and the iterative non-recovery argument.

  1. Referee: [Abstract / main expressivity theorem] The argument that stacked congruence layers remain equivalent to a single layer requires an explicit hypothesis on the activation functions (e.g., eigenvalue-wise monotonicity or a similar property) that prevents recovery of the eigenvalues discarded by Poincaré separation. The abstract states only “certain activation functions” without listing the class or verifying that the composition across layers cannot restore the lost spectral information; this step is load-bearing for the multi-layer collapse claim.

    Authors: We agree that the abstract is too terse. The main text defines the relevant class as eigenvalue-wise monotonic (strictly increasing) functions; the proof shows that monotonicity together with repeated application of Poincaré separation prevents recovery of discarded eigenvalues. We will revise the abstract to read 'for eigenvalue-wise monotonic activation functions' and add one sentence noting that the composition across layers cannot restore lost spectral information.

    Revision: yes

  2. Referee: [Expressivity analysis (section containing the proof of collapse)] Poincaré separation (or Cauchy interlacing) bounds the eigenvalues of W^T A W when W has orthonormal columns, but the manuscript must still demonstrate that the subsequent nonlinear map, when iterated, cannot propagate information from the discarded eigenvalues. Without this propagation argument, the reduction to a one-hidden-layer equivalent does not automatically follow from the single-layer spectral loss.

    Authors: The existing inductive argument already shows that each layer re-applies Poincaré separation to the output of the preceding activation, and monotonicity of the activation preserves the interlacing bounds without restoring eigenvalues outside them. To make the non-propagation step fully explicit, we will insert a short lemma stating that the composition of congruence-plus-monotonic-activation cannot recover information lost at any prior layer.

    Revision: yes
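The bound-preservation step in this response can be illustrated with a small NumPy sketch. It is only a sketch under assumptions: the matrix logarithm stands in for an eigenvalue-wise strictly increasing activation, `spectral_map` is a hypothetical helper, and W is taken with orthonormal rows. It checks the range bound only, not the full non-recovery claim.

```python
import numpy as np

def spectral_map(X, f):
    """Apply a scalar function f to the eigenvalues of a symmetric matrix X."""
    w, U = np.linalg.eigh(X)
    return (U * f(w)) @ U.T

rng = np.random.default_rng(2)
n, k = 7, 4
B = rng.standard_normal((n, n))
X = B @ B.T + 0.5 * np.eye(n)  # symmetric positive-definite input

Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
W = Q.T  # k x n with orthonormal rows

f = np.log  # assumed example of a strictly increasing eigenvalue-wise activation

lam = np.linalg.eigvalsh(X)  # input eigenvalues, ascending
out = np.linalg.eigvalsh(spectral_map(W @ X @ W.T, f))

# Interlacing gives lam[0] <= mu_i <= lam[-1]; an increasing f maps this
# to f(lam[0]) <= f(mu_i) <= f(lam[-1]).
tol = 1e-9
within_bounds = bool(np.all(out >= f(lam[0]) - tol) and np.all(out <= f(lam[-1]) + tol))
print("activated spectrum stays within f of the input range:", within_bounds)
```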

Circularity Check

0 steps flagged

No circularity; central claim rests on external Poincaré separation theorem


The paper's key expressivity result is explicitly attributed to an external classical theorem (Poincaré's separation theorem) rather than to any fitted parameter, self-definition, or self-citation chain. The abstract states the collapse 'is a direct consequence of Poincaré's separation theorem' with no indication that the theorem itself is derived from the present work or that the activation composition step reduces to a tautology. No load-bearing step is shown to equate a prediction with its own input by construction, and the derivation chain therefore remains independent of the paper's own fitted quantities or prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the application of Poincaré's separation theorem to the layers, which is a standard result from linear algebra.

axioms (1)
  • [standard math] Poincaré's separation theorem
    Invoked to explain loss of spectral diversity in congruence-like layers with semi-orthogonal W.

pith-pipeline@v0.9.1-grok · 5679 in / 1317 out tokens · 35265 ms · 2026-06-28T15:25:05.996771+00:00 · methodology

