pith. machine review for the scientific record.

arxiv: 2604.20308 · v1 · submitted 2026-04-22 · 💻 cs.LG

Recognition: unknown

Sheaf Neural Networks on SPD Manifolds: Second-Order Geometric Representation Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:35 UTC · model grok-4.3

classification 💻 cs.LG
keywords sheaf neural networks · SPD manifolds · geometric deep learning · graph representation learning · molecular property prediction · Lie groups · second-order representations

The pith

Sheaf neural networks defined on SPD manifolds represent second-order geometric features that Euclidean sheaves cannot.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first sheaf neural network that runs natively on the manifold of symmetric positive definite (SPD) matrices instead of projecting features into vectors. It proves that SPD-valued sheaves are strictly more expressive than vector-valued ones because they support consistent global sections of matrix assignments that no Euclidean sheaf can realize. This matters for graph tasks that need to capture how directions covary, such as atomic orientations in molecules. The construction uses the Lie group structure of SPD matrices to define edge-specific transformations directly on the manifold. The resulting dual-stream model reaches state-of-the-art accuracy on six of the seven MoleculeNet benchmarks while remaining stable at greater depths.

Core claim

SPD-valued sheaves are strictly more expressive than Euclidean sheaves: they admit consistent configurations (global sections) that vector-valued sheaves cannot represent, directly translating to richer learned representations.
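
In cellular-sheaf terms (a standard definition, not quoted from the paper), a global section is an edge-consistent assignment of stalk data:

  x = (x_v)_{v ∈ V} with F_{v→e}(x_v) = F_{u→e}(x_u) for every edge e = (u, v); equivalently, x ∈ ker δF for the coboundary operator δF.

The claim is that with SPD stalks and manifold-native restriction maps, the set of such consistent assignments is strictly larger than what any vector-valued sheaf can realize.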

What carries the argument

The Lie group structure on the SPD manifold that permits well-posed analogs of sheaf restriction and extension operators without any projection to Euclidean space.
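
The section above does not pin down which Lie group structure is meant; a standard candidate is the log-Euclidean one, under which SPD(n) becomes a commutative Lie group with operation X ⊙ Y = exp(log X + log Y). A minimal Python sketch under that assumption (the function names are ours, not the paper's):

  import numpy as np

  def _sym_apply(X, f):
      # Apply a scalar function to a symmetric matrix via eigendecomposition.
      w, V = np.linalg.eigh(X)
      return (V * f(w)) @ V.T

  def spd_log(X): return _sym_apply(X, np.log)   # SPD(n) -> Sym(n)
  def spd_exp(S): return _sym_apply(S, np.exp)   # Sym(n) -> SPD(n)

  def spd_group_op(X, Y):
      # Log-Euclidean group operation: commutative, identity I, inverse
      # spd_exp(-spd_log(X)); the result stays on the SPD manifold.
      return spd_exp(spd_log(X) + spd_log(Y))

Because the operation reduces to addition in the tangent space Sym(n), sheaf-style coboundary and aggregation steps can be expressed without leaving the manifold, which is the role this premise plays in the argument.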

If this is right

  • Sheaf convolution converts rank-1 directional inputs into full-rank matrices that encode local geometric structure (a toy numeric illustration follows this list).
  • The dual-stream architecture reaches state-of-the-art results on six of the seven MoleculeNet benchmarks.
  • The sheaf framework maintains performance as network depth increases.
  • Matrix-valued features propagate across edges using manifold-native transformations.
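
The first bullet is easy to illustrate numerically: summing rank-1 direction matrices d dᵀ transported from neighbors through edge-specific congruence maps M X Mᵀ generically yields a full-rank matrix at the receiving node. The random rotations and plain sum below are illustrative stand-ins for the paper's learned restriction maps and aggregation, not its exact operator:

  import numpy as np

  rng = np.random.default_rng(1)

  def rand_rotation(n=3):
      # Random orthogonal map, standing in for a learned restriction map.
      Q, R = np.linalg.qr(rng.normal(size=(n, n)))
      return Q * np.sign(np.diag(R))

  def rank1(d):
      # Rank-1 "directional" feature d d^T from a unit direction d.
      d = d / np.linalg.norm(d)
      return np.outer(d, d)

  # Three neighbors each contribute a rank-1 matrix, transported by an
  # edge-specific congruence action M X M^T and summed at the receiver.
  agg = sum(M @ X @ M.T
            for X, M in [(rank1(rng.normal(size=3)), rand_rotation())
                         for _ in range(3)])

  print(np.linalg.matrix_rank(rank1(rng.normal(size=3))))  # 1
  print(np.linalg.matrix_rank(agg))                        # generically 3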

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same Lie-group construction could be tried on other matrix manifolds to capture higher-order relations in geometric data.
  • Tasks that already use covariance features, such as certain vision or sensor problems, might gain expressivity by adopting manifold-valued sheaves.
  • Systematic checks on non-molecular graphs with directional data would test whether the reported expressivity advantage holds outside the current benchmarks.

Load-bearing premise

The SPD manifold admits a Lie group structure that allows sheaf operators to be defined directly on it.

What would settle it

A small graph together with an assignment of SPD matrices to its edges such that a consistent global section exists for the SPD sheaf but no vector-valued sheaf on the same graph admits any consistent assignment.
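
The checking half of that test is mechanical once a graph and restriction maps are fixed; the hard half, an SPD sheaf that closes up where every vector-valued counterpart fails, is exactly what the paper's proof must supply. A sketch of the checking side on a triangle with illustrative orthogonal congruence maps (none of this is the paper's construction): transport an SPD matrix around the cycle and test whether the holonomy returns it to itself.

  import numpy as np

  rng = np.random.default_rng(2)

  def rand_rotation(n=3):
      Q, R = np.linalg.qr(rng.normal(size=(n, n)))
      return Q * np.sign(np.diag(R))

  # Triangle 0-1-2-0 with one orthogonal map per (node, edge) incidence.
  edges = [(0, 1), (1, 2), (2, 0)]
  M = {(v, e): rand_rotation() for e in edges for v in e}

  def transport(X, v, u, e):
      # Solve M_ue Y M_ue^T = M_ve X M_ve^T for Y; orthogonal maps
      # invert by transpose, and congruence preserves SPD.
      return M[(u, e)].T @ (M[(v, e)] @ X @ M[(v, e)].T) @ M[(u, e)]

  A = rng.normal(size=(3, 3)); X0 = A @ A.T + np.eye(3)  # SPD seed at node 0
  X = X0
  for v, u in edges:                                     # 0 -> 1 -> 2 -> 0
      X = transport(X, v, u, (v, u))

  # A section through X0 exists on the cycle iff the holonomy fixes it;
  # with generic random maps it does not.
  print(np.allclose(X, X0))  # generically False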

Figures

Figures reproduced from arXiv: 2604.20308 by Anna Wienhard, Ce Ju, Diaaeldin Taha, Hao Li, Huitao Feng, Junwen Dong, Kelin Xia, Yuhan Peng, Yuzhi Zeng.

Figure 1. SPD Sheaf Neural Networks. Top: SPD sheaves assign matrix-valued stalks (SPD(n)) rather than vector stalks (R^n). Through sheaf convolution, SPD representations gain second-order structure (effective rank increases), while Euclidean representations remain first-order. Bottom: dual-stream architecture: the geometric stream processes coordinates via SPD sheaf convolution; the semantic stream processes node fe…
Figure 2. Relationship to Euclidean Sheaves. Left: Laplacian-based updates on a Euclidean sheaf F. Right: updates on the corresponding SPD-valued sheaf G. The embedding Φ lifts Euclidean features to SPD matrices, satisfying Φ(ker δF) ⊆ ker δG. This inclusion is preserved across layers, with final representations corresponding to global sections. The strict inclusion highlights the greater representational capacity…
Figure 3. (caption not extracted)
Original abstract

Graph neural networks face two fundamental challenges rooted in the linear structure of Euclidean vector spaces: (1) Current architectures represent geometry through vectors (directions, gradients), yet many tasks require matrix-valued representations that capture relationships between directions, such as how atomic orientations covary in a molecule. These second-order representations are naturally captured by points on the manifold of symmetric positive definite (SPD) matrices; (2) Standard message passing applies shared transformations across edges. Sheaf neural networks address this via edge-specific transformations, but existing formulations remain confined to vector spaces and therefore cannot propagate matrix-valued features. We address both challenges by developing the first sheaf neural network that operates natively on the SPD manifold. Our key insight is that the SPD manifold admits a Lie group structure, enabling well-posed analogs of sheaf operators without projecting to Euclidean space. Theoretically, we prove that SPD-valued sheaves are strictly more expressive than Euclidean sheaves: they admit consistent configurations (global sections) that vector-valued sheaves cannot represent, directly translating to richer learned representations. Empirically, our sheaf convolution transforms effectively rank-1 directional inputs into full-rank matrices encoding local geometric structure. Our dual-stream architecture achieves SOTA on 6/7 MoleculeNet benchmarks, with the sheaf framework providing consistent depth robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the first sheaf neural network operating natively on the SPD manifold, exploiting the manifold's Lie group structure to define edge-specific sheaf operators without Euclidean projection. It proves that SPD-valued sheaves are strictly more expressive than Euclidean vector sheaves because they admit global sections (consistent configurations) that are impossible in vector spaces. A dual-stream architecture is proposed that maps rank-1 directional inputs to full-rank matrix features, achieving SOTA results on 6/7 MoleculeNet benchmarks with improved depth robustness.

Significance. If the expressivity proof holds, the work meaningfully extends geometric deep learning by enabling second-order matrix-valued representations in sheaf GNNs, with potential impact on tasks like molecular modeling where covariances matter. The empirical SOTA and depth robustness are promising strengths, and the attempt at a strict expressivity result (rather than just empirical gains) is a positive feature worth crediting. Significance would be higher with explicit comparisons to prior SPD manifold networks.

major comments (2)
  1. [Theoretical section] Expressivity theorem: The central claim that SPD sheaves admit global sections impossible for vector sheaves is load-bearing, but it requires an explicit construction of such a section (e.g., via the Lie group operation) together with a proof that no equivalent exists under standard vector-sheaf restriction maps. Without this concrete counterexample and the operator definitions, the strict expressivity advantage cannot be verified.
  2. [Experiments section] The SOTA claim on 6/7 MoleculeNet benchmarks is central to the empirical contribution; the manuscript must report the full set of baselines (including recent SPD and manifold GNNs), statistical significance tests, and an ablation of the sheaf component versus the dual-stream design to confirm the gains are attributable to the proposed framework.
minor comments (2)
  1. The abstract refers to 'our sheaf convolution transforms effectively rank-1 directional inputs into full-rank matrices'; add a brief methods paragraph or figure clarifying the input preprocessing and the precise form of the convolution operator on the manifold.
  2. Notation for the SPD Lie group operations (e.g., how the group multiplication induces the sheaf restriction maps) should be introduced with a short table or diagram for readers outside the subfield.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments point by point below and outline the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: [Theoretical section] Expressivity theorem: The central claim that SPD sheaves admit global sections impossible for vector sheaves is load-bearing, but it requires an explicit construction of such a section (e.g., via the Lie group operation) together with a proof that no equivalent exists under standard vector-sheaf restriction maps. Without this concrete counterexample and the operator definitions, the strict expressivity advantage cannot be verified.

    Authors: We appreciate the referee's emphasis on making the expressivity theorem fully verifiable. Upon review, we acknowledge that the current manuscript could benefit from a more explicit construction. In the revised version, we will provide a concrete example of a global section using the Lie group multiplication on SPD matrices. We will define the edge-specific sheaf operators explicitly via the group operation and demonstrate a configuration that is consistent under these operators but cannot be achieved with vector-valued sheaves under standard restriction maps. A formal proof will be included showing the impossibility in the Euclidean case due to the preservation of positive definiteness and non-commutativity. This will directly address the concern and solidify the strict expressivity advantage. revision: yes

  2. Referee: [Experiments section] The SOTA claim on 6/7 MoleculeNet benchmarks is central to the empirical contribution; the manuscript must report the full set of baselines (including recent SPD and manifold GNNs), statistical significance tests, and an ablation of the sheaf component versus the dual-stream design to confirm the gains are attributable to the proposed framework.

    Authors: We agree that additional details in the experimental section are necessary to fully support the claims. In the revised manuscript, we will include comparisons against a comprehensive set of baselines, specifically incorporating recent SPD and manifold-based GNNs. We will also report statistical significance tests (such as t-tests across multiple runs) for the performance differences. Furthermore, we will add ablation studies that isolate the contribution of the sheaf operators from the dual-stream architecture to demonstrate that the improvements are indeed due to the proposed SPD sheaf neural network framework. revision: yes
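
For the promised significance tests, a common recipe is a paired t-test over runs matched seed-for-seed; a minimal sketch (the metric values below are placeholders, not numbers from the paper):

  import numpy as np
  from scipy import stats

  # Per-seed test metric for the proposed model and the strongest baseline,
  # matched on identical seeds and splits (placeholder values).
  ours     = np.array([0.842, 0.851, 0.847, 0.839, 0.855])
  baseline = np.array([0.831, 0.846, 0.838, 0.834, 0.843])

  t, p = stats.ttest_rel(ours, baseline)  # paired across matched runs
  print(f"t = {t:.2f}, p = {p:.3f}")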

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper's key theoretical result, that SPD-valued sheaves admit global sections impossible for vector-valued sheaves, is derived from a standard Lie group structure on the SPD manifold, a pre-existing mathematical fact independent of the paper. No equations reduce a claimed prediction or first-principles result to fitted parameters or self-referential definitions. The expressivity proof is framed as following directly from manifold properties, without load-bearing self-citations or ansatz smuggling. Empirical SOTA claims are separate from the theoretical chain and do not create circularity. The derivation remains self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no specific free parameters, axioms, or invented entities can be identified from the text; the Lie group structure on SPD is treated as a standard mathematical fact.

pith-pipeline@v0.9.0 · 5550 in / 1118 out tokens · 32566 ms · 2026-05-10T00:35:29.506272+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    HilbNets discretize Hilbert bundle convolutions through Hilbert Cellular Sheaves whose Laplacians converge to the continuous connection Laplacian, enabling consistent learning across samplings.

    Sheaf learner: O(EF 2) for learning edge-specific restriction maps, compared to O(EF) for standard edge-wise operations. The dominant terms are thus O(EF 2 +N F 2) versus O(EF+N F 2) for standard GNNs. In practice, for typical molecular graphs with N≈20 –50 atoms and F= 128 , the N F2 term dominates both models, making the additional EF 2 overhead modest....