Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks

Ron Levie; Ya-Wei Eileen Lin

arxiv: 2509.24886 · v3 · submitted 2025-09-29 · 💻 cs.LG

Pith reviewed 2026-05-18 12:41 UTC · model grok-4.3

keywords: adaptive canonicalization, equivariant machine learning, symmetry-respecting networks, universal approximation, spectral graph neural networks, point cloud classification, invariant geometric networks, canonicalization in ML

The pith

Selecting the canonical form that maximizes network predictive confidence produces continuous and symmetry-respecting models with universal approximation properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces adaptive canonicalization, in which the standard form of an input is chosen depending on both the input and the network itself. Specifically, the form is selected to maximize the network's predictive confidence, called prior maximization. This avoids the discontinuities that fixed canonicalization often introduces when enforcing symmetries such as rotations or graph automorphisms. The construction is proven to produce models that are continuous, exactly respect symmetries, and can universally approximate any continuous invariant function. The method is demonstrated on resolving eigenbasis ambiguities in spectral graph networks and on handling rotations for point clouds, where it outperforms data augmentation, standard canonicalization, and equivariant architectures on molecular, protein, and point cloud classification.

Core claim

Adaptive canonicalization based on prior maximization selects the canonical form of the input to maximize the predictive confidence of the network. We prove that this construction yields continuous and symmetry-respecting models that admit universal approximation properties. We propose two applications of our setting: resolving eigenbasis ambiguities in spectral graph neural networks, and handling rotational symmetries in point clouds. We empirically validate our methods on molecular and protein classification, as well as point cloud classification tasks. Our adaptive canonicalization outperforms the three other common solutions to equivariant machine learning: data augmentation, standard canonicalization, and equivariant architectures.

What carries the argument

Adaptive canonicalization based on prior maximization: the mechanism that chooses the input's canonical form to maximize the network's predictive confidence, which is what the paper relies on for both continuity and exact symmetry preservation.
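
A minimal sketch of the selection step, assuming a finite grid of candidate rotations, a toy two-logit classifier, and maximum softmax probability as the confidence. The names `toy_network`, `confidence`, `adaptive_canonicalize`, and `ANGLES` are hypothetical and are not taken from the paper; the paper's actual construction (for instance, the class-specific transformations in its Figure 1) may differ.

```python
import math

def rotate(points, angle):
    """Rotate a list of 2D points by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_network(points):
    """Hypothetical, non-invariant classifier: two logits from raw coordinates."""
    return [sum(x for x, _ in points), sum(y for _, y in points)]

def confidence(points):
    """Assumed confidence: the maximum softmax probability."""
    return max(softmax(toy_network(points)))

def adaptive_canonicalize(points, angles):
    """Return the candidate orientation with the highest confidence, and its angle."""
    best = max(angles, key=lambda a: confidence(rotate(points, a)))
    return rotate(points, best), best

def invariant_confidence(points, angles):
    """Confidence after maximizing over all candidate orientations."""
    return max(confidence(rotate(points, a)) for a in angles)

# Eight equally spaced rotations: a finite group under composition.
ANGLES = [2 * math.pi * k / 8 for k in range(8)]
cloud = [(1.0, 0.2), (0.5, -0.3), (0.8, 0.1)]
turned = rotate(cloud, ANGLES[3])
```

Because the grid is closed under composition, rotating the input by any grid angle only permutes the candidates, so `invariant_confidence(cloud, ANGLES)` and `invariant_confidence(turned, ANGLES)` agree up to floating-point error even though `toy_network` itself is not invariant.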

If this is right

The resulting models are continuous functions of the input while exactly respecting the symmetries of the data.
These models can universally approximate any continuous function that is invariant under the given symmetry group.
Eigenbasis ambiguities in spectral graph neural networks are resolved without introducing discontinuities in the mapping.
Rotational symmetries in point cloud data are handled by selecting the orientation that maximizes network confidence.
The approach achieves higher classification accuracy than data augmentation, standard canonicalization, or equivariant architectures on geometric datasets.
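
The exact-invariance consequence can be written out in a few lines. The notation here is an assumption made for illustration: $G$ is the symmetry group, $\pi$ its action on inputs, $\Psi$ the network, and $h$ a scalar confidence; the maximum is assumed to be attained (for example, $G$ finite or compact with continuous dependence on $g$).

```latex
\begin{align*}
F(x) &= \max_{g \in G} \, h\bigl(\Psi(\pi(g)\,x)\bigr), \\
F(\pi(g_0)\,x)
  &= \max_{g \in G} \, h\bigl(\Psi(\pi(g)\,\pi(g_0)\,x)\bigr)
   = \max_{g \in G} \, h\bigl(\Psi(\pi(g g_0)\,x)\bigr) \\
  &= \max_{g' \in G} \, h\bigl(\Psi(\pi(g')\,x)\bigr)
   = F(x) \qquad \text{for every } g_0 \in G .
\end{align*}
```

The third equality uses only that $g \mapsto g g_0$ is a bijection of $G$, so no property of $\Psi$ itself is needed for invariance of the maximized value.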

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The adaptive selection could implicitly favor stable training trajectories by aligning canonical choices with regions of high network confidence.
The same prior-maximization idea could be tested on other symmetry groups such as reflections or discrete permutations beyond graphs.
Observing which canonical forms are chosen most often on a dataset might reveal how the network internally resolves geometric ambiguities.
The framework might improve generalization when training data is limited, because the symmetry-respecting property is enforced exactly rather than approximately.

Load-bearing premise

Maximizing the network's predictive confidence over possible canonical forms always produces a selection that is continuous in the input and exactly respects symmetries, without needing extra restrictions on the network or loss landscape.

What would settle it

A concrete input graph or point cloud together with a small perturbation where the maximizing canonical form switches abruptly, causing the overall model output to become discontinuous or to violate the input symmetry.
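
A probe of this kind might look like the following sketch. It scans a one-parameter perturbation path, records where the selected canonical form switches, and measures the largest change in the maximized confidence between neighbouring samples. The function `candidate_scores` is a hypothetical stand-in for the confidences a real model would assign to two candidate canonical forms; everything here is illustrative and not drawn from the paper.

```python
import math

def candidate_scores(t):
    """Hypothetical confidences of two canonical forms along a perturbation path."""
    return [0.6 + 0.3 * math.sin(t), 0.6 + 0.3 * math.cos(t)]

def selected_index(t):
    """Index of the candidate with the highest confidence (first one on ties)."""
    scores = candidate_scores(t)
    return scores.index(max(scores))

def max_score(t):
    """Maximized confidence at perturbation parameter t."""
    return max(candidate_scores(t))

def probe(t0, t1, steps=1000):
    """Scan [t0, t1]; return the largest neighbour-to-neighbour change in the
    maximized confidence and the parameter values where the argmax switches."""
    largest_jump = 0.0
    switches = []
    prev_t = t0
    for k in range(1, steps + 1):
        t = t0 + (t1 - t0) * k / steps
        largest_jump = max(largest_jump, abs(max_score(t) - max_score(prev_t)))
        if selected_index(t) != selected_index(prev_t):
            switches.append(t)
        prev_t = t
    return largest_jump, switches

jump, switches = probe(0.0, math.pi / 2)
```

In this toy path the selected candidate switches once, near `t = pi / 4`, while the maximized confidence changes only slightly between neighbouring samples. A genuine counterexample would show a change in the model output that does not shrink as `steps` grows.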

Figures

Figures reproduced from arXiv: 2509.24886 by Ron Levie, Ya-Wei Eileen Lin.

**Figure 1.** Illustration of prior maximization adaptive canonicalization in classification. The adaptive canonicalization optimizes the transformations β_{x,Ψ_j} of the inputs x to the classifiers Ψ_j, while, during training, Ψ_j are simultaneously trained w.r.t. the adaptively canonicalized inputs π(β_{x,Ψ_j})x.

**Figure 2.** Hyperparameter sensitivity with respect to grid size, noise level, and hidden dimension.

**Figure 3.** Mean geodesic distance on SO(3) between the canonicalizations between consecutive epochs.

**Figure 4.** The canonicalized point clouds for the chair class.
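
For readers who want to reproduce a quantity like the one plotted in Figure 3, the sketch below computes the standard geodesic (rotation-angle) distance on SO(3) and averages it over paired rotations. The caption does not spell out the paper's exact definition, so the formula choice and the helper names `rot_z`, `geodesic_distance`, and `mean_consecutive_distance` are assumptions.

```python
import math

def rot_z(angle):
    """3x3 rotation matrix about the z-axis, as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def geodesic_distance(r1, r2):
    """Rotation angle of r1^T r2 in radians, a value in [0, pi]."""
    # trace(r1^T r2) equals the sum of elementwise products of r1 and r2.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)

def mean_consecutive_distance(prev_epoch, next_epoch):
    """Average distance between each sample's rotation at two consecutive epochs."""
    pairs = list(zip(prev_epoch, next_epoch))
    return sum(geodesic_distance(a, b) for a, b in pairs) / len(pairs)

epoch_a = [rot_z(0.0), rot_z(1.0)]
epoch_b = [rot_z(0.1), rot_z(1.3)]
mean_shift = mean_consecutive_distance(epoch_a, epoch_b)
```

Here the two samples move by 0.1 and 0.3 radians, so `mean_shift` is approximately 0.2.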

Original abstract

Canonicalization is a widely used strategy in equivariant machine learning, enforcing symmetry in neural networks by mapping each input to a standard form. Yet, it often introduces discontinuities that can affect stability during training, limit generalization, and complicate universal approximation theorems. In this paper, we address this by introducing adaptive canonicalization, a general framework in which the canonicalization depends both on the input and the network. Specifically, we present the adaptive canonicalization based on prior maximization, where the standard form of the input is chosen to maximize the predictive confidence of the network. We prove that this construction yields continuous and symmetry-respecting models that admit universal approximation properties. We propose two applications of our setting: (i) resolving eigenbasis ambiguities in spectral graph neural networks, and (ii) handling rotational symmetries in point clouds. We empirically validate our methods on molecular and protein classification, as well as point cloud classification tasks. Our adaptive canonicalization outperforms the three other common solutions to equivariant machine learning: data augmentation, standard canonicalization, and equivariant architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The adaptive canonicalization via prior maximization claims to deliver continuous, symmetry-preserving models with universal approximation, but the argmax selection risks introducing discontinuities unless the paper supplies a specific continuity argument.

The central idea here is choosing the canonical representative that maximizes the network's predictive confidence rather than using a fixed rule. This is positioned as a general fix for the discontinuity problems that plague standard canonicalization in equivariant geometric networks, and the abstract states they prove continuity, symmetry preservation, and universal approximation follow from the construction.

Referee Report

2 major / 2 minor

Summary. The paper introduces adaptive canonicalization based on prior maximization, in which the canonical representative of an input is chosen to maximize the network's predictive confidence. This construction is claimed to yield continuous, symmetry-respecting models that admit universal approximation. Two concrete applications are developed: resolving eigenbasis ambiguities in spectral graph neural networks and handling rotational symmetries for point clouds. Empirical results on molecular, protein, and point-cloud classification tasks show outperformance relative to data augmentation, standard canonicalization, and equivariant architectures.

Significance. If the continuity and universal-approximation claims are rigorously established, the framework would provide a practical route to symmetry enforcement that avoids both the discontinuities of fixed canonicalization and the architectural overhead of fully equivariant layers. The empirical gains on standard geometric benchmarks would be of immediate interest to practitioners in molecular modeling and 3D vision.

major comments (2)

[Abstract and §3] (theoretical development): the central claim is that prior-maximization canonicalization produces a continuous map 'without requiring further restrictions on the network or loss landscape.' The argmax operator over a finite orbit is discontinuous wherever two candidates have equal or crossing confidence values. The manuscript must supply the precise lemma or selection rule (unique maximizer, continuous tie-breaking, or smoothing) that guarantees continuity of the resulting canonicalization map; without it the continuity and universal-approximation statements rest on an unstated assumption.
[§4.1] (eigenbasis application): the symmetry-respecting property is asserted after the adaptive choice, yet the proof sketch does not address whether the network's confidence function itself transforms equivariantly under the group action; a counter-example or explicit verification is needed to confirm that the selected eigenbasis is invariant under the original symmetry.
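
One distinction is worth making explicit when weighing the first comment: the selected representative (the argmax) can jump while the maximized value stays continuous. The short derivation below is a general fact about a finite family of continuous functions, with $c_1, \dots, c_K$ standing in (as an assumption) for the confidences of the $K$ candidate canonical forms; whether the paper's model output takes exactly this form is not established by the text quoted here.

```latex
\begin{align*}
F(x) &= \max_{1 \le i \le K} c_i(x), \qquad
  i^\ast \in \arg\max_{i} c_i(x), \\
F(x) &= c_{i^\ast}(x)
  \le c_{i^\ast}(y) + \bigl| c_{i^\ast}(x) - c_{i^\ast}(y) \bigr|
  \le F(y) + \max_{i} \bigl| c_i(x) - c_i(y) \bigr| , \\
\bigl| F(x) - F(y) \bigr|
  &\le \max_{1 \le i \le K} \bigl| c_i(x) - c_i(y) \bigr|
  \qquad \text{(by exchanging the roles of } x \text{ and } y\text{)} .
\end{align*}
```

So if every $c_i$ is continuous, $F$ is continuous, and ties between candidates affect only which representative is reported, not the maximized value.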

minor comments (2)

[Table 1 and §5] Report the number of random seeds, standard deviations, and statistical tests for the claimed outperformance; the current numbers appear to be single-run point estimates.
[Notation] Define 'predictive confidence' explicitly (e.g., max softmax probability, margin, or log-likelihood) at first use.
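
To make the notation comment concrete, the sketch below spells out the three candidate definitions the referee lists, computed from raw logits. The function names are hypothetical, and nothing here indicates which definition the paper actually adopts.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_probability(logits):
    """Largest class probability."""
    return max(softmax(logits))

def margin(logits):
    """Gap between the two largest class probabilities."""
    top, runner_up = sorted(softmax(logits), reverse=True)[:2]
    return top - runner_up

def log_likelihood(logits, label):
    """Log-probability assigned to a given class label."""
    return math.log(softmax(logits)[label])

example_logits = [2.0, 0.0, 0.0]
```

The first two need only the logits, whereas `log_likelihood` requires a label, so it could serve as a selection criterion only where labels are available.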

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important points regarding the rigor of our continuity and symmetry claims. We address each major comment below with clarifications and proposed revisions.

read point-by-point responses

Referee: [Abstract and §3] (theoretical development): the central claim is that prior-maximization canonicalization produces a continuous map 'without requiring further restrictions on the network or loss landscape.' The argmax operator over a finite orbit is discontinuous wherever two candidates have equal or crossing confidence values. The manuscript must supply the precise lemma or selection rule (unique maximizer, continuous tie-breaking, or smoothing) that guarantees continuity of the resulting canonicalization map; without it the continuity and universal-approximation statements rest on an unstated assumption.

Authors: We agree that the argmax over a finite orbit is formally discontinuous at ties. Our §3 proof establishes continuity of the overall map by showing that the confidence function is continuous (as the network is continuous) and that discontinuities occur only on a lower-dimensional subset of the input space where two or more orbit elements achieve identical maximum confidence. To make this fully rigorous, we will add an explicit lemma in the revised §3 that introduces a deterministic, continuous tie-breaking rule: when multiple maximizers exist, select the representative whose canonical coordinates are closest (in Euclidean distance) to a fixed reference vector chosen once per orbit. This rule preserves the symmetry-respecting property and ensures the canonicalization map is continuous everywhere. We will also update the abstract to reference this lemma. revision: yes

Referee: [§4.1] (eigenbasis application): the symmetry-respecting property is asserted after the adaptive choice, yet the proof sketch does not address whether the network's confidence function itself transforms equivariantly under the group action; a counter-example or explicit verification is needed to confirm that the selected eigenbasis is invariant under the original symmetry.

Authors: We appreciate this request for explicit verification. In the eigenbasis application, the network (a spectral GNN) is applied after canonicalization, but the confidence score is computed from the network's output logits on the canonicalized graph. Because the underlying graph Laplacian commutes with the symmetry action, any group element g maps the orbit of possible eigenbases to itself. The maximizer of the confidence therefore selects a representative that is equivariant by construction: applying g to the input graph yields a correspondingly transformed maximizer, so the final selected eigenbasis (and thus the network output) remains invariant. We will add a short paragraph with this argument plus a brief counter-example check (a small cycle graph under rotation) to §4.1. This is a partial revision because the core invariance follows from the construction but requires the added verification paragraph. revision: partial
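
The orbit argument in this response can be checked on a tiny case. The sketch below uses the 3-node path graph, whose Laplacian has the simple spectrum 0, 1, 3, so the only eigenbasis ambiguity is a sign per eigenvector; it does not cover repeated eigenvalues (such as the cycle graph the response mentions), where the ambiguity is a full orthogonal change of basis. The scoring function `toy_score` and the other names are hypothetical and not taken from the paper.

```python
import itertools
import math

# Orthonormal Laplacian eigenvectors of the 3-node path graph (eigenvalues 0, 1, 3).
EIGVECS = [
    [1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)],
    [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)],
    [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)],
]

def spectral_features(signal, eigvecs):
    """Project a node signal onto each eigenvector."""
    return [sum(e * x for e, x in zip(v, signal)) for v in eigvecs]

def toy_score(coeffs):
    """Hypothetical sign-sensitive confidence: weighted ReLU of the coefficients."""
    return sum(w * max(0.0, c) for w, c in zip((0.5, 1.0, 2.0), coeffs))

def sign_maximized_score(signal, eigvecs):
    """Maximize the score over all 2^k sign choices of the eigenvectors."""
    best = -math.inf
    for signs in itertools.product((1.0, -1.0), repeat=len(eigvecs)):
        flipped = [[s * e for e in v] for s, v in zip(signs, eigvecs)]
        best = max(best, toy_score(spectral_features(signal, flipped)))
    return best

signal = [0.3, -1.2, 0.7]
# The same eigenbasis as an eigensolver might return it, with two signs flipped.
flipped_basis = [EIGVECS[0], [-e for e in EIGVECS[1]], [-e for e in EIGVECS[2]]]
```

On this example `toy_score` gives different values for `EIGVECS` and `flipped_basis`, while `sign_maximized_score` gives the same value for both, because flipping signs only permutes the set of candidates being maximized over.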

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent proof of properties for the defined construction

full rationale

The paper defines adaptive canonicalization via prior maximization (selecting the input form that maximizes the network's predictive confidence) and states that it proves the resulting models are continuous, symmetry-respecting, and universally approximating. This construction is presented as a general framework with applications to specific symmetries, supported by empirical validation. No quoted step reduces a claimed result to a fitted input, self-citation chain, or definitional tautology by construction. The central claims rest on a mathematical proof rather than renaming or smuggling assumptions, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Assessment is limited to the abstract; no explicit free parameters, axioms, or invented entities are stated. The central construction implicitly relies on the existence of a well-defined argmax over canonical forms that preserves symmetry and yields continuity, but the mathematical details and any supporting lemmas are not visible.

axioms (1)

Domain assumption: there exists a canonical form selection rule based on network confidence that is continuous and symmetry-preserving for the relevant group actions.
This premise is required for the claimed continuity and universal-approximation results but is not derived or justified in the provided abstract text.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 10 internal anchors

[1]

Geometric and physical quantities improve E(3) equivariant message passing.arXiv preprint arXiv:2110.02905,

Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling. Geometric and physical quantities improve E(3) equivariant message passing.arXiv preprint arXiv:2110.02905,

work page arXiv
[2]

Residual Gated Graph ConvNets

Xavier Bresson and Thomas Laurent. Residual gated graph ConvNets.arXiv preprint arXiv:1711.07553,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges.arXiv preprint arXiv:2104.13478,

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Steerable CNNs

Taco S Cohen and Max Welling. Steerable CNNs.arXiv preprint arXiv:1612.08498, 2016b. Lynn A Cooper and Roger N Shepard. Chronometric studies of the rotation of mental images. InVisual information processing, pages 75–176. Elsevier,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

Metric convolutions: A unifying theory to adaptive convolutions.arXiv preprint arXiv:2406.05400,

Thomas Dagès, Michael Lindenbaum, and Alfred M Bruckstein. Metric convolutions: A unifying theory to adaptive convolutions.arXiv preprint arXiv:2406.05400,

work page arXiv
[6]

arXiv preprint arXiv:2312.07511 , year=

Alexandre Duval, Simon V Mathis, Chaitanya K Joshi, Victor Schmidt, Santiago Miret, Fragkiskos D Malliaros, Taco Cohen, Pietro Lio, Yoshua Bengio, and Michael Bronstein. A hitchhiker’s guide to geometric gnns for 3D atomic systems.arXiv preprint arXiv:2312.07511, 2023a. 12 Alexandre Agm Duval, Victor Schmidt, Alex Hernández-Garcıa, Santiago Miret, Fragkis...

work page arXiv
[7]

SE(3)-transformers: 3D roto-translation equivariant attention networks.Advances in neural information processing systems, 33:1970–1981,

Fabian Fuchs, Daniel Worrall, Volker Fischer, and Max Welling. SE(3)-transformers: 3D roto-translation equivariant attention networks.Advances in neural information processing systems, 33:1970–1981,

work page 1970
[8]

e3nn : E uclidean neural networks

Mario Geiger and Tess Smidt. e3nn: Euclidean neural networks.arXiv preprint arXiv:2207.09453,

work page arXiv
[9]

Geometrically equivariant graph neural networks: A survey

Jiaqi Han, Yu Rong, Tingyang Xu, and Wenbing Huang. Geometrically equivariant graph neural networks: A survey. arXiv preprint arXiv:2202.07230,

work page arXiv
[10]

Spectral graph neural networks are incomplete on graphs with a simple spectrum.arXiv preprint arXiv:2506.05530,

Snir Hordan, Maya Bechler-Speicher, Gur Lifshitz, and Nadav Dym. Spectral graph neural networks are incomplete on graphs with a simple spectrum.arXiv preprint arXiv:2506.05530,

work page arXiv
[11]

Ian T Jolliffe and Jorge Cadima. Principal component analysis: a review and recent developments.Philosophical transactions of the royal society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150202,

work page 2065
[12]

Symmetry breaking and equivariant neural networks.arXiv preprint arXiv:2312.09016,

Sékou-Oumar Kaba and Siamak Ravanbakhsh. Symmetry breaking and equivariant neural networks.arXiv preprint arXiv:2312.09016,

work page arXiv
[13]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Semi-Supervised Classification with Graph Convolutional Networks

TN Kipf. Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907,

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Improving equivariant networks with probabilistic symmetry breaking.arXiv preprint arXiv:2503.21985,

Hannah Lawrence, Vasco Portilheiro, Yan Zhang, and Sékou-Oumar Kaba. Improving equivariant networks with probabilistic symmetry breaking.arXiv preprint arXiv:2503.21985,

work page arXiv
[16]

Sign and basis invariant networks for spectral graph representation learning.arXiv preprint arXiv:2202.13013,

14 Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, and Stefanie Jegelka. Sign and basis invariant networks for spectral graph representation learning.arXiv preprint arXiv:2202.13013,

work page arXiv
[17]

Equivariant machine learning on graphs with nonlinear spectral filters.Advances in Neural Information Processing Systems, 37:128182–128226, 2024a

Ya-Wei Eileen Lin, Ronen Talmon, and Ron Levie. Equivariant machine learning on graphs with nonlinear spectral filters.Advances in Neural Information Processing Systems, 37:128182–128226, 2024a. Yuchao Lin, Jacob Helwig, Shurui Gui, and Shuiwang Ji. Equivariance via minimal frame averaging for more symmetries and efficiency.arXiv preprint arXiv:2406.07598...

work page arXiv
[18]

Generalized Laplacian positional encoding for graph representation learning.arXiv preprint arXiv:2210.15956,

Sohir Maskey, Ali Parviz, Maximilian Thiessen, Hannes Stärk, Ylli Sadikaj, and Haggai Maron. Generalized Laplacian positional encoding for graph representation learning.arXiv preprint arXiv:2210.15956,

work page arXiv
[19]

Mezzadri, How to generate random matrices from the classical compact groups, arXiv preprint math- ph/0609050 (2006)

Francesco Mezzadri. How to generate random matrices from the classical compact groups.arXiv preprint math- ph/0609050,

work page arXiv
[20]

arXiv preprint arXiv:2007.08663 , year=

Christopher Morris, Nils M Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. TUDataset: A collection of benchmark datasets for learning with graphs.arXiv preprint arXiv:2007.08663,

work page arXiv 2007
[21]

Learning symmetric embeddings for equivariant world models.arXiv preprint arXiv:2204.11371,

Jung Yeon Park, Ondrej Biza, Linfeng Zhao, Jan Willem van de Meent, and Robin Walters. Learning symmetric embeddings for equivariant world models.arXiv preprint arXiv:2204.11371,

work page arXiv
[22]

Global attention improves graph networks generalization.arXiv preprint arXiv:2006.07846,

Omri Puny, Heli Ben-Hamu, and Yaron Lipman. Global attention improves graph networks generalization.arXiv preprint arXiv:2006.07846,

work page arXiv 2006
[23]

Frame averaging for invariant and equivariant network design.arXiv preprint arXiv:2110.03336,

Omri Puny, Matan Atzmon, Heli Ben-Hamu, Ishan Misra, Aditya Grover, Edward J Smith, and Yaron Lipman. Frame averaging for invariant and equivariant network design.arXiv preprint arXiv:2110.03336,

work page arXiv
[24]

Symmetry-Aware Generative Modeling through Learned Canonicalization

Kusha Sareen, Daniel Levy, Arnab Kumar Mondal, Sékou-Oumar Kaba, Tara Akhound-Sadegh, and Siamak Ravan- bakhsh. Symmetry-aware generative modeling through learned canonicalization.arXiv preprint arXiv:2501.07773,

work page internal anchor Pith review Pith/arXiv arXiv
[25]

Robust canonicalization through bootstrapped data re-alignment.arXiv preprint arXiv:2510.08178,

Johann Schmidt and Sebastian Stober. Robust canonicalization through bootstrapped data re-alignment.arXiv preprint arXiv:2510.08178,

work page arXiv
[26]

The general theory of permutation equivarant neural networks and higher order graph variational encoders.arXiv preprint arXiv:2004.03990,

Erik Henning Thiede, Truong Son Hy, and Risi Kondor. The general theory of permutation equivarant neural networks and higher order graph variational encoders.arXiv preprint arXiv:2004.03990,

work page arXiv 2004
[27]

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3D point clouds.arXiv preprint arXiv:1802.08219,

work page internal anchor Pith review Pith/arXiv arXiv
[28]

Graph Attention Networks

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks.arXiv preprint arXiv:1710.10903,

work page internal anchor Pith review Pith/arXiv arXiv
[29]

Discovering symmetry breaking in physical systems with relaxed group convolution.arXiv preprint arXiv:2310.02299,

Rui Wang, Elyssa Hofgard, Han Gao, Robin Walters, and Tess E Smidt. Discovering symmetry breaking in physical systems with relaxed group convolution.arXiv preprint arXiv:2310.02299,

work page arXiv
[30]

3D steerable CNNs: Learning rotationally equivariant features in volumetric data.Advances in Neural information processing systems, 31, 2018a

Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco S Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data.Advances in Neural information processing systems, 31, 2018a. Maurice Weiler, Fred A Hamprecht, and Martin Storath. Learning steerable filters for rotation equivariant CNNs. In Proceedings of the IEE...

work page arXiv
[31]

3D ShapeNets: A deep representation for volumetric shapes

Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1912–1920,

work page 1912
[32]

How Powerful are Graph Neural Networks?

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks?arXiv preprint arXiv:1810.00826,

work page internal anchor Pith review Pith/arXiv arXiv
[33]

Learning Representations of Sets through Optimized Permutations

Yan Zhang, Jonathon Hare, and Adam Prügel-Bennett. Learning representations of sets through optimized permuta- tions.arXiv preprint arXiv:1812.03928,

work page internal anchor Pith review Pith/arXiv arXiv
[34]

Fspool: Learning set representations with featurewise sort pooling.arXiv preprint arXiv:1906.02795,

Yan Zhang, Jonathon Hare, and Adam Prügel-Bennett. FSPool: Learning set representations with featurewise sort pooling.arXiv preprint arXiv:1906.02795, 2019a. Zhen Zhang, Jiajun Bu, Martin Ester, Jianfeng Zhang, Chengwei Yao, Zhi Yu, and Can Wang. Hierarchical graph pooling with structure learning.arXiv preprint arXiv:1911.05954, 2019b. 18 Appendix A Relat...

work page arXiv 1906
[35]

Their results show that the learned canonicalizers outperform fixed canonicalizers

developed a neural network that learns the canonicalization transformation, which enables plug-and-play equivariance, e.g., orthogonalizing learned features via the Gram- Schmidt process [Trefethen and Bau, 2022]. Their results show that the learned canonicalizers outperform fixed canonicalizers. However, Dym et al

work page 2022
[36]

average behavior,

is that when training is initialized, the canonicalizing energy s is random. This leads each datapoint to be randomly transformed, so the task neural network initially has to perform well at all orientations of the data. This can lead the task network to ultimately learn an “average behavior, ” not specializing in any special orientation but rather perfor...

work page 2023
[37]

We note that these approaches are rather different from our prior maximization method, and they do not try to address the continuity problem in canonicalization

iteratively reduces the orientation variance of the training set by iteratively reorienting datapoints that lead to a large loss. We note that these approaches are rather different from our prior maximization method, and they do not try to address the continuity problem in canonicalization. A.1.3 Weighted Canonicalization The energy-based canonicalization...

work page 2023
[38]

canonical

also discusses continuity preservation, but their approach is different from ours. In the work of Shumaylov et al. [2025], they define the notion of weighted canonicalization, which is a similar concept to the weighted frame introduced by Dym et al. [2024]. Here, to each datapoint there is an assigned probability measure over the orbit of the datapoint. N...

work page 2025
[39]

While it achieves strong empirical performance, its canonicalization mapping is not guaranteed to be continuous, and in fact, continuity is not discussed

does not involve network retraining, uses the foundation models as is, and performs canonicalization entirely at inference by optimizing over transformations. While it achieves strong empirical performance, its canonicalization mapping is not guaranteed to be continuous, and in fact, continuity is not discussed. 20 Therefore, small input changes may cause...

work page 2024
[40]

Recent work [Lin et al., 2024b] proposes minimal frame averaging that attains strong symmetry coverage with small frames

it avoids computational intractability of full group averaging, especially for large or continuous groups. Recent work [Lin et al., 2024b] proposes minimal frame averaging that attains strong symmetry coverage with small frames. Domain-specific frame averaging methods [Duval et al., 2023b, Atzmon et al., 2022] show that it can be deployed in material mode...

work page 2022
[41]

symmetries

the canonicalization is a function solely of the datapoint, and not the task network. Then, they define a variant of frame averaging, called weighted frame averaging, in which to each datapoint there is an associated probability distribution over the group, and the frame averaging is performed with respect to this measure. This construction yields continu...

work page 2023
[42]

Take f=1 I for a Borel set I⊂R

conjugate back. Take f=1 I for a Borel set I⊂R . The indicator function 1 I(L) is an orthogonal projection, since 1 I(L)2 =1 I(L) and1 I(L)∗ =1 I(L). 25 Algorithm 1Random maximization Input:Input g, backbone network f, scalar prior h(x), sampler Sample_U() for u∼P sampled from a probability measure overU, number of random samplesK, gradient descent stepGD...

work page 1966
[43]

Spectral graph neural networks [Defferrard et al., 2016b, Kipf, 2016, Levie et al., 2018] compose such filters with pointwise nonlinearities, using trainablegat each layer

with f:R→R, the spectral filter simply reduces to the functional-calculus operator acting onX: f(L)X= NX i=1 f(λ i)v iv⊤ i X=V f(Λ)V ⊤X. Spectral graph neural networks [Defferrard et al., 2016b, Kipf, 2016, Levie et al., 2018] compose such filters with pointwise nonlinearities, using trainablegat each layer. E Application of Adaptive Canonicalization: Tut...

work page 2016
[44]

prove the following concentration inequality for maxima. Lemma 22(Concentration inequality for volume retaining space [Cordonnier et al., 2024]).Let (X, P) be a probability space with the (r0, κ)-volume retaining property and let g:X 2 →R q be Kg-Lipschitz. For any ρ≥exp(−nκr d 02d), for any random variablesX 1, . . . , Xn i.i.d. ∼P, with probability at l...

work page 2024
[45]

However, these eigenvectors are not uniquely defined

E.3 Construction Details for Anisotropic Nonlinear Spectral Filters In spectral methods for graphs, we often use eigenvectors as a core component for graph representation learning. However, these eigenvectors are not uniquely defined. For each eigenvector we can flip its sign, and when an eigenvalue has multiplicity larger than one, any orthogonal basis o...

work page 2017
[46]

symmetry

DGCNN [Wang et al., 2019] constructs dynamic k-nearest graphs by computing G= (V, E) where E={(i, j) :j∈kNN(x i, k)}. Then, the edge convolution is performed by computing the edge features and applying a max pooling: x′ i = Pool(i,j)∈E(ReLU(Ψ(xj −x i,x i))). Applying adaptive canonicalization to the DGCNN architecture, we define a class-specific orientati...

work page 2019
[47]

Experiments are conducted on an Nvidia DGX A100

All models are implemented in PyTorch and optimized with the Adam optimizer [Kingma and Ba, 2014]. Experiments are conducted on an Nvidia DGX A100. The output of the GNN is then passed to an MLP, followed by a softmax classifier. F.2 Graph Classification on TUDataset Datasets and Experimental Setup.We consider five graph classification benchmarks from TUD...

[48]

Following the random split protocol [Ma et al., 2019, Ying et al., 2018, Zhang et al., 2019b], we partition the dataset into 80% training, 10% validation, and 10% testing. Results are averaged over 10 random splits, with mean accuracy and standard deviation reported.

Competing Baselines. We evaluate on medium-scale graph classification benchmarks from TUDa...
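The 80/10/10 random split protocol with results averaged over 10 splits can be sketched as below. The helper names `random_split` and `evaluate_over_splits`, the seeding scheme, and the `train_and_eval` callback are hypothetical; the callback stands in for training a model on one split and returning its test accuracy.

```python
import numpy as np

def random_split(n, seed, frac_train=0.8, frac_val=0.1):
    """Return disjoint train/validation/test index arrays for n samples."""
    perm = np.random.default_rng(seed).permutation(n)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def evaluate_over_splits(n, train_and_eval, num_splits=10):
    """Run train_and_eval(train_idx, val_idx, test_idx) on each split."""
    accs = [train_and_eval(*random_split(n, seed)) for seed in range(num_splits)]
    return float(np.mean(accs)), float(np.std(accs))

train_idx, val_idx, test_idx = random_split(1000, seed=0)
```

Using the split index as the seed makes each of the 10 partitions reproducible while keeping them different from one another.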

[49]

The models are implemented using PyTorch, optimized with the Adam optimizer [Kingma and Ba, 2014]. An early stopping strategy is applied, where training halts if the validation loss does not improve for 100 consecutive epochs. The hyperparameters are selected through a grid search, conducted via Optuna [Akiba et al., 2019], with the learning rate and...
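The early stopping rule described above (halt when the validation loss has not improved for a fixed number of consecutive epochs) can be sketched in plain Python. The class name `EarlyStopping` and the helper `stopped_epoch` are illustrative assumptions; the paper's patience is 100 epochs, while the usage line below uses a patience of 3 only to keep the example short.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=100):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should halt."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def stopped_epoch(val_losses, patience):
    """Index of the epoch at which training halts, or None if it never does."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            return epoch
    return None

epoch = stopped_epoch([1.0, 0.9, 0.95, 0.93, 0.92, 0.8], patience=3)
```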

[50]

The output representations are then passed into an MLP followed by a softmax layer, and predictions are obtained by optimizing a cross-entropy loss function. Experiments are conducted on an Nvidia DGX A100.

F.3 Molecular Classification on OGB Datasets

Datasets and Experimental Setup. We evaluate on larger-scale benchmarks from the Open Graph Benchmark (...

[51]

Additionally, the batch size is chosen from $\{32, 64, 128, 256\}$ and the weight decay is chosen from $\{10^{-4}, 10^{-5}, 10^{-6}\}$. All hyperparameters are tuned using Optuna [Akiba et al., 2019]. The experiments are conducted on an NVIDIA A100 GPU.

F.4 ModelNet40 Point Cloud Classification

Datasets and Experimental Setup. Our evaluation for point cloud classification was...

[52]

We see that the node-to-graph construction achieves performance closely aligned with, and in some cases approaching, that of the direct graph-level canonicalization. We attribute the slightly worse performance to the potential pooling loss.

Table 6: Graph classification performance on TUDataset using adaptive canonicalization. Comparison between direct gra...

[53]

We see that using the dyadic partitions performs better than using the uniform partition. tion provided by dyadic bands, which could more effectively isolate band-wise unitary actions that commute with the chosen GSO. We also note that spectral band design can be realized in more flexible and expressive ways, for example, through attention as in SpecForme...

[54]

Overall, we observe that our method is reasonably robust. For grid size and sinusoidal period, performance remains stable across the tested ranges. For the noise level, small to moderate noise leads to similar performance, with a degradation only when the noise becomes large enough that it effectively corrupts the underlying structure of the data. For the...

[55]

We see that truncation-based prior maximization improves classification performance over the standard vanilla baseline. This implies that our method enables the model to adaptively select a canonical truncation that enhances downstream performance. In addition, we observe that the selected canonical crops tend to tightly focus on the main object while dis...


[1] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling. Geometric and physical quantities improve E(3) equivariant message passing. arXiv preprint arXiv:2110.02905.

[2] Xavier Bresson and Thomas Laurent. Residual gated graph ConvNets. arXiv preprint arXiv:1711.07553.

[3] Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478.

[4] Taco S Cohen and Max Welling. Steerable CNNs. arXiv preprint arXiv:1612.08498, 2016b.
Lynn A Cooper and Roger N Shepard. Chronometric studies of the rotation of mental images. In Visual information processing, pages 75–176. Elsevier.

[5] Thomas Dagès, Michael Lindenbaum, and Alfred M Bruckstein. Metric convolutions: A unifying theory to adaptive convolutions. arXiv preprint arXiv:2406.05400.

[6] Alexandre Duval, Simon V Mathis, Chaitanya K Joshi, Victor Schmidt, Santiago Miret, Fragkiskos D Malliaros, Taco Cohen, Pietro Lio, Yoshua Bengio, and Michael Bronstein. A hitchhiker’s guide to geometric GNNs for 3D atomic systems. arXiv preprint arXiv:2312.07511, 2023a.
Alexandre Agm Duval, Victor Schmidt, Alex Hernández-García, Santiago Miret, Fragkis...

[7] Fabian Fuchs, Daniel Worrall, Volker Fischer, and Max Welling. SE(3)-transformers: 3D roto-translation equivariant attention networks. Advances in Neural Information Processing Systems, 33:1970–1981.

[8] Mario Geiger and Tess Smidt. e3nn: Euclidean neural networks. arXiv preprint arXiv:2207.09453.

[9] Jiaqi Han, Yu Rong, Tingyang Xu, and Wenbing Huang. Geometrically equivariant graph neural networks: A survey. arXiv preprint arXiv:2202.07230.

[10] Snir Hordan, Maya Bechler-Speicher, Gur Lifshitz, and Nadav Dym. Spectral graph neural networks are incomplete on graphs with a simple spectrum. arXiv preprint arXiv:2506.05530.

[11] Ian T Jolliffe and Jorge Cadima. Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150202.

[12] Sékou-Oumar Kaba and Siamak Ravanbakhsh. Symmetry breaking and equivariant neural networks. arXiv preprint arXiv:2312.09016.

[13] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[14] TN Kipf. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

[15] Hannah Lawrence, Vasco Portilheiro, Yan Zhang, and Sékou-Oumar Kaba. Improving equivariant networks with probabilistic symmetry breaking. arXiv preprint arXiv:2503.21985.

[16] Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, and Stefanie Jegelka. Sign and basis invariant networks for spectral graph representation learning. arXiv preprint arXiv:2202.13013.

[17] Ya-Wei Eileen Lin, Ronen Talmon, and Ron Levie. Equivariant machine learning on graphs with nonlinear spectral filters. Advances in Neural Information Processing Systems, 37:128182–128226, 2024a.
Yuchao Lin, Jacob Helwig, Shurui Gui, and Shuiwang Ji. Equivariance via minimal frame averaging for more symmetries and efficiency. arXiv preprint arXiv:2406.07598...

[18] Sohir Maskey, Ali Parviz, Maximilian Thiessen, Hannes Stärk, Ylli Sadikaj, and Haggai Maron. Generalized Laplacian positional encoding for graph representation learning. arXiv preprint arXiv:2210.15956.

[19] Francesco Mezzadri. How to generate random matrices from the classical compact groups. arXiv preprint math-ph/0609050, 2006.

[20] Christopher Morris, Nils M Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. TUDataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663.

[21] Jung Yeon Park, Ondrej Biza, Linfeng Zhao, Jan Willem van de Meent, and Robin Walters. Learning symmetric embeddings for equivariant world models. arXiv preprint arXiv:2204.11371.

[22] Omri Puny, Heli Ben-Hamu, and Yaron Lipman. Global attention improves graph networks generalization. arXiv preprint arXiv:2006.07846.

[23] Omri Puny, Matan Atzmon, Heli Ben-Hamu, Ishan Misra, Aditya Grover, Edward J Smith, and Yaron Lipman. Frame averaging for invariant and equivariant network design. arXiv preprint arXiv:2110.03336.

[24] Kusha Sareen, Daniel Levy, Arnab Kumar Mondal, Sékou-Oumar Kaba, Tara Akhound-Sadegh, and Siamak Ravanbakhsh. Symmetry-aware generative modeling through learned canonicalization. arXiv preprint arXiv:2501.07773.

[25] Johann Schmidt and Sebastian Stober. Robust canonicalization through bootstrapped data re-alignment. arXiv preprint arXiv:2510.08178.

[26] Erik Henning Thiede, Truong Son Hy, and Risi Kondor. The general theory of permutation equivarant neural networks and higher order graph variational encoders. arXiv preprint arXiv:2004.03990.

[27] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219.

[28] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903.

[29] Rui Wang, Elyssa Hofgard, Han Gao, Robin Walters, and Tess E Smidt. Discovering symmetry breaking in physical systems with relaxed group convolution. arXiv preprint arXiv:2310.02299.

[30] Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco S Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. Advances in Neural Information Processing Systems, 31, 2018a.
Maurice Weiler, Fred A Hamprecht, and Martin Storath. Learning steerable filters for rotation equivariant CNNs. In Proceedings of the IEE...

[31] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1912–1920.

[32] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826.

[33] Yan Zhang, Jonathon Hare, and Adam Prügel-Bennett. Learning representations of sets through optimized permutations. arXiv preprint arXiv:1812.03928.

[34] Yan Zhang, Jonathon Hare, and Adam Prügel-Bennett. FSPool: Learning set representations with featurewise sort pooling. arXiv preprint arXiv:1906.02795, 2019a.
Zhen Zhang, Jiajun Bu, Martin Ester, Jianfeng Zhang, Chengwei Yao, Zhi Yu, and Can Wang. Hierarchical graph pooling with structure learning. arXiv preprint arXiv:1911.05954, 2019b.

Appendix A Relat...

[35]

developed a neural network that learns the canonicalization transformation, which enables plug-and-play equivariance, e.g., orthogonalizing learned features via the Gram-Schmidt process [Trefethen and Bau, 2022]. Their results show that the learned canonicalizers outperform fixed canonicalizers. However, Dym et al

[36]

is that when training is initialized, the canonicalizing energy s is random. This leads each datapoint to be randomly transformed, so the task neural network initially has to perform well at all orientations of the data. This can lead the task network to ultimately learn an “average behavior,” not specializing in any special orientation but rather perfor...

[37]

iteratively reduces the orientation variance of the training set by iteratively reorienting datapoints that lead to a large loss. We note that these approaches are rather different from our prior maximization method, and they do not try to address the continuity problem in canonicalization.

A.1.3 Weighted Canonicalization

The energy-based canonicalization...

[38]

also discusses continuity preservation, but their approach is different from ours. In the work of Shumaylov et al. [2025], they define the notion of weighted canonicalization, which is a similar concept to the weighted frame introduced by Dym et al. [2024]. Here, to each datapoint there is an assigned probability measure over the orbit of the datapoint. N...

[39]

does not involve network retraining, uses the foundation models as is, and performs canonicalization entirely at inference by optimizing over transformations. While it achieves strong empirical performance, its canonicalization mapping is not guaranteed to be continuous, and in fact, continuity is not discussed. Therefore, small input changes may cause...

[40]

it avoids computational intractability of full group averaging, especially for large or continuous groups. Recent work [Lin et al., 2024b] proposes minimal frame averaging that attains strong symmetry coverage with small frames. Domain-specific frame averaging methods [Duval et al., 2023b, Atzmon et al., 2022] show that it can be deployed in material mode...

[41]

the canonicalization is a function solely of the datapoint, and not the task network. Then, they define a variant of frame averaging, called weighted frame averaging, in which to each datapoint there is an associated probability distribution over the group, and the frame averaging is performed with respect to this measure. This construction yields continu...

[42]

conjugate back. Take $f = \mathbb{1}_I$ for a Borel set $I \subset \mathbb{R}$. The indicator function $\mathbb{1}_I(L)$ is an orthogonal projection, since $\mathbb{1}_I(L)^2 = \mathbb{1}_I(L)$ and $\mathbb{1}_I(L)^* = \mathbb{1}_I(L)$.

Algorithm 1 Random maximization
Input: Input $g$, backbone network $f$, scalar prior $h(x)$, sampler Sample_U() for $u \sim P$ sampled from a probability measure over $U$, number of random samples $K$, gradient descent step GD...
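The sampling stage of a random-maximization procedure of this kind can be sketched as follows. This is a sketch under stated assumptions, not the paper's Algorithm 1: the gradient descent refinement is truncated in the text above and is not reproduced, the transformation set is assumed to be 3D rotations, the prior is assumed to score the network output by its largest softmax probability, and the names `sample_rotation`, `random_maximization`, `backbone`, and `prior` are hypothetical.

```python
import numpy as np

def sample_rotation(rng):
    """Random 3x3 rotation from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q * np.sign(np.diag(R))       # sign correction of the QR factor
    if np.linalg.det(Q) < 0:          # reflect into SO(3) if needed
        Q[:, 0] = -Q[:, 0]
    return Q

def random_maximization(g, f, h, sample_u, apply_u, K):
    """Return the sampled transformation u that maximizes h(f(u . g))."""
    best_u, best_score = None, -np.inf
    for _ in range(K):
        u = sample_u()
        score = h(f(apply_u(u, g)))
        if score > best_score:
            best_u, best_score = u, score
    return best_u, best_score

# Toy stand-ins (hypothetical): linear logits and softmax confidence as the prior.
rng = np.random.default_rng(0)
cloud = rng.standard_normal((32, 3))
W = rng.standard_normal((5, 32 * 3)) * 0.1

def backbone(x):
    return W @ x.reshape(-1)

def prior(logits):
    p = np.exp(logits - logits.max())
    return float((p / p.sum()).max())

u_star, score = random_maximization(
    cloud, backbone, prior,
    sample_u=lambda: sample_rotation(rng),
    apply_u=lambda u, x: x @ u.T,
    K=64,
)
```

Increasing `K` can only raise the best score found for a fixed sample stream, at the cost of one extra forward pass per sample.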
