When and How to Canonize: A Generalization Perspective
Recognition: 1 Lean theorem link
Pith reviewed 2026-05-13 00:49 UTC · model grok-4.3
The pith
Canonized models achieve generalization bounds that range from matching the best invariant architectures to matching non-invariant baselines, depending on the regularity of the canonization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a theoretical framework to analyze the generalization error of canonization, group averaging, and invariant architectures by bounding their covering numbers. We establish that the error bounds of canonized models are at best equal to those of structurally invariant and group-averaged models, and at worst equal to those of non-invariant baselines, with the position in this hierarchy determined by the regularity of the canonization map. For permutation groups, we prove that lexicographical sorting leads to exponential growth in covering numbers with dimension, while Hilbert curve canonization ensures polynomial growth.
What carries the argument
Covering-number bounds on the hypothesis classes induced by each symmetry-handling method, controlled by a regularity condition on the canonization map.
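A minimal formalization of the classes being compared, reconstructed from the summary above (the symbols F, G, and the averaging operator are editorial notation, not quoted from the paper):

    \mathcal{F}_c = \{\, f \circ c : f \in \mathcal{F} \,\}, \qquad
    \mathcal{F}_{\mathrm{avg}} = \Big\{\, x \mapsto \tfrac{1}{|G|} \sum_{g \in G} f(g \cdot x) : f \in \mathcal{F} \,\Big\}, \qquad
    \mathcal{F}^{G} = \{\, f \in \mathcal{F} : f(g \cdot x) = f(x) \ \forall g \in G \,\}

and the claimed hierarchy of covering numbers reads, roughly,

    \mathcal{N}(\mathcal{F}^{G}, \varepsilon) \approx \mathcal{N}(\mathcal{F}_{\mathrm{avg}}, \varepsilon)
      \;\le\; \mathcal{N}(\mathcal{F}_{c}, \varepsilon) \;\le\; \mathcal{N}(\mathcal{F}, \varepsilon),

with the position of F_c inside the chain governed by the regularity of the canonization c (the ≈ reflects the summary's grouping of invariant and group-averaged models at the same level).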
If this is right
- Optimal canonizations exist that attain the same generalization bounds as fully invariant and group-averaged models.
- Irregular canonizations can achieve no better generalization than non-invariant baselines.
- For permutation-invariant point-cloud tasks, Hilbert-curve canonization guarantees polynomial covering-number growth, whereas lexicographic sorting incurs exponential growth (see the sketch after this list).
- Regularity of a canonization map determines whether it attains the upper or lower end of the generalization hierarchy.
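A minimal runnable sketch of the two canonizations named above for 2-D point clouds; the bit-interleaving Hilbert index, the grid resolution (order = 16), and the NumPy helpers are illustrative editorial choices, not the paper's released code:

    import numpy as np

    def hilbert_index(x, y, order=16):
        """Map integer grid coordinates in [0, 2**order) to a 1-D Hilbert index
        via the standard xy2d bit construction (rotate/flip quadrants level by level)."""
        d = 0
        s = 1 << (order - 1)
        while s > 0:
            rx = 1 if (x & s) else 0
            ry = 1 if (y & s) else 0
            d += s * s * ((3 * rx) ^ ry)
            # Rotate/flip the current quadrant so lower-order bits are read in the right frame.
            if ry == 0:
                if rx == 1:
                    x = s - 1 - (x & (s - 1))
                    y = s - 1 - (y & (s - 1))
                x, y = y, x
            s >>= 1
        return d

    def canonize_lex(points):
        """Lexicographic-sort canonization: order rows by x, then y."""
        return points[np.lexsort((points[:, 1], points[:, 0]))]

    def canonize_hilbert(points, order=16):
        """Hilbert-curve canonization: order rows by the Hilbert index of a quantized copy."""
        lo, hi = points.min(axis=0), points.max(axis=0)
        grid = ((points - lo) / np.maximum(hi - lo, 1e-12) * ((1 << order) - 1)).astype(np.int64)
        keys = np.array([hilbert_index(int(gx), int(gy), order) for gx, gy in grid])
        return points[np.argsort(keys, kind="stable")]

    # Both maps are permutation invariant: shuffling the rows leaves the canonized output unchanged.
    rng = np.random.default_rng(0)
    pc = rng.random((8, 2))
    perm = rng.permutation(8)
    assert np.allclose(canonize_lex(pc), canonize_lex(pc[perm]))
    assert np.allclose(canonize_hilbert(pc), canonize_hilbert(pc[perm]))

Both maps select one representative per permutation orbit; the theoretical claim concerns how finely the composed function classes must be covered, which the invariance checks above do not measure.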
Where Pith is reading between the lines
- The regularity condition could serve as a design criterion when creating canonization procedures for symmetries other than permutations.
- The same covering-number comparison might be used to evaluate canonization against equivariant layers for rotation or scale symmetries in images or graphs.
- If a chosen canonization is only partially regular, a hybrid architecture that adds limited invariance layers after canonization may recover some of the lost generalization performance.
Load-bearing premise
The covering-number bounds derived for the canonized function class translate directly into generalization guarantees and the regularity condition on the canonization map can be verified or controlled in practice.
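A minimal statement of the standard route this premise relies on, as a generic uniform-convergence bound with illustrative constants rather than the paper's theorem: for a loss bounded in [0, 1] and an ε-net of the induced loss class of size N(H, ε) in sup norm, with probability at least 1 − δ over n i.i.d. samples every h ∈ H satisfies

    \mathbb{E}[\ell_h] \;\le\; \frac{1}{n} \sum_{i=1}^{n} \ell_h(z_i) \;+\; 2\varepsilon
      \;+\; \sqrt{\frac{\ln \mathcal{N}(\mathcal{H}, \varepsilon) + \ln(1/\delta)}{2n}},

so polynomial rather than exponential growth of the covering number in dimension directly tightens the guarantee at any fixed sample size.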
What would settle it
A direct computation or experiment on a concrete canonization that produces a covering number lying outside the predicted range relative to the invariant and non-invariant baselines.
Original abstract
While invariant architectures are standard for processing symmetric data, there is growing interest in achieving invariance by applying group averaging or canonization to non-invariant backbones. However, the theoretical generalization properties of these alternative strategies remain poorly understood. We introduce a theoretical framework to analyze the generalization error of these methods by bounding their covering numbers. We establish a rigorous generalization hierarchy: the error bounds of canonized models are at best equal to the error bounds of structurally invariant and group-averaged models, and at worst equal to the bounds of non-invariant baselines. Furthermore, we show that there exist optimal canonizations which attain the optimal error bounds, and poor canonizations which attain the non-invariant error bounds, and that this depends on the regularity of the canonization. Finally, applying this framework to permutation groups in point cloud processing, we rigorously prove that the covering number of lexicographical sorting grows exponentially with point cloud dimension, whereas Hilbert curve canonization guarantees polynomial growth. This provides the first formal theoretical justification for the empirical success of Hilbert curve serialization in state-of-the-art point cloud architectures. We conclude with experiments that support our theoretical claims. Code is available at https://github.com/yonatansverdlov/Canonization
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a covering-number-based framework to compare generalization bounds for canonized models (F_c = {f ∘ c | f ∈ F}), structurally invariant models, group-averaged models, and non-invariant baselines. It claims a rigorous hierarchy in which the covering number (and thus generalization bound) of a canonized class lies between the invariant and non-invariant extremes, with the position controlled by a regularity property of the canonization map c. The authors prove existence of optimal canonizations attaining the best bounds and poor ones attaining the worst, and apply the framework to permutation groups on point clouds, showing that lexicographic sorting yields exponential covering-number growth in dimension while Hilbert-curve canonization yields polynomial growth. Experiments and open-source code are provided to support the claims.
Significance. If the hierarchy and the transfer from canonization regularity to composed covering numbers hold, the work supplies the first formal justification for preferring certain canonization procedures (e.g., Hilbert curves) over others in symmetric-data settings and gives practitioners a concrete criterion for selecting canonization maps. The explicit comparison with group averaging and built-in invariance, together with the reproducible code, strengthens the contribution.
Major comments (2)
- [§3] §3 (regularity definition and Theorem on covering-number interpolation): the claimed bound N(F_c, ε) ≤ N(F, ε) (the “at worst equal” direction) appears to rest on the regularity condition alone. It is not shown that regularity implies a uniform modulus of continuity sufficient to contract the ε-net size of the composed class; if regularity only guarantees continuity or a dimension-dependent Lipschitz constant, N(F_c, ε) can exceed N(F, ε), breaking the hierarchy.
- [§5] §5 (point-cloud application, lex-sort and Hilbert-curve theorems): the exponential/polynomial covering-number statements are proved for the canonization map c itself. No explicit composition lemma relating the modulus of c to the entropy of the neural-network class F is provided, so the transfer to generalization bounds for the full model class F_c is not immediate (see the sketch after this list).
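An illustrative sketch of the kind of composition step the second comment asks for; the metric and Lipschitz assumptions below are editorial, not the paper's statements. In sup norm, covering numbers compose through the image of the canonization,

    \mathcal{N}_{\infty}(\mathcal{F} \circ c, \varepsilon, K) \;\le\; \mathcal{N}_{\infty}(\mathcal{F}, \varepsilon, c(K)),

because an ε-net f_1, …, f_N of F on c(K) pulls back to the ε-net f_1 ∘ c, …, f_N ∘ c of F_c on K. When covering numbers are instead taken over the input space, as in robustness-style generalization arguments, the modulus of continuity ω_c of c sets the scale: for L_F-Lipschitz f,

    \big| f(c(x)) - f(c(x')) \big| \;\le\; L_F \, \omega_c\big( \rho_G(x, x') \big),

so a cover of the quotient space at scale ω_c^{-1}(ε / L_F) suffices, and the growth rate of that cover is exactly where the regularity of c enters the bound.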
Minor comments (2)
- [Notation and §3] The notation for covering numbers and the precise statement of the regularity condition should be cross-referenced to standard texts (e.g., Shalev-Shwartz & Ben-David) for clarity.
- [Experiments] Figure captions and experimental details could more explicitly link the plotted quantities back to the covering-number bounds derived in the theory.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting these potential gaps in the presentation of our covering-number hierarchy. We address each major comment below and will revise the manuscript accordingly to strengthen the rigor of the arguments.
Point-by-point responses
- Referee: [§3] §3 (regularity definition and Theorem on covering-number interpolation): the claimed bound N(F_c, ε) ≤ N(F, ε) (the “at worst equal” direction) appears to rest on the regularity condition alone. It is not shown that regularity implies a uniform modulus of continuity sufficient to contract the ε-net size of the composed class; if regularity only guarantees continuity or a dimension-dependent Lipschitz constant, N(F_c, ε) can exceed N(F, ε), breaking the hierarchy.
  Authors: We agree that the current write-up of the interpolation theorem in §3 would benefit from an explicit supporting lemma. The regularity condition we introduce is defined via a uniform modulus of continuity (independent of dimension in the relevant regimes) that directly controls the distortion of ε-nets under composition with c. Nevertheless, we will add a short lemma that derives the contraction N(F_c, ε) ≤ N(F, ε) from this modulus using standard covering-number composition inequalities, thereby making the “at worst equal” direction fully rigorous. revision: yes
- Referee: [§5] §5 (point-cloud application, lex-sort and Hilbert-curve theorems): the exponential/polynomial covering-number statements are proved for the canonization map c itself. No explicit composition lemma relating the modulus of c to the entropy of the neural-network class F is provided, so the transfer to generalization bounds for the full model class F_c is not immediate.
  Authors: We acknowledge that an explicit composition step is missing from the current text. In the revised version we will insert a dedicated lemma (placed before the point-cloud theorems) that composes the entropy of c with the entropy of a standard neural-network class F (assumed to have bounded Lipschitz constant, as is common for generalization analyses). This lemma will show that polynomial growth of the covering number of c implies polynomial growth for F_c, while exponential growth for c yields exponential growth for F_c, thereby transferring the Hilbert-curve and lexicographic results directly to the generalization bounds of the composed model class. revision: yes
Circularity Check
No circularity: standard covering-number bounds applied to composed canonized classes
Full rationale
The paper derives its generalization hierarchy by applying classical covering-number arguments from statistical learning theory to the canonized hypothesis class F_c = {f ∘ c | f ∈ F}. The claimed ordering of error bounds follows directly from how a regularity condition on the canonization map c modulates the entropy of F_c relative to the structurally invariant and non-invariant baselines; this relation is established via explicit lemmas and growth-rate calculations (exponential for lexicographic sorting, polynomial for Hilbert curves) rather than by redefining the target quantities in terms of the bounds themselves. No parameters are fitted to the generalization claims, no self-citation chain supplies the central uniqueness or interpolation result, and the covering-number bounds are not tautological with the input function classes. The framework is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Covering numbers control generalization error in the standard way for supervised learning.
- domain assumption: The input distribution is invariant under the symmetry group (permutations for point clouds); see the formalization below.
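A minimal formalization of the domain assumption under its standard measure-theoretic reading (an editorial rendering, not the paper's wording): writing μ for the data distribution on the input space K,

    \mu(g \cdot A) = \mu(A) \quad \text{for every } g \in G \text{ and every measurable } A \subseteq K,

with G = S_n acting by permuting the n points of a cloud in R^{d×n}.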
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tagged: unclear)
  Relation between the paper passage and the cited Recognition theorem.
We establish a rigorous generalization hierarchy: the error bounds of canonized models are at best equal to the error bounds of structurally invariant and group-averaged models, and at worst equal to the bounds of non-invariant baselines... by bounding their covering numbers.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.