Axiomatizing Neural Networks via Pursuit of Subspaces

Felix Rojas Casadiego; Marcel van Gerven; Mehmet Yamac; Mert Duman; Moncef Gabbouj; Serkan Kiranyaz; Ugur Akpinar

arXiv: 2605.20534 · v1 · pith:I7OWZ4ORnew · submitted 2026-05-19 · 💻 cs.LG · cs.AI · stat.ML

Mehmet Yamac, Mert Duman, Ugur Akpinar, Felix Rojas Casadiego, Serkan Kiranyaz, Marcel van Gerven, Moncef Gabbouj

Pith reviewed 2026-05-21 06:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML

keywords: neural networks, axiomatic framework, subspace pursuit, deep learning theory, geometric postulates, representation learning, generalization

The pith

Neural networks operate according to geometric postulates about pursuing subspaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the Pursuit of Subspaces hypothesis as an axiomatic framework that models neural network behavior through a set of geometric postulates. It aims to explain representation, computation, and generalization in both shallow and deep networks as consequences of these rules, much like axioms clarify the properties of classical geometry. A reader would care if this holds because it could replace black-box views with a principled geometric account that addresses why networks succeed and how they generalize. The approach unifies explanations across architectures by deriving observable behaviors directly from the postulates.

Core claim

The PoS axioms together with their derived consequences provide a unified perspective on representation, computation, and generalization in both shallow and deep architectures and yield geometric explanations for fundamental questions in deep learning.

What carries the argument

The Pursuit of Subspaces (PoS) hypothesis, a collection of geometric postulates that treat network dynamics as the systematic pursuit of subspaces.

If this is right

The framework supplies geometric accounts of how representations form in network layers.
It explains the effects of architectural choices such as depth and width through subspace mechanisms.
Generalization behavior follows as a direct consequence of the geometric postulates rather than separate statistical arguments.
Both shallow linear networks and deep nonlinear ones fall under the same set of axioms and derived results.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the axioms prove faithful, designers could use them to derive new architectures instead of relying on empirical search.
The approach may connect to existing geometric ideas in machine learning such as manifold assumptions without requiring additional machinery.
A direct test would involve checking whether subspace-pursuit predictions match the internal activations of trained networks on simple datasets.
The same postulates might extend to explain certain behaviors in other parameterized models beyond standard neural networks.
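
The direct test mentioned in this list could be set up along the following lines. This is a minimal sketch on synthetic data, not the paper's procedure: the function name `subspace_energy_fraction`, the synthetic "activations", and the use of an SVD-fitted linear subspace as a stand-in for a pursued subspace are all assumptions made for illustration.

```python
import numpy as np

def subspace_energy_fraction(acts, k):
    """Fraction of centered activation energy captured by the best-fitting k-dim subspace."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    energy = singular_values ** 2
    return float(energy[:k].sum() / energy.sum())

rng = np.random.default_rng(0)
n_samples, width, k = 500, 32, 3

# Hypothetical "activations": points near a k-dimensional subspace plus small noise.
basis, _ = np.linalg.qr(rng.standard_normal((width, k)))
structured = rng.standard_normal((n_samples, k)) @ basis.T
structured += 0.01 * rng.standard_normal((n_samples, width))

# Control: isotropic noise with no low-dimensional structure.
unstructured = rng.standard_normal((n_samples, width))

frac_structured = subspace_energy_fraction(structured, k)
frac_unstructured = subspace_energy_fraction(unstructured, k)
print(f"structured: {frac_structured:.3f}, unstructured: {frac_unstructured:.3f}")
```

On real networks the array `structured` would be replaced by recorded layer activations. A fraction near 1 for small `k` would be consistent with a low-dimensional subspace account, while a value close to the noise control would count against it.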

Load-bearing premise

Neural network behavior can be captured by a small set of geometric postulates whose consequences are both non-trivial and faithful to observed network dynamics.

What would settle it

A clear case of network training or generalization where the observed representations and performance cannot be derived from or predicted by the PoS axioms.

Figures

Figures reproduced from arXiv: 2605.20534 by Felix Rojas Casadiego, Marcel van Gerven, Mehmet Yamac, Mert Duman, Moncef Gabbouj, Serkan Kiranyaz, Ugur Akpinar.

**Figure 1.** An example manifold and local coordinates. Beyond these basic examples, new manifolds can be obtained by forming Cartesian products, which are known as product manifolds. If $M_1$ and $M_2$ are manifolds of dimensions $k_1$ and $k_2$, then their product $M = M_1 \times M_2$ is itself a manifold of dimension $k_1 + k_2$. The local charts of $M$ are given by combining charts from $M_1$ and $M_2$. A fundamental example is the n-torus, defi…

**Figure 2.** Transversal intersections. (a) Linear subspaces. (b) Curved submanifolds. Let $F : X \to Y$ be a smooth map between manifolds, and let $Z \subset Y$ be a smooth submanifold. We say that $F$ is transversal to $Z$, written $F \pitchfork Z$, if for every point $x \in X$ with $F(x) \in Z$, we have $\operatorname{Im}(dF_x) + T_{F(x)}Z = T_{F(x)}Y$ (3). If this condition holds, then the preimage $F^{-1}(Z)$ is a smooth submanifold of $X$. Moreover, the codimension of F −…
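
For the linear case in panel (a), the transversality condition in the Figure 2 excerpt reduces to a rank check. The sketch below assumes a linear map `F(x) = A x` and a subspace `Z = span(B)`; the helper names are hypothetical. It also checks the standard fact that the preimage then has the same codimension in X as Z has in Y. The excerpt is cut off at that sentence, so reading it as this statement is an assumption.

```python
import numpy as np

def is_transversal(A, B):
    """Linear case of Im(dF_x) + T_{F(x)}Z = T_{F(x)}Y: columns of A and B together span Y."""
    return bool(np.linalg.matrix_rank(np.hstack([A, B])) == A.shape[0])

def preimage_dim(A, B):
    """Dimension of {x : A x in span(B)}, assuming B has full column rank."""
    Q, _ = np.linalg.qr(B)         # orthonormal basis of Z = span(B)
    off_Z = A - Q @ (Q.T @ A)      # component of A x orthogonal to Z
    return A.shape[1] - int(np.linalg.matrix_rank(off_Z))

A = np.diag([1.0, 1.0, 0.0])               # F(x) = A x, image is the xy-plane in R^3
Z_line = np.array([[0.0], [0.0], [1.0]])   # Z = z-axis, transversal to F
Z_bad = np.array([[1.0], [0.0], [0.0]])    # Z = x-axis, lies inside the image of F

codim_Z = A.shape[0] - Z_line.shape[1]
codim_preimage = A.shape[1] - preimage_dim(A, Z_line)
print(is_transversal(A, Z_line), is_transversal(A, Z_bad), codim_Z, codim_preimage)
```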

**Figure 3.** Nonlinear orthogonal projection onto a manifold. (a) …

**Figure 4.** Uniqueness vs. stability. (a) Null-space …

**Figure 5.** Sparsity models. (a) Conventional sparsity. (b) Group sparsity. The recovery guarantees for $k$-sparse signals require controlling the $k_{\max}$-order constant (equal to $2k$ in the classical setting [7]), denoted $\delta_{k_{\max}}(D)$. The intuition parallels the null-space analysis above: to avoid $Dx' = Dx''$, we require $D(x' - x'') \neq 0$ for all distinct $k$-sparse vectors, which in turn requires that $D$ not annihilate any no…
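
The uniqueness argument in the Figure 5 excerpt can be checked by brute force on a small dictionary. The difference of two distinct k-sparse vectors is a nonzero vector with at most 2k nonzeros, so `D` separates all k-sparse vectors exactly when every set of 2k columns is linearly independent. The sketch below assumes this standard reading of the truncated sentence; the function name and the toy matrices are hypothetical, and the exhaustive search is only practical for tiny examples.

```python
import numpy as np
from itertools import combinations

def no_sparse_null_vector(D, order):
    """True if D maps no nonzero `order`-sparse vector to zero,
    i.e. every set of `order` columns of D is linearly independent."""
    for cols in combinations(range(D.shape[1]), order):
        if np.linalg.matrix_rank(D[:, list(cols)]) < order:
            return False
    return True

k = 1  # sparsity level; the check uses order 2k

D_good = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])   # every pair of columns is independent
D_bad = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0]])    # columns 0 and 2 are parallel

# Two distinct 1-sparse vectors that D_bad cannot tell apart.
x1 = np.array([2.0, 0.0, 0.0])
x2 = np.array([0.0, 0.0, 1.0])

print(no_sparse_null_vector(D_good, 2 * k), no_sparse_null_vector(D_bad, 2 * k))
print(D_bad @ x1, D_bad @ x2)
```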

**Figure 6.** Autoencoder as a geometric coordinate model.

**Figure 7.** Classical vs. PoS-based views of neural net…

**Figure 8.** Compact vs. non-compact representations. Linear models learn a single global span, which …

**Figure 9.** Vanilla Networks vs. Skip Connections. When $M_D$ is a single smooth manifold, the operator $P^{\perp}$ extracts components orthogonal to the tangent space $T_{P(s)}M_D$, i.e., elements of the normal space $N_{P(s)}M_D$. In the special case where $M_D$ is a flat $k$-dimensional subspace, this reduces to the classical null-space projection onto the orthogonal complement $\operatorname{span}(D)^{\perp}$, so that $P(s) = s - P^{\perp}(s)$ is simply the standard o…
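
The flat special case in the Figure 9 excerpt can be written out directly. This is a minimal sketch assuming `D` is a matrix with full column rank whose columns span the subspace; the helper name `subspace_projectors` and the random test data are hypothetical.

```python
import numpy as np

def subspace_projectors(D):
    """Return (P, P_perp): orthogonal projectors onto span(D) and its orthogonal complement."""
    Q, _ = np.linalg.qr(D)                 # orthonormal basis for span(D)
    P = Q @ Q.T
    P_perp = np.eye(D.shape[0]) - P
    return P, P_perp

rng = np.random.default_rng(0)
D = rng.standard_normal((5, 2))            # a 2-dimensional subspace of R^5
s = rng.standard_normal(5)

P, P_perp = subspace_projectors(D)
on_subspace = P @ s                        # P(s)
residual = P_perp @ s                      # P_perp(s), the skip-style remainder

print(np.allclose(on_subspace, s - residual))   # P(s) = s - P_perp(s)
print(np.allclose(D.T @ residual, 0.0))         # residual is orthogonal to span(D)
```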

**Figure 10.** Projection on sub-manifold $M_{D_j}$ can be realized via its own encoder-decoder or via isometric mapping from $M_{D_j}$ to $M_{D_i}$. Remark 1 (Isometry Invariance of Nonlinear Orthogonal Projection). (See [19].) Let $M$ be a nonempty subset of the Euclidean space $\mathbb{R}^n$, not necessarily a manifold; in particular, $M$ may be a single smooth submanifold or a union of such submanifolds. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be an isometry. If P : R…

**Figure 11.** Isometry action on a union of submanifolds. Left: a finite union $M = \bigcup_{i=1}^{L} M_{D_i}$, where $M_{D_1}$ is already learned as a canonical component. Middle: new samples (red points) do not lie on $M_{D_1}$ but are assumed to belong to an isometric image $g(M_{D_1})$. Learning $g^{-1}$ maps these samples back onto $M_{D_1}$, implicitly identifying the transformed manifold $g(M_{D_1})$. Right: by the invariance identity $P_{g(M)} = g \circ P_M \circ g^{-}$…
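
The invariance identity referenced in the Figure 11 excerpt can be checked numerically in a simple setting. The sketch below assumes the identity has its standard form, projection onto `g(M)` equals `g` applied to the projection of `g^{-1}(x)` onto `M` (the excerpt is truncated before the final exponent). It takes `M` to be a union of two linear subspaces and `g` a rotation plus translation; all names are hypothetical.

```python
import numpy as np

def project_onto_union(x, bases, offset=None):
    """Nearest-point projection of x onto a union of (affine) subspaces.
    Each entry of `bases` has orthonormal columns; `offset` shifts all of them."""
    if offset is None:
        offset = np.zeros_like(x)
    best, best_dist = None, np.inf
    for Q in bases:
        candidate = offset + Q @ (Q.T @ (x - offset))
        dist = np.linalg.norm(x - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best

rng = np.random.default_rng(1)
n = 4
bases = [np.linalg.qr(rng.standard_normal((n, 2)))[0],   # a plane
         np.linalg.qr(rng.standard_normal((n, 1)))[0]]   # a line

R = np.linalg.qr(rng.standard_normal((n, n)))[0]          # orthogonal matrix
t = rng.standard_normal(n)

def g(v):
    return R @ v + t                # an isometry of R^n

def g_inv(v):
    return R.T @ (v - t)

x = rng.standard_normal(n)
direct = project_onto_union(x, [R @ Q for Q in bases], offset=t)   # P_{g(M)}(x)
via_identity = g(project_onto_union(g_inv(x), bases))              # g(P_M(g^{-1}(x)))
print(np.allclose(direct, via_identity))
```

The practical reading is the one the caption gives: once projection onto the canonical union is available, projection onto a transformed copy only requires the transformation and its inverse.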

**Figure 12.** Geometric disentanglement of two submanifolds. Left: The tangent space at the intersection $T_x(M_{D_i} \cap M_{D_j})$ splits into submanifold-specific residual directions $T_x M_{D_i,R}$ and $T_x M_{D_j,R}$, which a trained network pushes to be as orthogonal as possible to ensure stable, unambiguous projection. Right: Linearized view where tangent spaces are approximated by subspace spans $\operatorname{span}(D_i)$, linking the geometric decomp…

**Figure 13.** Geometric illustration of residual-nullspace interference and selective annihilation.

**Figure 14.** (a) When the angle between the two subspaces is …

**Figure 15.** Isometric folding for out-of-union samples. A learnable transform …

**Figure 16.** Group-action view of the learned representation: projection onto the canonical manifold union $M$ and its orbit $G \cdot M$ generated by learnable transformations. … enumerating infinitely many submanifolds, this perspective shows that they arise from a canonical manifold (or canonical union) through symmetry transformations. Let $G$ be a group of learnable isometries acting on $\mathbb{R}^n$. The representation then naturally f…

**Figure 17.** Deep networks as hierarchical manifold generators: successive transformation families …

**Figure 18.** PoS module. The module consists of an input transformation ($T_{\mathrm{in}}$), a projection onto a structured subspace, and an output transformation ($T_{\mathrm{out}}$), together with a residual branch that captures components not explained by the current subspace. This structure serves as a fundamental building block for hierarchical composition in deep networks. We now introduce the Pursuit-of-Subspaces (PoS) module, illustrat…
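
The four parts named in the Figure 18 caption can be arranged as in the toy sketch below. This is not the paper's module, whose details are not visible in this excerpt. It is a purely linear illustration in which `T_in` and `T_out` are assumed to be orthogonal matrices, the "structured subspace" is a fixed linear span, and the function name `pos_module` is hypothetical.

```python
import numpy as np

def pos_module(x, T_in, Q, T_out):
    """Toy linear stand-in: transform, project onto span(Q), transform back.
    Returns the main-branch output and the residual left unexplained by span(Q)."""
    z = T_in @ x                     # input transformation
    explained = Q @ (Q.T @ z)        # projection onto the current subspace
    residual = z - explained         # residual branch, in transformed coordinates
    return T_out @ explained, residual

rng = np.random.default_rng(2)
n, k = 6, 2
T_in = np.linalg.qr(rng.standard_normal((n, n)))[0]   # assumed orthogonal
T_out = T_in.T                                         # assumed inverse of T_in
Q = np.linalg.qr(rng.standard_normal((n, k)))[0]       # orthonormal basis of the subspace

x = rng.standard_normal(n)
out, residual = pos_module(x, T_in, Q, T_out)

# With these choices the two branches together reconstruct the input exactly.
print(np.allclose(out + T_out @ residual, x))
```

In a hierarchical composition the residual would be handed to a further module with a different subspace. That chaining is only suggested by the caption and is not implemented here.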

**Figure 19.** Zero-shot ECG anomaly detection via PoS. (a) Personalized projection learning on …

**Figure 20.** Manifold projection as a prior for 3D microscopy reconstruction, composed of two stages.

**Figure 21.** Qualitative inspection of the domain adaptation technique in volumetric reconstruction.

**Figure 22.** Intersection–residual learning via coupled cross-projections. Left: The architecture details.

**Figure 23.** From residual learning to transformers under the PoS framework.

**Figure 24.** Dual-branch attention (DBA) for subspace selection. Left: One layer of hierarchical …

Original abstract

While deep neural networks have achieved remarkable success across a wide range of domains, their underlying mechanisms remain poorly understood, and they are often regarded as black boxes. This gap between empirical performance and theoretical understanding poses a challenge analogous to the pre-axiomatic stage of classical geometry. In this work, we introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior through a set of geometric postulates. These axioms, together with their derived consequences, provide a unified perspective on representation, computation, and generalization in both shallow and deep architectures. We show that this framework yields geometric explanations for fundamental questions in deep learning, including representation structure, architectural mechanisms, and generalization behavior, offering a principled step toward a coherent theoretical foundation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper puts forward a new set of geometric axioms for neural networks under the Pursuit of Subspaces label, but the link from those axioms to actual training dynamics is missing.

The main point is that this work proposes the Pursuit of Subspaces hypothesis as a set of geometric postulates meant to explain representation, computation, and generalization in neural nets. The specific framing of networks as subspace pursuit looks new relative to the geometric and manifold ideas already in the literature. The authors lay out how the axioms could give a single perspective on feature hierarchies, architectural mechanisms, and generalization bounds, which is a reasonable organizing move for a theoretical paper. That part is clear and internally consistent as far as the abstract goes.

The stress-test concern holds up: the postulates are presented as primitive without a derivation from SGD, the chain rule, or the geometry of the empirical risk surface. No formal steps, proofs, or checks against observed dynamics appear in the visible material, so the explanatory claims rest on geometric intuition rather than shown consequences. This leaves the circularity risk intact: the axioms seem selected to match known behaviors rather than independently justified and then shown to reproduce them.

The paper is aimed at readers who work on foundations of deep learning and want to see new axiomatic proposals. Someone looking for a coherent geometric language to test against data or optimization would find it worth reading, even if they conclude the postulates need more grounding. It shows clear thinking about the gaps in current theory and engages the literature honestly enough to deserve referee time. I would send it to peer review, rather than desk-rejecting it outright, so the authors can address whether the axioms can be derived from, or shown equivalent to, standard training.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior via a set of geometric postulates. It claims that these axioms and their derived consequences furnish a unified perspective on representation, computation, and generalization for both shallow and deep architectures, while supplying geometric explanations for core questions including representation structure, architectural mechanisms, and generalization behavior.

Significance. If the postulates can be rigorously linked to SGD dynamics and shown to generate non-trivial, falsifiable consequences that match observed network behavior, the framework could supply a coherent theoretical foundation that moves beyond black-box descriptions. The explicit attempt at an axiomatic treatment is a constructive direction for the field.

major comments (2)

[Abstract] The assertion that the PoS axioms 'yield geometric explanations' for representation structure, architectural mechanisms, and generalization behavior is unsupported by any derivation steps, formal proofs, or empirical checks within the provided text; the central claim therefore rests on an unshown transition from postulates to concrete predictions.
[§§2–3] The geometric postulates are introduced as primitive without a derivation from gradient-based optimization (chain rule) or the geometry of the empirical risk surface, so it remains unclear whether they explain hierarchical feature learning and generalization or merely re-describe them.

minor comments (1)

[Introduction] The analogy drawn to the pre-axiomatic stage of classical geometry could be sharpened by identifying which specific geometric results (e.g., parallel postulate consequences) are meant to parallel the intended NN theorems.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments identify important opportunities to strengthen the clarity and rigor of the axiomatic presentation. We respond to each major comment below and indicate the revisions we will incorporate.

Referee: [Abstract] The assertion that the PoS axioms 'yield geometric explanations' for representation structure, architectural mechanisms, and generalization behavior is unsupported by any derivation steps, formal proofs, or empirical checks within the provided text; the central claim therefore rests on an unshown transition from postulates to concrete predictions.

Authors: We agree that the abstract is phrased at a high level of generality. Sections 4 and 5 of the manuscript derive several concrete geometric consequences from the PoS axioms, including the emergence of hierarchical subspace pursuit, a geometric account of skip connections, and a subspace-based generalization bound. To make the transition from postulates to predictions explicit already in the abstract, we will revise the abstract to reference these specific derived results and add forward pointers to the relevant theorems and corollaries.

Revision: yes

Referee: [§§2–3] The geometric postulates are introduced as primitive without a derivation from gradient-based optimization (chain rule) or the geometry of the empirical risk surface, so it remains unclear whether they explain hierarchical feature learning and generalization or merely re-describe them.

Authors: The PoS hypothesis is formulated as an axiomatic system in which the geometric postulates are taken as primitives, analogous to the role of incidence and congruence axioms in classical geometry. The manuscript motivates these postulates from observed neural-network phenomenology and then derives non-trivial consequences (e.g., progressive subspace alignment and implicit regularization effects) that go beyond re-description. Nevertheless, we acknowledge the value of an explicit link to SGD dynamics. We will add a new subsection in Section 3 that sketches how the postulates can arise as effective descriptions of gradient flow on the empirical risk surface, drawing on existing results on the geometry of overparameterized loss landscapes, while preserving the axiomatic character of the framework.

Revision: partial

Circularity Check

0 steps flagged

Axiomatic postulates presented as primitives; derivations self-contained

The manuscript introduces the PoS hypothesis explicitly as a set of geometric postulates chosen to formulate observed neural network behavior, then derives consequences for representation, computation, and generalization. No quoted equations or sections reduce any claimed prediction or explanation back to the postulates by construction (e.g., no fitted parameters renamed as predictions, no self-citation chain supplying the load-bearing uniqueness, and no ansatz smuggled via prior work). The framework is therefore self-contained as an axiomatic starting point rather than a tautological re-description of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper's central claim rests on a new set of geometric postulates introduced without derivation from prior theory or data; these postulates function as the primary axioms. No free parameters or invented entities are explicitly named in the abstract, but the framework itself constitutes an ad-hoc axiomatic layer placed on top of existing network observations.

axioms (1)

ad hoc to paper: Neural network behavior can be formulated through a set of geometric postulates concerning pursuit of subspaces.
Stated in the abstract as the foundational hypothesis; no prior justification or external derivation is supplied.

pith-pipeline@v0.9.0 · 5683 in / 1319 out tokens · 33539 ms · 2026-05-21T06:48:43.575166+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, IndisputableMonolith/Cost/FunctionalEquation.lean · reality_from_one_distinction, washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior through a set of geometric postulates... Postulate 1 (Compactness of Data Representation)... Postulate 2 (Nonlinear Orthogonal Projection onto Submanifolds)... Postulate 3 (Orthogonal Complements via Residual Connection)... Postulate 4 (Recursive Application of Nonlinear Projections)

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Theorem 1 (Fundamental Theorem of Deep Learning)... $N_{\mathrm{DNN}} \sim C_\epsilon(M) + \sum C_\epsilon(G_\ell)\, C_\epsilon(M_{D_i})$ (additive scaling)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 8 internal anchors

[1] Mete Ahishali, Mehmet Yamac, Serkan Kiranyaz, and Moncef Gabbouj. Operational support estimator networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):8442–8458, 2024.
[2] Randall Balestriero et al. A spline theory of deep learning. In International Conference on Machine Learning, pages 374–383. PMLR, 2018.
[3] Randall Balestriero and Yann LeCun. Learning by reconstruction produces uninformative features for perception. arXiv preprint arXiv:2402.11337, 2024.
[4] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32):15849–15854, 2019.
[5] Thomas Blumensath and Mike E Davies. Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Transactions on Information Theory, 55(4):1872–1882, 2009.
[6] Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv preprint arXiv:2104.13478, 2021.
[7] Emmanuel J Candès. The restricted isometry property and its implications for compressed sensing. Comptes rendus mathematique, 346(9-10):589–592, 2008.
[8] Emmanuel J Candès et al. Compressive sampling. In Proceedings of the International Congress of Mathematicians, volume 3, pages 1433–1452, 2006.
[9] Emmanuel J Candès and Terence Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
[10] Diego Carrera, Beatrice Rossi, Daniele Zambon, Pasqualina Fragneto, and Giacomo Boracchi. ECG monitoring in wearable devices by sparse models. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 145–160. Springer, 2016.
[11] Xi Chen, Diederik P Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, and Pieter Abbeel. Variational lossy autoencoder. arXiv preprint arXiv:1611.02731, 2016.
[12] SueYeon Chung and Larry F Abbott. Neural population geometry: An approach for understanding biological and artificial neural networks. Current Opinion in Neurobiology, 70:137–144, 2021.
[13] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019.
[14] Taco Cohen, Maurice Weiler, Berkay Kicanaoglu, and Max Welling. Gauge equivariant convolutional networks and the icosahedral CNN. In International Conference on Machine Learning, pages 1321–1330. PMLR, 2019.
[15] Taco Cohen and Max Welling. Group equivariant convolutional networks. In International Conference on Machine Learning, pages 2990–2999. PMLR, 2016.
[16] Uri Cohen, SueYeon Chung, Daniel D Lee, and Haim Sompolinsky. Separability and geometry of object manifolds in deep neural networks. Nature Communications, 11(1):746, 2020.
[17] Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 702–703, 2020.
[18] David L Donoho et al. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
[19] Ewa Dudek and Konstanty Holly. Nonlinear orthogonal projection. In Annales Polonici Mathematici, volume 59, pages 1–31. Polska Akademia Nauk. Instytut Matematyczny PAN, 1994.
[20] Bjørn Ian Dundas. A short course in differential topology. Cambridge University Press, 2018.
[21] Association for the Advancement of Medical Instrumentation. Recommended practice for testing and reporting performance results of ventricular arrhythmia detection algorithms. Arlington, VA, 1987.
[22] Karl Friston. A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456):815–836, 2005.
[23] Karl Friston. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2):127–138, 2010.
[24] M. Gabbouj, S. Kiranyaz, J. Malik, M. U. Zahid, T. Ince, M. E. H. Chowdhury, A. Khandakar, and A. Tahir. Robust Peak Detection for Holter ECGs by Self-Organized Operational Neural Networks. IEEE Trans Neural Netw Learn Syst, PP, Mar 2022.
[25] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000.
[26] Alexis Marie Frederic Goujon. Towards trustworthy deep learning for image reconstruction. Technical report, EPFL, 2024.
[27] Victor Guillemin and Alan Pollack. Differential topology, volume 370. American Mathematical Society, 2025.
[28] Changliang Guo, Wenhao Liu, Xuanwen Hua, Haoyu Li, and Shu Jia. Fourier light-field microscopy. Optics Express, 27(18):25573–25594, 2019.
[29] Michael Hauser and Asok Ray. Principles of Riemannian geometry in neural networks. Advances in Neural Information Processing Systems, 30, 2017.
[31] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.
[32] Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933.
[33] Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, and Richard G Baraniuk. Splinecam: Exact visualization and characterization of deep network geometry and decision boundaries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3789–3798, 2023.
[34] Arta A Jamshidi, Michael J Kirby, and Dave S Broomhead. Geometric manifold learning. IEEE Signal Processing Magazine, 28(2):69–76, 2011.
[35] William B Johnson and Joram Lindenstrauss. Extensions of Lipschitz Mappings Into a Hilbert Space. Contemporary Mathematics, 26(189-206):1, 1984.
[36] Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning, pages 5156–5165. PMLR, 2020.
[37] Serkan Kiranyaz, Turker Ince, and Moncef Gabbouj. Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Transactions on Biomedical Engineering, 63(3):664–675, 2016.
[38]

Personalized monitoring and advance warning system for cardiac arrhythmias.Scientific Reports, 7(1):9270, 2017

Serkan Kiranyaz, Turker Ince, and Moncef Gabbouj. Personalized monitoring and advance warning system for cardiac arrhythmias.Scientific Reports, 7(1):9270, 2017

work page 2017
[39]

On the generalization of equivariance and convolution in neural networks to the action of compact groups

Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. InInternational conference on machine learning, pages 2747–2755. PMLR, 2018

work page 2018
[40]

Masked autoencoders for microscopy are scalable learners of cellular biology

Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, et al. Masked autoencoders for microscopy are scalable learners of cellular biology. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11757–11768, 2024

work page 2024
[41]

Neural tuning and representational geometry.Nature Reviews Neuroscience, 22(11):703–718, 2021

Nikolaus Kriegeskorte and Xue-Xin Wei. Neural tuning and representational geometry.Nature Reviews Neuroscience, 22(11):703–718, 2021

work page 2021
[42]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

work page 2009
[43] John M Lee. Introduction to topological manifolds. Springer, 2000.

[44] Mario Lezcano-Casado and David Martínez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. In International Conference on Machine Learning, pages 3794–3803. PMLR, 2019.

[45] Cuiwei Li, Chongxun Zheng, and Changfeng Tai. Detection of ECG characteristic points using wavelet transforms. IEEE Transactions on Biomedical Engineering, 42(1):21–28, 1995.

[46] Shuai Li, Kui Jia, Yuxin Wen, Tongliang Liu, and Dacheng Tao. Orthogonal deep neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(4):1352–1368, 2019.

[47] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018.

[48] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

[49] Yue M Lu and Minh N Do. A theory for sampling signals from a union of subspaces. IEEE Transactions on Signal Processing, 56(6):2334–2345, 2008.

[50] Mario Merone, Paolo Soda, Mario Sansone, and Carlo Sansone. ECG databases for biometric systems: A systematic review. Expert Systems with Applications, 67:189–202, 2017.

[51] Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks. Advances in Neural Information Processing Systems, 27, 2014.

[52] George B Moody and Roger G Mark. The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001.
[53] Mikio Nakahara. Geometry, topology and physics. CRC Press, 2018.

[54] Hariharan Narayanan and Sanjoy Mitter. Sample complexity of testing the manifold hypothesis. Advances in Neural Information Processing Systems, 23, 2010.

[55] Arvind Neelakantan, Luke Vilnis, Quoc V Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens. Adding gradient noise improves learning for very deep networks. arXiv preprint arXiv:1511.06807, 2015.

[56] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1):419–441, 2008.

[57] Alexander Ororbia and Karl Friston. Mortal computation: A foundation for biomimetic intelligence. arXiv preprint arXiv:2311.09589, 2023.

[58] Jiapu Pan and Willis J Tompkins. A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering, (3):230–236, 1985.

[59] Razvan Pascanu, Guido Montúfar, and Yoshua Bengio. On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv preprint arXiv:1312.6098, 2013.

[60] Matthew G Perich, Devika Narain, and Juan A Gallego. A neural manifold view of the brain. Nature Neuroscience, 28(8):1582–1597, 2025.

[61] Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. Grokking: Generalization beyond overfitting on small algorithmic datasets. arXiv preprint arXiv:2201.02177, 2022.

[62] Simon J.D. Prince. Understanding Deep Learning. The MIT Press, 2023.
[63] Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, and Jascha Sohl-Dickstein. On the expressive power of deep neural networks. In International Conference on Machine Learning, pages 2847–2854. PMLR, 2017.

[64] Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann LeCun. Efficient learning of sparse representations with an energy-based model. Advances in Neural Information Processing Systems, 19, 2006.

[65] Salah Rifai, Yann N Dauphin, Pascal Vincent, Yoshua Bengio, and Xavier Muller. The manifold tangent classifier. Advances in Neural Information Processing Systems, 24, 2011.

[66] Terrence J Sejnowski. The unreasonable effectiveness of deep learning in artificial intelligence. Proceedings of the National Academy of Sciences, 117(48):30033–30038, 2020.

[67] Thiago Serra, Christian Tjandraatmadja, and Srikumar Ramalingam. Bounding and counting linear regions of deep neural networks. In International Conference on Machine Learning, pages 4558–4566. PMLR, 2018.

[68] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

[69] L.W. Tu. An Introduction to Manifolds. Universitext. Springer New York, 2010.

[70] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and Léon Bottou. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(12), 2010.

[71] Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. CvT: Introducing convolutions to vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22–31, 2021.

[72] Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew-Soon Ong, and Chen Change Loy. Masked frequency modeling for self-supervised visual pre-training. In The Eleventh International Conference on Learning Representations.
[73] Mehmet Yamaç, Mete Ahishali, Serkan Kiranyaz, and Moncef Gabbouj. Convolutional sparse support estimator network (CSEN): From energy-efficient support estimation to learning-aided compressive sensing. IEEE Transactions on Neural Networks and Learning Systems, 34(1):290–304, 2021.

[74] Mehmet Yamaç, Mert Duman, İlke Adalıoğlu, Serkan Kiranyaz, and Moncef Gabbouj. A personalized zero-shot ECG arrhythmia monitoring system: From sparse representation based domain adaption to energy efficient abnormal beat detection for practical ECG surveillance. arXiv preprint arXiv:2207.07089, 2022.

[75] Chengqiang Yi, Lanxin Zhu, Jiahao Sun, Zhaofei Wang, Meng Zhang, Fenghe Zhong, Luxin Yan, Jiang Tang, Liang Huang, Yu-Hui Zhang, et al. Video-rate 3D imaging of living cells using Fourier view-channel-depth light field microscopy. Communications Biology, 6(1):1259, 2023.

[76] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6023–6032, 2019.

[77] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.

[78] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.

A Notation

In this work, we consider the $\ell_p$-norm of a vector $x \in \mathbb{R}^n$, defined by $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$ with $p \ge 1$. The $\ell_0$ “norm” is given by $\|x\|_0 = \lim_{p \to 0} \sum_{i=1}^{n} |x_i|^p$, which counts the number of nonzero e...
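The two definitions in the notation paragraph can be checked numerically. The following is a minimal sketch in plain Python, not code from the paper; the function names `lp_norm`, `l0_count`, and `l0_limit_approx` are illustrative assumptions, and the small exponent used to approximate the limit is an arbitrary choice.

```python
def lp_norm(x, p):
    """l_p norm for p >= 1: (sum_i |x_i|^p)^(1/p)."""
    if p < 1:
        raise ValueError("p must be >= 1 for the l_p norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def l0_count(x):
    """Number of nonzero entries, the value the l_0 'norm' takes."""
    return sum(1 for v in x if v != 0)


def l0_limit_approx(x, p=1e-6):
    """sum_i |x_i|^p for a small p > 0 (hypothetical default 1e-6).

    Zero entries contribute 0 and each nonzero entry contributes a
    value close to 1, so the sum approaches the nonzero count.
    """
    return sum(abs(v) ** p for v in x)


x = [3.0, 0.0, -4.0]
print(lp_norm(x, 1))        # sum of absolute values
print(lp_norm(x, 2))        # Euclidean length
print(l0_count(x))          # exact count of nonzero entries
print(l0_limit_approx(x))   # close to the count for small p
```

Note that the limit is taken on the sum itself, without the outer $1/p$ power, which is why the sketch keeps `l0_limit_approx` separate from `lp_norm`.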

[32] [33]

Splinecam: Exact visualization and characterization of deep network geometry and decision boundaries

Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, and Richard G Baraniuk. Splinecam: Exact visualization and characterization of deep network geometry and decision boundaries. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3789–3798, 2023

work page 2023

[33] [34]

Geometric manifold learning.IEEE Signal Processing Magazine, 28(2):69–76, 2011

Arta A Jamshidi, Michael J Kirby, and Dave S Broomhead. Geometric manifold learning.IEEE Signal Processing Magazine, 28(2):69–76, 2011

work page 2011

[34] [35]

Extensions of Lipschitz Mappings Into a Hilbert Space.Contemporary mathematics, 26(189-206):1, 1984

William B Johnson and Joram Lindenstrauss. Extensions of Lipschitz Mappings Into a Hilbert Space.Contemporary mathematics, 26(189-206):1, 1984

work page 1984

[35] [36]

Transformers are rnns: Fast autoregressive transformers with linear attention

Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are rnns: Fast autoregressive transformers with linear attention. InInternational conference on machine learning, pages 5156–5165. PMLR, 2020

work page 2020

[36] [37]

Real-time patient-specific ecg classifi- cation by 1-d convolutional neural networks.IEEE Transactions on Biomedical Engineering, 63(3):664–675, 2016

Serkan Kiranyaz, Turker Ince, and Moncef Gabbouj. Real-time patient-specific ecg classifi- cation by 1-d convolutional neural networks.IEEE Transactions on Biomedical Engineering, 63(3):664–675, 2016. 35

work page 2016

[37] [38]

Personalized monitoring and advance warning system for cardiac arrhythmias.Scientific Reports, 7(1):9270, 2017

Serkan Kiranyaz, Turker Ince, and Moncef Gabbouj. Personalized monitoring and advance warning system for cardiac arrhythmias.Scientific Reports, 7(1):9270, 2017

work page 2017

[38] [39]

On the generalization of equivariance and convolution in neural networks to the action of compact groups

Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. InInternational conference on machine learning, pages 2747–2755. PMLR, 2018

work page 2018

[39] [40]

Masked autoencoders for microscopy are scalable learners of cellular biology

Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, et al. Masked autoencoders for microscopy are scalable learners of cellular biology. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11757–11768, 2024

work page 2024

[40] [41]

Neural tuning and representational geometry.Nature Reviews Neuroscience, 22(11):703–718, 2021

Nikolaus Kriegeskorte and Xue-Xin Wei. Neural tuning and representational geometry.Nature Reviews Neuroscience, 22(11):703–718, 2021

work page 2021

[41] [42]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

work page 2009

[42] [43]

Springer, 2000

John M Lee.Introduction to topological manifolds. Springer, 2000

work page 2000

[43] [44]

Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group

Mario Lezcano-Casado and David Martınez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. InInternational Conference on Machine Learning, pages 3794–3803. PMLR, 2019

work page 2019

[44] [45]

Detection of ecg characteristic points using wavelet transforms.IEEE Transactions on biomedical Engineering, 42(1):21–28, 1995

Cuiwei Li, Chongxun Zheng, and Changfeng Tai. Detection of ecg characteristic points using wavelet transforms.IEEE Transactions on biomedical Engineering, 42(1):21–28, 1995

work page 1995

[45] [46]

Orthogonal deep neural networks.IEEE transactions on pattern analysis and machine intelligence, 43(4):1352–1368, 2019

Shuai Li, Kui Jia, Yuxin Wen, Tongliang Liu, and Dacheng Tao. Orthogonal deep neural networks.IEEE transactions on pattern analysis and machine intelligence, 43(4):1352–1368, 2019

work page 2019

[46] [47]

Towards robust neural networks via random self-ensemble

Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. InProceedings of the european conference on computer vision (ECCV), pages 369–385, 2018

work page 2018

[47] [48]

SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts.arXiv preprint arXiv:1608.03983, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[48] [49]

A theory for sampling signals from a union of subspaces.IEEE transactions on signal processing, 56(6):2334–2345, 2008

Yue M Lu and Minh N Do. A theory for sampling signals from a union of subspaces.IEEE transactions on signal processing, 56(6):2334–2345, 2008

work page 2008

[49] [50]

Ecg databases for biometric systems: A systematic review.Expert Systems with Applications, 67:189–202, 2017

Mario Merone, Paolo Soda, Mario Sansone, and Carlo Sansone. Ecg databases for biometric systems: A systematic review.Expert Systems with Applications, 67:189–202, 2017

work page 2017

[50] [51]

On the number of linear regions of deep neural networks.Advances in neural information processing systems, 27, 2014

Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks.Advances in neural information processing systems, 27, 2014

work page 2014

[51] [52]

The impact of the mit-bih arrhythmia database.IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001

George B Moody and Roger G Mark. The impact of the mit-bih arrhythmia database.IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001

work page 2001

[52] [53]

CRC press, 2018

Mikio Nakahara.Geometry, topology and physics. CRC press, 2018

work page 2018

[53] [54]

Sample complexity of testing the manifold hypothesis

Hariharan Narayanan and Sanjoy Mitter. Sample complexity of testing the manifold hypothesis. Advances in neural information processing systems, 23, 2010

work page 2010

[54] [55]

Adding Gradient Noise Improves Learning for Very Deep Networks

Arvind Neelakantan, Luke Vilnis, Quoc V Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens. Adding gradient noise improves learning for very deep networks.arXiv preprint arXiv:1511.06807, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[55] [56]

Finding the homology of submanifolds with high confidence from random samples.Discrete & Computational Geometry, 39(1):419– 441, 2008

Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples.Discrete & Computational Geometry, 39(1):419– 441, 2008

work page 2008

[56] [57]

Mortal computation: A foundation for biomimetic intelligence.arXiv preprint arXiv:2311.09589, 2023

Alexander Ororbia and Karl Friston. Mortal computation: A foundation for biomimetic intelligence.arXiv preprint arXiv:2311.09589, 2023

work page arXiv 2023

[57] [58]

A real-time qrs detection algorithm.IEEE transactions on biomedical engineering, (3):230–236, 1985

Jiapu Pan and Willis J Tompkins. A real-time qrs detection algorithm.IEEE transactions on biomedical engineering, (3):230–236, 1985

work page 1985

[58] [59]

On the number of response regions of deep feed forward networks with piece-wise linear activations

Razvan Pascanu, Guido Montufar, and Yoshua Bengio. On the number of response regions of deep feed forward networks with piece-wise linear activations.arXiv preprint arXiv:1312.6098, 2013. 36

work page internal anchor Pith review Pith/arXiv arXiv 2013

[59] [60]

A neural manifold view of the brain

Matthew G Perich, Devika Narain, and Juan A Gallego. A neural manifold view of the brain. Nature Neuroscience, 28(8):1582–1597, 2025

work page 2025

[60] [61]

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. Grokking: Gen- eralization beyond overfitting on small algorithmic datasets.arXiv preprint arXiv:2201.02177, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[61] [62]

Prince.Understanding Deep Learning

Simon J.D. Prince.Understanding Deep Learning. The MIT Press, 2023

work page 2023

[62] [63]

On the expressive power of deep neural networks

Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, and Jascha Sohl-Dickstein. On the expressive power of deep neural networks. Ininternational conference on machine learning, pages 2847–2854. PMLR, 2017

work page 2017

[63] [64]

Efficient learning of sparse representations with an energy-based model.Advances in neural information processing systems, 19, 2006

Marc’Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann Cun. Efficient learning of sparse representations with an energy-based model.Advances in neural information processing systems, 19, 2006

work page 2006

[64] [65]

The manifold tangent classifier.Advances in neural information processing systems, 24, 2011

Salah Rifai, Yann N Dauphin, Pascal Vincent, Yoshua Bengio, and Xavier Muller. The manifold tangent classifier.Advances in neural information processing systems, 24, 2011

work page 2011

[65] [66]

The unreasonable effectiveness of deep learning in artificial intelligence

Terrence J Sejnowski. The unreasonable effectiveness of deep learning in artificial intelligence. Proceedings of the National Academy of Sciences, 117(48):30033–30038, 2020

work page 2020

[66] [67]

Bounding and counting linear regions of deep neural networks

Thiago Serra, Christian Tjandraatmadja, and Srikumar Ramalingam. Bounding and counting linear regions of deep neural networks. InInternational conference on machine learning, pages 4558–4566. PMLR, 2018

work page 2018

[67] [68]

Dropout: a simple way to prevent neural networks from overfitting.The journal of machine learning research, 15(1):1929–1958, 2014

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting.The journal of machine learning research, 15(1):1929–1958, 2014

work page 1929

[68] [69]

Tu.An Introduction to Manifolds

L.W. Tu.An Introduction to Manifolds. Universitext. Springer New York, 2010

work page 2010

[69] [70]

Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.Journal of machine learning research, 11(12), 2010

Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and Léon Bottou. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.Journal of machine learning research, 11(12), 2010

work page 2010

[70] [71]

Cvt: Introducing convolutions to vision transformers

Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. Cvt: Introducing convolutions to vision transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 22–31, 2021

work page 2021

[71] [72]

Masked frequency modeling for self-supervised visual pre-training

Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew-Soon Ong, and Chen Change Loy. Masked frequency modeling for self-supervised visual pre-training. InThe Eleventh International Conference on Learning Representations

[72] [73]

Mehmet Yamaç, Mete Ahishali, Serkan Kiranyaz, and Moncef Gabbouj. Convolutional sparse support estimator network (CSEN): From energy-efficient support estimation to learning-aided compressive sensing. IEEE Transactions on Neural Networks and Learning Systems, 34(1):290–304, 2021

[73] [74]

Mehmet Yamaç, Mert Duman, İlke Adalıoğlu, Serkan Kiranyaz, and Moncef Gabbouj. A personalized zero-shot ECG arrhythmia monitoring system: From sparse representation based domain adaption to energy efficient abnormal beat detection for practical ECG surveillance. arXiv preprint arXiv:2207.07089, 2022

[74] [75]

Chengqiang Yi, Lanxin Zhu, Jiahao Sun, Zhaofei Wang, Meng Zhang, Fenghe Zhong, Luxin Yan, Jiang Tang, Liang Huang, Yu-Hui Zhang, et al. Video-rate 3D imaging of living cells using Fourier view-channel-depth light field microscopy. Communications Biology, 6(1):1259, 2023

[75] [76]

Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6023–6032, 2019

[76] [77]

Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016

[77] [78]

Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.

A Notation

In this work, we consider the $\ell_p$-norm of a vector $x \in \mathbb{R}^n$, defined by $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$ with $p \geq 1$. The $\ell_0$ "norm" is given by $\|x\|_0 = \lim_{p \to 0} \sum_{i=1}^{n} |x_i|^p$, which counts the number of nonzero e...
