Axiomatizing Neural Networks via Pursuit of Subspaces
Pith reviewed 2026-05-21 06:48 UTC · model grok-4.3
The pith
Neural networks operate according to geometric postulates about pursuing subspaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The PoS axioms together with their derived consequences provide a unified perspective on representation, computation, and generalization in both shallow and deep architectures and yield geometric explanations for fundamental questions in deep learning.
What carries the argument
The Pursuit of Subspaces (PoS) hypothesis, a collection of geometric postulates that treat network dynamics as the systematic pursuit of subspaces.
If this is right
- The framework supplies geometric accounts of how representations form in network layers.
- It explains the effects of architectural choices such as depth and width through subspace mechanisms.
- Generalization behavior follows as a direct consequence of the geometric postulates rather than separate statistical arguments.
- Both shallow linear networks and deep nonlinear ones fall under the same set of axioms and derived results.
Where Pith is reading between the lines
- If the axioms prove faithful, designers could use them to derive new architectures instead of relying on empirical search.
- The approach may connect to existing geometric ideas in machine learning such as manifold assumptions without requiring additional machinery.
- A direct test would involve checking whether subspace-pursuit predictions match the internal activations of trained networks on simple datasets.
- The same postulates might extend to explain certain behaviors in other parameterized models beyond standard neural networks.
Load-bearing premise
Neural network behavior can be captured by a small set of geometric postulates whose consequences are both non-trivial and faithful to observed network dynamics.
What would settle it
A clear case of network training or generalization where the observed representations and performance cannot be derived from or predicted by the PoS axioms.
Figures
read the original abstract
While deep neural networks have achieved remarkable success across a wide range of domains, their underlying mechanisms remain poorly understood, and they are often regarded as black boxes. This gap between empirical performance and theoretical understanding poses a challenge analogous to the pre-axiomatic stage of classical geometry. In this work, we introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior through a set of geometric postulates. These axioms, together with their derived consequences, provide a unified perspective on representation, computation, and generalization in both shallow and deep architectures. We show that this framework yields geometric explanations for fundamental questions in deep learning, including representation structure, architectural mechanisms, and generalization behavior, offering a principled step toward a coherent theoretical foundation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior via a set of geometric postulates. It claims that these axioms and their derived consequences furnish a unified perspective on representation, computation, and generalization for both shallow and deep architectures, while supplying geometric explanations for core questions including representation structure, architectural mechanisms, and generalization behavior.
Significance. If the postulates can be rigorously linked to SGD dynamics and shown to generate non-trivial, falsifiable consequences that match observed network behavior, the framework could supply a coherent theoretical foundation that moves beyond black-box descriptions. The explicit attempt at an axiomatic treatment is a constructive direction for the field.
major comments (2)
- [Abstract] Abstract: the assertion that the PoS axioms 'yield geometric explanations' for representation structure, architectural mechanisms, and generalization behavior is unsupported by any derivation steps, formal proofs, or empirical checks within the provided text; the central claim therefore rests on an unshown transition from postulates to concrete predictions.
- [§§2–3] §§2–3: the geometric postulates are introduced as primitive without a derivation from gradient-based optimization (chain rule) or the geometry of the empirical risk surface, so it remains unclear whether they explain hierarchical feature learning and generalization or merely re-describe them.
minor comments (1)
- [Introduction] The analogy drawn to the pre-axiomatic stage of classical geometry could be sharpened by identifying which specific geometric results (e.g., parallel postulate consequences) are meant to parallel the intended NN theorems.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed report. The comments identify important opportunities to strengthen the clarity and rigor of the axiomatic presentation. We respond to each major comment below and indicate the revisions we will incorporate.
read point-by-point responses
-
Referee: [Abstract] Abstract: the assertion that the PoS axioms 'yield geometric explanations' for representation structure, architectural mechanisms, and generalization behavior is unsupported by any derivation steps, formal proofs, or empirical checks within the provided text; the central claim therefore rests on an unshown transition from postulates to concrete predictions.
Authors: We agree that the abstract statement is stated at a high level of generality. Sections 4 and 5 of the manuscript derive several concrete geometric consequences from the PoS axioms, including the emergence of hierarchical subspace pursuit, a geometric account of skip connections, and a subspace-based generalization bound. To make the transition from postulates to predictions explicit already in the abstract, we will revise the abstract to reference these specific derived results and add forward pointers to the relevant theorems and corollaries. revision: yes
-
Referee: [§§2–3] §§2–3: the geometric postulates are introduced as primitive without a derivation from gradient-based optimization (chain rule) or the geometry of the empirical risk surface, so it remains unclear whether they explain hierarchical feature learning and generalization or merely re-describe them.
Authors: The PoS hypothesis is formulated as an axiomatic system in which the geometric postulates are taken as primitives, analogous to the role of incidence and congruence axioms in classical geometry. The manuscript motivates these postulates from observed neural-network phenomenology and then derives non-trivial consequences (e.g., progressive subspace alignment and implicit regularization effects) that go beyond re-description. Nevertheless, we acknowledge the value of an explicit link to SGD dynamics. We will add a new subsection in Section 3 that sketches how the postulates can arise as effective descriptions of gradient flow on the empirical risk surface, drawing on existing results on the geometry of overparameterized loss landscapes, while preserving the axiomatic character of the framework. revision: partial
Circularity Check
Axiomatic postulates presented as primitives; derivations self-contained
full rationale
The manuscript introduces the PoS hypothesis explicitly as a set of geometric postulates chosen to formulate observed neural network behavior, then derives consequences for representation, computation, and generalization. No quoted equations or sections reduce any claimed prediction or explanation back to the postulates by construction (e.g., no fitted parameters renamed as predictions, no self-citation chain supplying the load-bearing uniqueness, and no ansatz smuggled via prior work). The framework is therefore self-contained as an axiomatic starting point rather than a tautological re-description of its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- ad hoc to paper Neural network behavior can be formulated through a set of geometric postulates concerning pursuit of subspaces.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, IndisputableMonolith/Cost/FunctionalEquation.leanreality_from_one_distinction, washburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior through a set of geometric postulates... Postulate 1 (Compactness of Data Representation)... Postulate 2 (Nonlinear Orthogonal Projection onto Submanifolds)... Postulate 3 (Orthogonal Complements via Residual Connection)... Postulate 4 (Recursive Application of Nonlinear Projections)
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 1 (Fundamental Theorem of Deep Learning)... NDNN ∼ Cϵ(M) + Σ Cϵ(Gℓ)Cϵ(MDi) (additive scaling)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Mete Ahishali, Mehmet Yamac, Serkan Kiranyaz, and Moncef Gabbouj. Operational sup- port estimator networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):8442–8458, 2024
work page 2024
-
[2]
A spline theory of deep learning
Randall Balestriero et al. A spline theory of deep learning. InInternational Conference on Machine Learning, pages 374–383. PMLR, 2018
work page 2018
-
[3]
Randall Balestriero and Yann LeCun. Learning by reconstruction produces uninformative features for perception.arXiv preprint arXiv:2402.11337, 2024
-
[4]
Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine- learning practice and the classical bias–variance trade-off.Proceedings of the National Academy of Sciences, 116(32):15849–15854, 2019
work page 2019
-
[5]
Thomas Blumensath and Mike E Davies. Sampling theorems for signals from the union of finite- dimensional linear subspaces.IEEE Transactions on Information Theory, 55(4):1872–1882, 2009
work page 2009
-
[6]
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veli ˇckovi´c. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges.arXiv preprint arXiv:2104.13478, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[7]
Emmanuel J Candes. The restricted isometry property and its implications for compressed sensing.Comptes rendus mathematique, 346(9-10):589–592, 2008
work page 2008
-
[8]
Emmanuel J Candès et al. Compressive sampling. InProceedings of the International Congress of Mathematicians, volume 3, pages 1433–1452, 2006
work page 2006
-
[9]
Decoding by linear programming.IEEE transactions on information theory, 51(12):4203–4215, 2005
Emmanuel J Candes and Terence Tao. Decoding by linear programming.IEEE transactions on information theory, 51(12):4203–4215, 2005
work page 2005
-
[10]
Ecg monitoring in wearable devices by sparse models
Diego Carrera, Beatrice Rossi, Daniele Zambon, Pasqualina Fragneto, and Giacomo Boracchi. Ecg monitoring in wearable devices by sparse models. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 145–160. Springer, 2016
work page 2016
-
[11]
Xi Chen, Diederik P Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, and Pieter Abbeel. Variational lossy autoencoder.arXiv preprint arXiv:1611.02731, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[12]
SueYeon Chung and Larry F Abbott. Neural population geometry: An approach for understand- ing biological and artificial neural networks.Current opinion in neurobiology, 70:137–144, 2021
work page 2021
-
[13]
Certified adversarial robustness via randomized smoothing
Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. Ininternational conference on machine learning, pages 1310–1320. PMLR, 2019
work page 2019
-
[14]
Gauge equivariant convolutional networks and the icosahedral cnn
Taco Cohen, Maurice Weiler, Berkay Kicanaoglu, and Max Welling. Gauge equivariant convolutional networks and the icosahedral cnn. InInternational conference on Machine learning, pages 1321–1330. PMLR, 2019
work page 2019
-
[15]
Group equivariant convolutional networks
Taco Cohen and Max Welling. Group equivariant convolutional networks. InInternational conference on machine learning, pages 2990–2999. PMLR, 2016
work page 2016
-
[16]
Uri Cohen, SueYeon Chung, Daniel D Lee, and Haim Sompolinsky. Separability and geometry of object manifolds in deep neural networks.Nature communications, 11(1):746, 2020. 34
work page 2020
-
[17]
Randaugment: Practical automated data augmentation with a reduced search space
Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 702–703, 2020
work page 2020
-
[18]
Compressed sensing.IEEE Transactions on information theory, 52(4):1289–1306, 2006
David L Donoho et al. Compressed sensing.IEEE Transactions on information theory, 52(4):1289–1306, 2006
work page 2006
-
[19]
Nonlinear orthogonal projection
Ewa Dudek and Konstanty Holly. Nonlinear orthogonal projection. InAnnales Polonici Mathematici, volume 59, pages 1–31. Polska Akademia Nauk. Instytut Matematyczny PAN, 1994
work page 1994
-
[20]
Cambridge University Press, 2018
Bjørn Ian Dundas.A short course in differential topology. Cambridge University Press, 2018
work page 2018
-
[21]
Association for the Advancement of Medical Instrumentation. Recommended practice for testing and reporting performance results of ventricular arrhythmia detection algorithms.Arlington, VA, 1987
work page 1987
-
[22]
Karl Friston. A theory of cortical responses.Philosophical transactions of the Royal Society B: Biological sciences, 360(1456):815–836, 2005
work page 2005
-
[23]
The free-energy principle: a unified brain theory?Nature reviews neuroscience, 11(2):127–138, 2010
Karl Friston. The free-energy principle: a unified brain theory?Nature reviews neuroscience, 11(2):127–138, 2010
work page 2010
-
[24]
M. Gabbouj, S. Kiranyaz, J. Malik, M. U. Zahid, T. Ince, M. E. H. Chowdhury, A. Khandakar, and A. Tahir. Robust Peak Detection for Holter ECGs by Self-Organized Operational Neural Networks.IEEE Trans Neural Netw Learn Syst, PP, Mar 2022
work page 2022
-
[25]
Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals.Circulation, 101(23):e215–e220, 2000
work page 2000
-
[26]
Towards trustworthy deep learning for image reconstruction
Alexis Marie Frederic Goujon. Towards trustworthy deep learning for image reconstruction. Technical report, EPFL, 2024
work page 2024
-
[27]
American Mathematical Society, 2025
Victor Guillemin and Alan Pollack.Differential topology, volume 370. American Mathematical Society, 2025
work page 2025
-
[28]
Fourier light-field microscopy.Optics express, 27(18):25573–25594, 2019
Changliang Guo, Wenhao Liu, Xuanwen Hua, Haoyu Li, and Shu Jia. Fourier light-field microscopy.Optics express, 27(18):25573–25594, 2019
work page 2019
-
[29]
Michael Hauser and Asok Ray. Principles of riemannian geometry in neural networks.Advances in neural information processing systems, 30, 2017
work page 2017
-
[31]
Masked autoencoders are scalable vision learners
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022
work page 2022
-
[32]
Analysis of a complex of statistical variables into principal components
Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of educational psychology, 24(6):417, 1933
work page 1933
-
[33]
Splinecam: Exact visualization and characterization of deep network geometry and decision boundaries
Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, and Richard G Baraniuk. Splinecam: Exact visualization and characterization of deep network geometry and decision boundaries. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3789–3798, 2023
work page 2023
-
[34]
Geometric manifold learning.IEEE Signal Processing Magazine, 28(2):69–76, 2011
Arta A Jamshidi, Michael J Kirby, and Dave S Broomhead. Geometric manifold learning.IEEE Signal Processing Magazine, 28(2):69–76, 2011
work page 2011
-
[35]
Extensions of Lipschitz Mappings Into a Hilbert Space.Contemporary mathematics, 26(189-206):1, 1984
William B Johnson and Joram Lindenstrauss. Extensions of Lipschitz Mappings Into a Hilbert Space.Contemporary mathematics, 26(189-206):1, 1984
work page 1984
-
[36]
Transformers are rnns: Fast autoregressive transformers with linear attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are rnns: Fast autoregressive transformers with linear attention. InInternational conference on machine learning, pages 5156–5165. PMLR, 2020
work page 2020
-
[37]
Serkan Kiranyaz, Turker Ince, and Moncef Gabbouj. Real-time patient-specific ecg classifi- cation by 1-d convolutional neural networks.IEEE Transactions on Biomedical Engineering, 63(3):664–675, 2016. 35
work page 2016
-
[38]
Serkan Kiranyaz, Turker Ince, and Moncef Gabbouj. Personalized monitoring and advance warning system for cardiac arrhythmias.Scientific Reports, 7(1):9270, 2017
work page 2017
-
[39]
Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. InInternational conference on machine learning, pages 2747–2755. PMLR, 2018
work page 2018
-
[40]
Masked autoencoders for microscopy are scalable learners of cellular biology
Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, et al. Masked autoencoders for microscopy are scalable learners of cellular biology. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11757–11768, 2024
work page 2024
-
[41]
Neural tuning and representational geometry.Nature Reviews Neuroscience, 22(11):703–718, 2021
Nikolaus Kriegeskorte and Xue-Xin Wei. Neural tuning and representational geometry.Nature Reviews Neuroscience, 22(11):703–718, 2021
work page 2021
-
[42]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009
work page 2009
- [43]
-
[44]
Mario Lezcano-Casado and David Martınez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. InInternational Conference on Machine Learning, pages 3794–3803. PMLR, 2019
work page 2019
-
[45]
Cuiwei Li, Chongxun Zheng, and Changfeng Tai. Detection of ecg characteristic points using wavelet transforms.IEEE Transactions on biomedical Engineering, 42(1):21–28, 1995
work page 1995
-
[46]
Shuai Li, Kui Jia, Yuxin Wen, Tongliang Liu, and Dacheng Tao. Orthogonal deep neural networks.IEEE transactions on pattern analysis and machine intelligence, 43(4):1352–1368, 2019
work page 2019
-
[47]
Towards robust neural networks via random self-ensemble
Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. InProceedings of the european conference on computer vision (ECCV), pages 369–385, 2018
work page 2018
-
[48]
SGDR: Stochastic Gradient Descent with Warm Restarts
Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts.arXiv preprint arXiv:1608.03983, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[49]
Yue M Lu and Minh N Do. A theory for sampling signals from a union of subspaces.IEEE transactions on signal processing, 56(6):2334–2345, 2008
work page 2008
-
[50]
Mario Merone, Paolo Soda, Mario Sansone, and Carlo Sansone. Ecg databases for biometric systems: A systematic review.Expert Systems with Applications, 67:189–202, 2017
work page 2017
-
[51]
Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks.Advances in neural information processing systems, 27, 2014
work page 2014
-
[52]
George B Moody and Roger G Mark. The impact of the mit-bih arrhythmia database.IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001
work page 2001
- [53]
-
[54]
Sample complexity of testing the manifold hypothesis
Hariharan Narayanan and Sanjoy Mitter. Sample complexity of testing the manifold hypothesis. Advances in neural information processing systems, 23, 2010
work page 2010
-
[55]
Adding Gradient Noise Improves Learning for Very Deep Networks
Arvind Neelakantan, Luke Vilnis, Quoc V Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens. Adding gradient noise improves learning for very deep networks.arXiv preprint arXiv:1511.06807, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[56]
Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples.Discrete & Computational Geometry, 39(1):419– 441, 2008
work page 2008
-
[57]
Mortal computation: A foundation for biomimetic intelligence.arXiv preprint arXiv:2311.09589, 2023
Alexander Ororbia and Karl Friston. Mortal computation: A foundation for biomimetic intelligence.arXiv preprint arXiv:2311.09589, 2023
-
[58]
A real-time qrs detection algorithm.IEEE transactions on biomedical engineering, (3):230–236, 1985
Jiapu Pan and Willis J Tompkins. A real-time qrs detection algorithm.IEEE transactions on biomedical engineering, (3):230–236, 1985
work page 1985
-
[59]
On the number of response regions of deep feed forward networks with piece-wise linear activations
Razvan Pascanu, Guido Montufar, and Yoshua Bengio. On the number of response regions of deep feed forward networks with piece-wise linear activations.arXiv preprint arXiv:1312.6098, 2013. 36
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[60]
A neural manifold view of the brain
Matthew G Perich, Devika Narain, and Juan A Gallego. A neural manifold view of the brain. Nature Neuroscience, 28(8):1582–1597, 2025
work page 2025
-
[61]
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. Grokking: Gen- eralization beyond overfitting on small algorithmic datasets.arXiv preprint arXiv:2201.02177, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[62]
Prince.Understanding Deep Learning
Simon J.D. Prince.Understanding Deep Learning. The MIT Press, 2023
work page 2023
-
[63]
On the expressive power of deep neural networks
Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, and Jascha Sohl-Dickstein. On the expressive power of deep neural networks. Ininternational conference on machine learning, pages 2847–2854. PMLR, 2017
work page 2017
-
[64]
Marc’Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann Cun. Efficient learning of sparse representations with an energy-based model.Advances in neural information processing systems, 19, 2006
work page 2006
-
[65]
The manifold tangent classifier.Advances in neural information processing systems, 24, 2011
Salah Rifai, Yann N Dauphin, Pascal Vincent, Yoshua Bengio, and Xavier Muller. The manifold tangent classifier.Advances in neural information processing systems, 24, 2011
work page 2011
-
[66]
The unreasonable effectiveness of deep learning in artificial intelligence
Terrence J Sejnowski. The unreasonable effectiveness of deep learning in artificial intelligence. Proceedings of the National Academy of Sciences, 117(48):30033–30038, 2020
work page 2020
-
[67]
Bounding and counting linear regions of deep neural networks
Thiago Serra, Christian Tjandraatmadja, and Srikumar Ramalingam. Bounding and counting linear regions of deep neural networks. InInternational conference on machine learning, pages 4558–4566. PMLR, 2018
work page 2018
-
[68]
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting.The journal of machine learning research, 15(1):1929–1958, 2014
work page 1929
-
[69]
Tu.An Introduction to Manifolds
L.W. Tu.An Introduction to Manifolds. Universitext. Springer New York, 2010
work page 2010
-
[70]
Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and Léon Bottou. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.Journal of machine learning research, 11(12), 2010
work page 2010
-
[71]
Cvt: Introducing convolutions to vision transformers
Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. Cvt: Introducing convolutions to vision transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 22–31, 2021
work page 2021
-
[72]
Masked frequency modeling for self-supervised visual pre-training
Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew-Soon Ong, and Chen Change Loy. Masked frequency modeling for self-supervised visual pre-training. InThe Eleventh International Conference on Learning Representations
-
[73]
Mehmet Yamaç, Mete Ahishali, Serkan Kiranyaz, and Moncef Gabbouj. Convolutional sparse support estimator network (csen): From energy-efficient support estimation to learning-aided compressive sensing.IEEE Transactions on Neural Networks and Learning Systems, 34(1):290– 304, 2021
work page 2021
-
[74]
Mehmet Yamaç, Mert Duman, ˙Ilke Adalıo˘glu, Serkan Kiranyaz, and Moncef Gabbouj. A personalized zero-shot ecg arrhythmia monitoring system: From sparse representation based domain adaption to energy efficient abnormal beat detection for practical ecg surveillance.arXiv preprint arXiv:2207.07089, 2022
-
[75]
Chengqiang Yi, Lanxin Zhu, Jiahao Sun, Zhaofei Wang, Meng Zhang, Fenghe Zhong, Luxin Yan, Jiang Tang, Liang Huang, Yu-Hui Zhang, et al. Video-rate 3d imaging of living cells using fourier view-channel-depth light field microscopy.Communications biology, 6(1):1259, 2023
work page 2023
-
[76]
Cutmix: Regularization strategy to train strong classifiers with localizable features
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6023–6032, 2019
work page 2019
-
[77]
Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization.arXiv preprint arXiv:1611.03530, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[78]
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization.arXiv preprint arXiv:1710.09412, 2017. 37 A Notation In this work, we consider the ℓp–norm of a vector x∈R n, defined by ∥x∥p = (Pn i=1 |xi|p)1/p with p≥1 . The ℓ0 “norm” is given by ∥x∥0 = limp→0 Pn i=1 |xi|p, which counts the number of nonzero e...
work page internal anchor Pith review Pith/arXiv arXiv 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.