pith.

arxiv: 2601.06597 · v2 · submitted 2026-01-10 · 💻 cs.LG · stat.ML

Understanding and inverse design of implicit bias in stochastic learning: a geometric perspective

Pith reviewed 2026-05-16 15:03 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords implicit bias · stochastic gradient descent · geometric correction · continuous symmetries · inverse design · loss landscape · overparameterized models · sparsity

The pith

Implicit bias in stochastic learning arises as a geometric correction from gradient noise interacting with continuous loss symmetries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework showing that implicit bias emerges when stochastic gradient noise interacts with the continuous symmetries of the loss function, producing a predictable geometric shift among solutions that share the same loss value. This mechanism unifies previous observations across models and turns bias from an unexplained side effect into a controllable geometric feature. If the account holds, it becomes possible to engineer parameterizations that preserve the predictor while deliberately steering the bias, for instance toward sparse or spectrally sparse solutions. A reader would care because learned representations determine how models generalize, interpret data, and remain robust, and this view supplies a direct handle on those representations through the training dynamics themselves.

Core claim

Implicit bias is induced as a geometric correction by the interplay between gradient noise and continuous symmetries of the loss. The authors compute this correction for a range of architectures, use it to predict new behaviors and recover known ones, and demonstrate inverse design by constructing predictor-preserving parameterizations that shape the bias, with sparsity and spectral sparsity arising as canonical outcomes. Numerical experiments confirm the predicted corrections and the effectiveness of the inverse-design procedure in controlled settings.

What carries the argument

The geometric correction induced by the interplay between gradient noise and continuous symmetries of the loss; it selects among equivalent-loss solutions by shifting the effective optimization trajectory.
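A minimal sketch of the mechanism in its simplest setting, the scalar factorization θ = u·v of Figure 1, under assumptions of our own (this is not the authors' code): single-sample SGD on ℓ = ½(uvx − y)² with Gaussian inputs and Gaussian label noise. The rescaling (u, v) → (λu, v/λ) leaves the predictor unchanged, and under gradient flow the imbalance D = u² − v² is exactly conserved. Expanding one discrete step gives D_{t+1} = D_t(1 − η²e_t²), where e_t is the sampled gradient with respect to θ, so every step with nonzero gradient noise contracts D toward the balanced branch u = v. The function name and all hyperparameters are hypothetical.

```python
import numpy as np


def sgd_scalar_factorization(u0, v0, theta_star=1.0, noise_std=0.5,
                             lr=0.05, steps=2000, seed=0):
    """Single-sample SGD on l(u, v) = 0.5 * (u * v * x - y)**2, where
    y = theta_star * x + noise_std * (standard normal label noise).

    Returns the imbalance D_t = u**2 - v**2 at every step, the sampled
    gradients e_t = dl/dtheta (theta = u * v), and the final (u, v).
    """
    rng = np.random.default_rng(seed)
    u, v = float(u0), float(v0)
    imbalance, grads = [u * u - v * v], []
    for _ in range(steps):
        x = rng.normal()
        y = theta_star * x + noise_std * rng.normal()
        e = (u * v * x - y) * x                  # dl/dtheta for this sample
        u, v = u - lr * e * v, v - lr * e * u    # simultaneous SGD update
        grads.append(e)
        imbalance.append(u * u - v * v)
    return np.array(imbalance), np.array(grads), u, v


D_noisy, e_noisy, u, v = sgd_scalar_factorization(2.0, 0.1, noise_std=0.5)
D_clean, _, _, _ = sgd_scalar_factorization(2.0, 0.1, noise_std=0.0)

# Exact discrete identity: D_T = D_0 * prod(1 - lr**2 * e_t**2).
print(D_noisy[-1], D_noisy[0] * np.prod(1 - 0.05**2 * e_noisy**2))
print(D_noisy[0], D_noisy[-1], D_clean[-1])   # label noise shrinks |D| much more
print(u * v)                                  # predictor stays near theta_star
```

Without label noise the sampled gradients decay to zero once θ reaches θ* = 1, so D freezes at whatever value the initial transient leaves; with noise it keeps contracting. This is a toy analogue of the selection illustrated in Figure 1b, not a reproduction of the paper's derivation.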

If this is right

  • The induced bias can be calculated explicitly for multiple standard architectures.
  • Previously observed implicit-bias phenomena receive a single geometric explanation.
  • New bias behaviors can be predicted before training begins.
  • Predictor-preserving reparameterizations can be designed to steer the bias toward sparsity or spectral sparsity.
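Why balance can turn into sparsity is easiest to see with a standard argument, sketched here under our own assumptions rather than as the paper's derivation. Take a Hadamard (elementwise) factorization w = u ⊙ v. Each coordinate has its own rescaling orbit (λ_i u_i, v_i/λ_i) with λ_i > 0, all giving the same w. If noise pushes every coordinate to its balanced point |u_i| = |v_i|, as in the scalar case, then by AM–GM ‖u‖² + ‖v‖² reaches its orbit minimum 2‖w‖₁, so a quadratic geometry in (u, v) acts like an ℓ1 penalty on the predictor w. The helper names and the example vector are hypothetical.

```python
import numpy as np


def hadamard_orbit_point(w, lam):
    """Point (u, v) on the orbit of w under per-coordinate rescaling:
    u = lam * sqrt|w|, v = sign(w) * sqrt|w| / lam, so u * v == w for lam > 0.
    lam = 1 gives the balanced representative |u_i| = |v_i|."""
    s = np.sqrt(np.abs(w))
    return lam * s, np.sign(w) * s / lam


def sq_norm(u, v):
    """||u||^2 + ||v||^2, the quantity minimized at the balanced point."""
    return float(np.sum(u**2) + np.sum(v**2))


w = np.array([0.0, 1.5, 0.0, -0.3, 0.0, 2.0])       # a sparse predictor
rng = np.random.default_rng(0)

u_bal, v_bal = hadamard_orbit_point(w, np.ones_like(w))
u_rnd, v_rnd = hadamard_orbit_point(w, rng.uniform(0.2, 5.0, size=w.shape))

print(np.allclose(u_bal * v_bal, w), np.allclose(u_rnd * v_rnd, w))  # same predictor
print(sq_norm(u_bal, v_bal), 2 * np.abs(w).sum())   # balanced norm equals 2 * ||w||_1
print(sq_norm(u_rnd, v_rnd) >= sq_norm(u_bal, v_bal))  # AM-GM: other points cost more
```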

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same noise-symmetry mechanism may extend to discrete symmetries or to non-gradient optimizers if the effective noise structure can be characterized.
  • Engineering symmetries into the loss could become a systematic route to built-in regularization without changing the data or the predictor.
  • The framework suggests checking whether the magnitude of the correction scales with batch size or learning-rate schedule in the way the geometric term predicts.
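One way to run the suggested scaling check, in the same scalar toy model θ = u·v and with assumptions of our own: start on the solution set uv = θ* but off the balanced branch, run minibatch SGD, and measure the per-step decay rate of log|u² − v²|. Since log(1 − η²e²) ≈ −η²e² and, near θ*, the minibatch gradient has variance of about σ²/B for unit-variance inputs, a rough leading-order guess is a rate of η²σ²/B. That guess is ours, derived for this toy, and is not the paper's geometric term; the function names and settings are hypothetical.

```python
import numpy as np


def imbalance_decay_rate(lr, batch, noise_std=0.5, steps=5000, seed=0):
    """Minibatch SGD on theta = u * v with theta_star = 1, started on the
    solution set (u * v = 1) but off the balanced branch. Returns the
    empirical per-step decay rate of log|u**2 - v**2|."""
    rng = np.random.default_rng(seed)
    u, v = 2.0, 0.5                                   # u * v = 1, D_0 = 3.75
    d0 = u * u - v * v
    for _ in range(steps):
        x = rng.normal(size=batch)
        y = x + noise_std * rng.normal(size=batch)    # labels with theta_star = 1
        e = np.mean((u * v * x - y) * x)              # minibatch dl/dtheta
        u, v = u - lr * e * v, v - lr * e * u
    return (np.log(abs(d0)) - np.log(abs(u * u - v * v))) / steps


def rough_rate(lr, batch, noise_std=0.5):
    """Our leading-order guess for this toy: lr**2 * noise_std**2 / batch."""
    return lr**2 * noise_std**2 / batch


for lr in (0.01, 0.02, 0.04):
    for batch in (1, 4, 16):
        print(lr, batch, imbalance_decay_rate(lr, batch), rough_rate(lr, batch))
```

Holding η²/B fixed while changing η and B is a quick way to see whether, in this toy, the contraction depends only on that ratio.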

Load-bearing premise

Stochastic gradient noise interacts with continuous symmetries of the loss to produce a predictable and computable geometric correction.

What would settle it

A controlled experiment on a loss with known continuous symmetries where the measured implicit bias deviates systematically from the geometric correction computed by the framework under the observed noise statistics.
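A toy version of this test, under assumptions of our own: in the scalar model θ = u·v the only continuous symmetry is the rescaling orbit, so the measured bias can be summarized by the final imbalance u² − v². The sketch records the observed gradient signals e_t and compares the measured imbalance with the small-step prediction D_0·exp(−η²Σe_t²), which drops the higher-order finite-step terms. Here the deviation is a pure finite-step effect and grows quickly with η; in a real test the prediction would come from the paper's formula, which this sketch does not reproduce. Names and settings are hypothetical.

```python
import numpy as np


def measured_vs_predicted(lr, noise_std=0.5, steps=5000, seed=1):
    """Run single-sample SGD on theta = u * v (theta_star = 1), record the
    observed gradient signals e_t, and compare the measured final imbalance
    u**2 - v**2 with the small-step prediction D_0 * exp(-lr**2 * sum e_t**2)."""
    rng = np.random.default_rng(seed)
    u, v = 2.0, 0.5
    d0, sum_e2 = u * u - v * v, 0.0
    for _ in range(steps):
        x = rng.normal()
        y = x + noise_std * rng.normal()
        e = (u * v * x - y) * x
        u, v = u - lr * e * v, v - lr * e * u
        sum_e2 += e * e
    measured = u * u - v * v
    predicted = d0 * np.exp(-lr**2 * sum_e2)
    return measured, predicted, abs(measured / predicted - 1.0)


for lr in (0.005, 0.02, 0.08):
    m, p, rel = measured_vs_predicted(lr)
    print(f"lr={lr}: measured={m:.4g}  predicted={p:.4g}  rel. deviation={rel:.2e}")
```

Because 1 − a ≤ e^{−a}, the discrete trajectory in this toy contracts at least as fast as the small-step prediction, so the deviation is systematic in sign rather than random.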

Figures

Figures reproduced from arXiv: 2601.06597 by Alberto d'Onofrio, Alessio Ansuini, Emanuele Ballarin, Fabio Anselmi, Matteo Biagetti, Nicola Aladrah.

Figure 1. Hyperbolic level sets of equivalent parametrizations and symmetry-breaking. a, Hyperbolic level sets $u \cdot v = \theta$ in the positive $(u, v)$-plane. Each branch represents all parameter pairs $(u, v)$ that produce the same predictor $\theta$, making the symmetry of the factorized parameterization explicit. b, The diagonal line $u = v$ defines a symmetry-breaking choice that intersects each orbit once in the positive plane, selecti…
Figure 2. Implicit norm equilibration in shallow ReLU networks. A student model $\hat{y} = v^\top \mathrm{ReLU}(Wx)$ with learnable parameters $v$ and $W$ is trained via SGD on the mean square error loss to replicate the behavior of a teacher oracle $y^\star = v^{\star\top} \mathrm{ReLU}(W^\star x)$ on a regression task. The entries of $v^\star$ and $W^\star$ are randomly sampled before training, ensuring that the …
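The symmetry behind Figure 2 can be checked directly. Because ReLU is positively homogeneous, rescaling hidden unit i as W_i → c_i W_i, v_i → v_i/c_i with c_i > 0 leaves the student's output unchanged. Assuming "norm equilibration" refers to per-unit balance of incoming and outgoing weights (our reading; the caption is truncated), the balanced point on each orbit has |v_i| = ‖W_i‖ and minimizes v_i² + ‖W_i‖² along that orbit. The sketch below computes that balanced representative; it does not simulate SGD, and all names and sizes are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def student(v, W, X):
    """Shallow ReLU network y_hat = v^T ReLU(W x), evaluated on the rows of X."""
    return relu(X @ W.T) @ v


def equilibrate(v, W):
    """Rescale each hidden unit (W_i -> c_i W_i, v_i -> v_i / c_i, c_i > 0) so
    that |v_i| == ||W_i||. Assumes v_i != 0 and W_i != 0 for every unit."""
    c = np.sqrt(np.abs(v) / np.linalg.norm(W, axis=1))
    return v / c, W * c[:, None]


def sq_norm(v, W):
    return float(np.sum(v**2) + np.sum(W**2))


rng = np.random.default_rng(0)
d, h, n = 5, 8, 100                       # input dim, hidden units, samples
W = rng.normal(size=(h, d)) * rng.uniform(0.1, 3.0, size=(h, 1))  # unbalanced rows
v = rng.normal(size=h)
X = rng.normal(size=(n, d))

v_eq, W_eq = equilibrate(v, W)
print(np.abs(student(v, W, X) - student(v_eq, W_eq, X)).max())   # ~0: same predictor
print(np.abs(v_eq) - np.linalg.norm(W_eq, axis=1))               # ~0 for every unit
print(sq_norm(v, W), sq_norm(v_eq, W_eq))                        # balanced norm is no larger
```

Per-unit scaling is used rather than a single global scale because each hidden unit carries its own rescaling symmetry.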
Figure 3. Query–key norm equilibration in single-head scaled dot-product attention. A student model implementing single-head scaled dot-product attention, i.e. $Y = \mathrm{softmax}\left(XQ(XK)^\top / \sqrt{r_k}\right) XV$, with learnable key ($K$), query ($Q$) and value ($V$) matrices is trained by SGD on the mean square error loss to replicate the behavior of a teacher oracle with the same structure (with matrices $K^\star$, $Q^\star$, $V^\star$, respectively) on a…
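A similar check for the attention model of Figure 3, as a sketch under our own assumptions: the attention scores depend on Q and K only through QK^⊤, so any invertible A ∈ ℝ^{r_k×r_k} gives an equivalent pair (QA, KA^{−⊤}). The scalar subgroup A = cI already shows what a balanced point looks like: choosing c² = ‖K‖_F/‖Q‖_F makes ‖cQ‖_F = ‖K/c‖_F. Reading "query–key norm equilibration" as this kind of balance is an assumption; shapes and names are hypothetical, and r_k is taken to be the key dimension.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract row max for stability
    ez = np.exp(z)
    return ez / ez.sum(axis=-1, keepdims=True)


def attention(X, Q, K, V):
    """softmax(XQ (XK)^T / sqrt(r_k)) XV with r_k = number of columns of Q."""
    r_k = Q.shape[1]
    return softmax((X @ Q) @ (X @ K).T / np.sqrt(r_k)) @ (X @ V)


rng = np.random.default_rng(0)
n, d, r_k, r_v = 6, 4, 3, 2                 # tokens, model dim, key dim, value dim
X = rng.normal(size=(n, d))
Q = 2.0 * rng.normal(size=(d, r_k))         # deliberately larger than K
K = 0.5 * rng.normal(size=(d, r_k))
V = rng.normal(size=(d, r_v))

# Any invertible A gives the same scores: (XQA)(XKA^{-T})^T = XQ(XK)^T.
A = rng.normal(size=(r_k, r_k)) + 3.0 * np.eye(r_k)
Q2, K2 = Q @ A, K @ np.linalg.inv(A).T
print(np.abs(attention(X, Q, K, V) - attention(X, Q2, K2, V)).max())  # ~0

# Scalar subgroup A = c I: c**2 = ||K||_F / ||Q||_F balances the two norms.
c = np.sqrt(np.linalg.norm(K) / np.linalg.norm(Q))
Qb, Kb = c * Q, K / c
print(np.linalg.norm(Qb), np.linalg.norm(Kb))                         # equal
```

Only the scalar subgroup is balanced here; the full orbit contains many more equivalent pairs, and which representative SGD noise actually selects is the question the paper's correction addresses.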
Figure 4. Implicit low-rank recovery in matrix completion. A rank-2 ground-truth matrix $T^\star \in \mathbb{R}^{20\times 20}$ with well-separated singular values is to be recovered from just 20% of its entries via a factorized model $\hat{T}(U, V) = UV^\top$ with $U \in \mathbb{R}^{20\times 20}$, $V \in \mathbb{R}^{20\times 20}$. Training is performed using SGD on the mean square error loss over the observed entries. Panel a tracks the estimated singular values $\sigma_i(UV^\top)$ along training…
Figure 5. Sparse spectral recovery via Hadamard-factored parameterization. Two models are compared in the reconstruction of a spectrally sparse signal from a limited number of noise-corrupted observations, under the drive of SGD on the mean square error loss. A signal $y^\star = \sum_{k=0}^{D-1} w^\star_k \cos(2\pi k t)$ is considered, with amplitudes $w^\star = [w^\star_k]$ being sparse in the frequency domain (3 nonzero entries with $k \geq 1$, amo…
Figure 6. Recovery of a piecewise-constant signal from noisy compressed measurements. Two models are compared in the reconstruction of a piecewise-constant signal of length $N = 200$ from $m = 60$ noisy compressed measurements $y = Ax^\star + \varepsilon$, with $A \in \mathbb{R}^{m\times N}$ a random Gaussian measurement matrix and $\varepsilon$ additive Gaussian noise, under the drive of SGD on the mean square error loss $\|A\hat{x} - y\|_2^2$. The baseline model directly l…
Abstract

A key challenge in machine learning is to explain how learning dynamics select among the many solutions that achieve identical loss values in overparameterized models, a phenomenon known as implicit bias. Controlling this bias provides a direct mechanism for acting on learned representations, which are central to interpretability, robustness, and reasoning in modern AI systems. Yet, despite its importance, existing explanations remain largely ad hoc and lack a unifying mechanism. We develop a theoretical and constructive framework in which implicit bias emerges as a geometric correction induced by the interplay between gradient noise and continuous symmetries of the loss. We compute the induced bias across a range of architectures, predicting new behaviors and explaining known ones. The approach also enables inverse design: by engineering predictor-preserving parameterizations, it is possible to shape the bias, with sparsity and spectral sparsity emerging as canonical instances. Numerical experiments support the theory and validate the inverse-design framework in controlled settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a theoretical framework in which implicit bias of stochastic gradient descent emerges as a geometric correction induced by the interplay between gradient noise and continuous symmetries of the loss. The authors derive this correction via Lie-algebra averaging over symmetry orbits, compute explicit biases for concrete architectures, predict new behaviors, explain known ones, and demonstrate inverse design by engineering predictor-preserving parameterizations that induce sparsity or spectral sparsity. Numerical experiments in controlled settings are presented to support the theory.

Significance. If the derivation holds, the work supplies a unifying geometric mechanism for implicit bias that moves beyond ad-hoc explanations and directly enables constructive control of learned representations. The inverse-design component is a notable strength, as are the explicit computations across architectures and the attempt to link noise-induced drift to symmetry orbits. These elements could influence both theoretical understanding and practical parameterization choices in overparameterized models.

major comments (2)
  1. [Derivation of the geometric correction (SDE modeling and averaging step)] The central derivation treats the diffusion coefficient perturbatively within an Itô/Fokker-Planck regime to obtain the leading geometric correction (via projection onto the tangent space of the level set). No error bounds or remainder estimates are supplied for the neglected O(η^{3/2}) and higher Itô–Stratonovich terms that appear at finite step-size η. Because the numerical experiments employ practical finite learning rates, the absence of these controls leaves open whether the claimed predictive power survives outside the infinitesimal-noise limit.
  2. [Numerical experiments and architecture-specific computations] The modeling choice that gradient noise interacts with continuous symmetries to produce a computable, architecture-specific bias is load-bearing for all subsequent claims. The paper validates this only within the same perturbative framework used to derive it; no independent test (e.g., comparison against exact discrete SGD trajectories at moderate η or against non-Gaussian noise) is provided to rule out circularity.
minor comments (2)
  1. [Abstract] The abstract introduces 'predictor-preserving parameterizations' without a forward reference; a one-sentence definition or pointer to the relevant section would improve readability.
  2. [Notation and preliminaries] Notation for the Lie-algebra generators, the projection operator, and the diffusion tensor should be collected in a single table or preliminary section to reduce cross-referencing.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The comments highlight important aspects of the perturbative derivation and validation strategy. We address each point below and describe the revisions we will make to strengthen the presentation.

Point-by-point responses
  1. Referee: The central derivation treats the diffusion coefficient perturbatively within an Itô/Fokker-Planck regime to obtain the leading geometric correction (via projection onto the tangent space of the level set). No error bounds or remainder estimates are supplied for the neglected O(η^{3/2}) and higher Itô–Stratonovich terms that appear at finite step-size η. Because the numerical experiments employ practical finite learning rates, the absence of these controls leaves open whether the claimed predictive power survives outside the infinitesimal-noise limit.

    Authors: We agree that the derivation is perturbative and that rigorous remainder estimates for the Itô–Stratonovich corrections at finite η are not provided. Obtaining such bounds while preserving the Lie-algebra averaging over symmetry orbits is technically demanding and lies outside the scope of the present work. In the revision we will add a new subsection discussing the regime of validity of the leading-order approximation, including heuristic scaling arguments and additional numerical comparisons of the predicted bias against discrete SGD trajectories at moderate learning rates (η ≈ 10^{-3}–10^{-2}). These checks will clarify the practical range in which the geometric correction remains predictive. revision: partial

  2. Referee: The modeling choice that gradient noise interacts with continuous symmetries to produce a computable, architecture-specific bias is load-bearing for all subsequent claims. The paper validates this only within the same perturbative framework used to derive it; no independent test (e.g., comparison against exact discrete SGD trajectories at moderate η or against non-Gaussian noise) is provided to rule out circularity.

    Authors: We acknowledge the concern about potential circularity. The current experiments were designed to isolate the symmetry-induced drift under the modeling assumptions, but they do not constitute fully independent verification. We will revise the numerical section to include (i) direct comparisons of the analytic bias formula against full discrete SGD trajectories at finite step sizes and (ii) simulations with non-Gaussian noise (e.g., heavy-tailed and clipped gradients). These additions will provide an independent test of the architecture-specific predictions and the robustness of the geometric mechanism. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation proceeds from SDE geometry and symmetry averaging without reduction to fitted inputs or self-citation chains.

full rationale

The paper constructs the implicit bias explicitly as a drift correction term arising from averaging stochastic gradient noise over the orbit of continuous symmetries of the loss, using the Lie-algebra action and projection onto the tangent space of level sets. This step is derived from the Fokker–Planck or Itô expansion of the SGD SDE and produces computable predictions for specific architectures that are then checked numerically; no parameter is fitted to the target bias and then relabeled as a prediction, and no load-bearing premise rests on a self-citation whose content is itself unverified. The derivation is therefore self-contained, its outputs can be compared against external benchmarks, and it does not reduce by construction to its modeling assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard domain assumptions about loss symmetries and stochastic noise without introducing new free parameters or invented entities in the abstract description.

axioms (2)
  • domain assumption: loss functions possess continuous symmetries
    Invoked as the source of the geometric correction when combined with gradient noise.
  • domain assumption: stochastic gradient descent produces noise that interacts geometrically with loss symmetries
    Central to computing the induced bias and enabling inverse design.

pith-pipeline@v0.9.0 · 5472 in / 1371 out tokens · 67444 ms · 2026-05-16T15:03:14.095092+00:00 · methodology


