Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning

Xiaohui Li; Zheng Zhai

arxiv: 2605.20300 · v1 · pith:TYPV7VIHnew · submitted 2026-05-19 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-21 07:35 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: robust subspace-constrained quadratic model · low-dimensional structure learning · generalized Gaussian noise · radial Laplace noise · quadratic matrix factorization · high-dimensional data · heavy-tailed noise · light-tailed noise


The pith

Extending quadratic matrix factorization to generalized Gaussian and radial Laplace noise enables robust low-dimensional structure learning under both heavy-tailed and light-tailed conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a robust subspace-constrained quadratic model that builds on earlier quadratic factorization methods to handle a wider variety of noise. This extension covers generalized Gaussian and radial Laplace distributions so the model stays effective whether noise has heavy or light tails. A gradient-based solver with backtracking line search is introduced to optimize the resulting nonconvex problem. Sensitivity analysis compares the behavior of different loss functions across noise types. Numerical tests show the approach recovers structure more accurately than prior techniques across varied data regimes.

Core claim

The proposed robust subspace-constrained quadratic model accommodates a broad class of noise distributions, including generalized Gaussian and radial Laplace models, thereby substantially enhancing robustness across diverse data regimes while learning low-dimensional subspace structure from high-dimensional data.

What carries the argument

The robust subspace-constrained quadratic model (SCQM), which embeds a subspace constraint into quadratic matrix factorization and replaces the standard noise assumption with a flexible family of distributions to support reliable recovery under varied noise.
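As a sketch of how such noise families translate into loss functions, consider a residual vector r in R^D. The block below assumes i.i.d. generalized Gaussian coordinates with shape p and scale α, and one common radially symmetric Laplace-type density with scale b. The symbols r, D, α, b, and C are notation introduced here for illustration and may differ from the paper's own parameterization.

```latex
% Generalized Gaussian noise on each coordinate r_i of the residual r
p(r) \propto \prod_{i=1}^{D} \exp\!\left(-\frac{|r_i|^{p}}{\alpha^{p}}\right)
\quad\Longrightarrow\quad
-\log p(r) = \frac{1}{\alpha^{p}}\,\|r\|_p^p + C .

% Radially symmetric Laplace-type noise (depends on r only through its Euclidean norm)
p(r) \propto \exp\!\left(-\frac{\|r\|_2}{b}\right)
\quad\Longrightarrow\quad
-\log p(r) = \frac{1}{b}\,\|r\|_2 + C .
```

Under these assumptions, p = 2 recovers the usual squared ℓ_2 (Gaussian) loss, p < 2 corresponds to heavier tails than Gaussian, and p > 2 to lighter tails.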

If this is right

The gradient-based algorithm with backtracking line search produces stable convergence for the nonconvex problem.
Sensitivity analysis distinguishes the performance of ℓ_p^p loss from ℓ_2 loss under changing noise characteristics.
The model delivers reliable reconstruction when noise is heavy-tailed or light-tailed.
Numerical experiments confirm higher robustness and accuracy than existing methods on test cases.
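The first point above concerns gradient descent with backtracking line search. A minimal sketch of that general technique follows, applied to a toy smoothed ℓ_p^p regression rather than to the SCQM objective itself. The function names, the smoothing constant `eps`, and the constants `step0`, `shrink`, and `c` are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def backtracking_gd(f, grad, x0, step0=1.0, shrink=0.5, c=1e-4,
                    tol=1e-8, max_iter=500):
    """Gradient descent with Armijo backtracking (illustrative constants)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm2 = float(g @ g)
        if gnorm2 ** 0.5 < tol:
            break
        fx = f(x)
        t = step0
        # Armijo condition: f(x - t g) <= f(x) - c * t * ||g||^2
        while f(x - t * g) > fx - c * t * gnorm2 and t > 1e-16:
            t *= shrink
        x = x - t * g
    return x

def make_lpp_problem(A, b, p=1.5, eps=1e-8):
    """Smoothed l_p^p loss sum_i (r_i^2 + eps)^(p/2) with r = A x - b."""
    def f(x):
        r = A @ x - b
        return np.sum((r ** 2 + eps) ** (p / 2))
    def grad(x):
        r = A @ x - b
        return A.T @ (p * r * (r ** 2 + eps) ** (p / 2 - 1))
    return f, grad

# Toy data: linear model with Laplace-distributed noise (hypothetical setup).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + rng.laplace(scale=0.1, size=50)

f, grad = make_lpp_problem(A, b, p=1.5)
x0 = np.zeros(3)
x_hat = backtracking_gd(f, grad, x0)
print("estimate:", x_hat, "loss:", f(x_hat))
```

The Armijo constant `c` and the shrink factor control how conservative each step is. Restarting every iteration from `step0` is the simplest choice and costs a few extra function evaluations per step.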

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar noise generalizations could be applied to other matrix factorization settings that assume low-dimensional structure.
Real datasets with mixed or unknown noise statistics would provide a practical test of whether the claimed robustness transfers beyond controlled experiments.
An adaptive choice of which member of the noise family to use could be added without changing the overall optimization approach.

Load-bearing premise

The underlying data still possesses low-dimensional subspace structure that the quadratic factorization can represent faithfully once the noise model is generalized.

What would settle it

Generate synthetic data with a known low-dimensional subspace, corrupt it with noise whose distribution lies outside the generalized Gaussian and radial Laplace families, and check whether the model's subspace recovery error is markedly higher than that of methods tuned for that specific noise.
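A rough sketch of the data-generation and scoring side of such a test is given below. It assumes a purely linear subspace, Student-t noise (which has polynomial tails and so belongs to neither family), and plain PCA as a stand-in estimator. The SCQM solver itself is not reproduced, and all sizes and scales are hypothetical.

```python
import numpy as np

def subspace_error(U_true, U_est):
    """Sine of the largest principal angle between two orthonormal bases."""
    P = U_est @ U_est.T                      # projector onto the estimated subspace
    return np.linalg.norm(U_true - P @ U_true, 2)

def pca_subspace(X, d):
    """Top-d left singular vectors of the centered data (columns are samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d]

rng = np.random.default_rng(0)
D, d, n = 20, 2, 500                         # ambient dim, latent dim, samples

# Known ground-truth subspace with orthonormal basis U_true.
U_true, _ = np.linalg.qr(rng.standard_normal((D, d)))
Z = 3.0 * rng.standard_normal((d, n))
X_clean = U_true @ Z

# Student-t noise with 1.5 degrees of freedom: infinite variance.
noise = 0.1 * rng.standard_t(df=1.5, size=(D, n))
X_noisy = X_clean + noise

err_clean = subspace_error(U_true, pca_subspace(X_clean, d))
err_noisy = subspace_error(U_true, pca_subspace(X_noisy, d))
print("PCA error, clean data:", err_clean)
print("PCA error, Student-t noise:", err_noisy)
```

Any candidate estimator that returns an orthonormal basis can be scored with `subspace_error` in place of `pca_subspace`, which makes a side-by-side comparison straightforward.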

Figures

Figures reproduced from arXiv: 2605.20300 by Xiaohui Li, Zheng Zhai.

**Figure 2.** Illustration of the fitted curves and projection points obtained using

**Figure 4.** Performance is compared across models and noise levels using

**Figure 5.** Comparison of reconstruction methods under different loss functions

**Figure 6.** Visualization of latent-space (d = 2) interpolation for the linear model (Θ = 0) and the quadratic model (Θ ≠ 0), learned from data consisting of the three digits ‘2’, ‘6’, and ‘8’. The quadratic term significantly improves the discrimination between the digits ‘4’ and ‘9’ with the improvement being particularly evident in the second column from the right. Second, the ℓ1 loss and the ℓ2 loss consistently …

Abstract

In this paper, we propose a robust subspace-constrained quadratic model (SCQM) for learning low-dimensional structure from high-dimensional data. Building upon the subspace-constrained quadratic matrix factorization (SQMF) framework, the proposed model accommodates a broad class of noise distributions, including generalized Gaussian and radial Laplace models. This generalization enables reliable performance under both heavy-tailed and light-tailed noise, thereby substantially enhancing robustness across diverse data regimes. To efficiently address the resulting nonconvex optimization problem, we develop a gradient-based algorithm equipped with a backtracking line-search strategy that ensures stable and efficient convergence. In addition, we present a sensitivity analysis of the $\ell_p^p$ and $\ell_2$ loss functions, elucidating their distinct behaviors under varying noise characteristics. Extensive numerical experiments corroborate the theoretical analysis and demonstrate that the proposed approach consistently outperforms existing methods in terms of robustness and reconstruction accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This extends SQMF to generalized Gaussian and radial Laplace noise for robustness but the heavy-tail recovery may need extra conditions the abstract does not supply.


The punchline is that this generalizes the existing SQMF framework to generalized Gaussian and radial Laplace noise for better robustness in subspace learning, but the heavy-tailed cases may require additional theoretical checks.

What is new is the SCQM model, which accommodates these noise families while keeping the quadratic factorization. The authors propose a gradient-based algorithm with backtracking line search for the nonconvex optimization and provide a sensitivity analysis of the ℓ_p^p and ℓ_2 losses. The experiments are claimed to show consistent outperformance in robustness and reconstruction accuracy.

The approach does well in offering a practical way to handle diverse noise regimes without overhauling the core model. The algorithm is presented as ensuring stable convergence, and the analysis clarifies how the losses behave, which helps in deciding when to use which.

The soft spots are around the theoretical foundation for the new noise models. The stress-test concern is valid: for p less than 2 with heavy tails, the objective might not have the same strict convexity or a unique minimizer, which could affect how faithfully the low-dimensional structure is represented. Without explicit recovery bounds or identifiability conditions adjusted for these regimes, it is unclear whether the subspace remains reliably recoverable. The experimental claims also need verification of details such as error bars and independent validation to avoid any circularity.

This paper is for applied ML researchers focused on low-dimensional structure learning and robust dimensionality reduction. A reader looking for extensions of matrix factorization methods with robustness upgrades would get value from the model, algorithm, and results. It has enough substance and grounding to deserve a serious referee, though it will likely need revisions on the theoretical side for the generalized noise cases. I recommend sending it out for peer review.

Referee Report

1 major / 2 minor

Summary. The paper proposes a robust subspace-constrained quadratic model (SCQM) extending the subspace-constrained quadratic matrix factorization (SQMF) framework to accommodate generalized Gaussian and radial Laplace noise distributions. This enables reliable performance under heavy- and light-tailed noise. The authors develop a gradient-based algorithm with backtracking line-search for the resulting nonconvex problem, provide sensitivity analysis of the ℓ_p^p and ℓ_2 losses, and report numerical experiments showing consistent outperformance over baselines in robustness and reconstruction accuracy.

Significance. If the generalization and recovery claims hold, the work would meaningfully extend quadratic factorization approaches to a wider range of noise regimes, offering practical value for high-dimensional data analysis in machine learning. The gradient-based solver with line search and the loss-function sensitivity analysis are concrete strengths that support usability. The experiments, if properly controlled, add empirical support for the robustness improvements.

major comments (1)

[Abstract and theoretical development (around the SCQM formulation and noise generalization)] The central claim that SCQM enables reliable performance under heavy-tailed noise (generalized Gaussian with p<2 or radial Laplace) rests on the unadjusted SQMF quadratic factorization remaining faithful and identifiable. No new recovery bounds, strict-convexity arguments, or identifiability conditions are supplied for regimes where second moments may fail to exist or the objective loses unique minimizers. This directly affects the robustness guarantee asserted in the abstract and is therefore load-bearing.

minor comments (2)

[Optimization algorithm section] The description of the backtracking line-search strategy would benefit from explicit step-size parameters, Armijo constants, and a brief convergence-rate statement.
[Experimental results section] Numerical experiments are summarized as corroborating the analysis, but the manuscript should include error bars, explicit data-exclusion criteria, and a table of baseline hyper-parameters to allow independent verification.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for identifying a key point regarding the theoretical support for our robustness claims. We address this comment in detail below and outline the revisions we will make.


Referee: The central claim that SCQM enables reliable performance under heavy-tailed noise (generalized Gaussian with p<2 or radial Laplace) rests on the unadjusted SQMF quadratic factorization remaining faithful and identifiable. No new recovery bounds, strict-convexity arguments, or identifiability conditions are supplied for regimes where second moments may fail to exist or the objective loses unique minimizers. This directly affects the robustness guarantee asserted in the abstract and is therefore load-bearing.

Authors: We agree that the manuscript does not supply new recovery bounds, strict-convexity arguments, or identifiability conditions for the SCQM under heavy-tailed regimes where second moments may not exist. The formulation extends SQMF by adopting loss functions (ℓ_p^p for generalized Gaussian and the radial Laplace loss) that are known to be robust without requiring finite variance, and the sensitivity analysis section examines the distinct behavior of these losses compared to ℓ_2. The primary evidence for reliable performance is therefore empirical, as shown in the numerical experiments where the method outperforms baselines under controlled heavy-tailed noise. We will revise the abstract to state that the model accommodates such noise distributions and demonstrates improved robustness through experiments and sensitivity analysis, rather than asserting new theoretical guarantees. We will also add a brief paragraph in the discussion section acknowledging the absence of new identifiability results for these noise models and identifying it as an important direction for future work. revision: partial
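To make the contrast between ℓ_p^p-type and ℓ_2 losses concrete, here is a one-dimensional toy example. It is not the paper's sensitivity analysis. It estimates a location parameter by minimizing the sum of |x_i - m|^p over a grid, for p = 1 and p = 2, on hypothetical data containing a single large outlier. The helper name `lpp_location` and the data values are assumptions.

```python
import numpy as np

def lpp_location(x, p, grid):
    """Grid minimizer of sum_i |x_i - m|^p over candidate locations m."""
    losses = np.sum(np.abs(x[None, :] - grid[:, None]) ** p, axis=1)
    return grid[np.argmin(losses)]

# Five inliers near 1.0 and one gross outlier (hypothetical data).
x = np.array([0.9, 1.0, 1.1, 0.95, 1.05, 50.0])
grid = np.linspace(0.0, 50.0, 5001)          # candidate locations, step 0.01

m_l1 = lpp_location(x, 1.0, grid)            # behaves like the median
m_l2 = lpp_location(x, 2.0, grid)            # behaves like the mean
print("p = 1 estimate:", m_l1)
print("p = 2 estimate:", m_l2)
```

With p = 1 the estimate stays among the inliers, while with p = 2 it is pulled toward the outlier, to roughly the sample mean of 55/6. This is the basic mechanism behind preferring p < 2 under heavy-tailed noise.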

Circularity Check

0 steps flagged

No significant circularity; derivation builds on external SQMF framework with independent generalization and experiments


The provided abstract and context show the SCQM model explicitly builds upon the prior SQMF framework and extends it to generalized Gaussian and radial Laplace noise models via new loss functions. No equations or steps are quoted that reduce predictions or identifiability claims back to fitted inputs by construction. Numerical experiments are invoked to corroborate results, but without supplied details indicating reuse of the same parameters or loss definitions in a self-referential loop. The central claims rest on the proposed gradient algorithm and sensitivity analysis, which are presented as new contributions rather than tautological renamings or self-citation chains. This qualifies as a self-contained derivation against external benchmarks, consistent with the most common honest finding for such papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Because only the abstract is available, the ledger is necessarily incomplete. The central claim rests on the unstated premise that high-dimensional observations admit a low-dimensional quadratic subspace representation once noise is modeled appropriately. No free parameters, invented entities, or explicit axioms are named in the provided text.

pith-pipeline@v0.9.0 · 5675 in / 1177 out tokens · 28403 ms · 2026-05-21T07:35:00.834676+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: We generalize classical quadratic matrix factorization beyond the Frobenius-norm objective by allowing a broad class of loss functions... ℓ(r)=∥r∥_p^p ... for generalized Gaussian... radial Laplace

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear

Paper passage: Theorem 1 (Local convexity radius for ℓ_p^p-SCQM)... Hessian of ℓ(τ) is positive semidefinite throughout the p−1 norm ball

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

[1] Nanda Kambhatla and Todd Leen. Dimension reduction by local principal component analysis. Neural Computation, 9:1493–1516, 10 1997.

[2] Christopher R Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman, et al. Nonparametric ridge estimation. The Annals of Statistics, 42(4):1511–1545, 2014.

[3] Charles Fefferman, Sergei Ivanov, Yaroslav Kurylev, Matti Lassas, and Hariharan Narayanan. Fitting a putative manifold to noisy data. In Conference On Learning Theory, pages 688–720, 2018.

[4] Umut Ozertem and Deniz Erdogmus. Locally defined principal curves and surfaces. Journal of Machine learning research, 12(Apr):1249–1286, 2011.

[5] Zheng Zhai, Hengchao Chen, and Qiang Sun. Quadratic matrix factorization with applications to manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(9):6384–6401, 2024.

[6] Zheng Zhai and Xiaohui Li. Subspace-constrained quadratic matrix factorization: Algorithm and applications. Pattern Recognition, 161:111333, 2025.

[7] Guangcan Liu, Zhouchen Lin, and Yong Yu. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 663–670, 2010.

[8] Yongqiang Zhang, Daming Shi, Junbin Gao, and Dansong Cheng. Low-rank-sparse subspace representation for robust regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7445–7454, 2017.

[9] Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Nonparametric ridge estimation. Annals of Statistics, 42(4):1511–1545, 2014.

[10] Umut Ozertem and Deniz Erdogmus. Locally defined principal curves and surfaces. Journal of Machine Learning Research, 12:1249–1286, 2011.

[11] Charles Fefferman, Sergei Ivanov, Yaroslav Kurylev, Matti Lassas, and Hariharan Narayanan. Fitting a putative manifold to noisy data. In Conference on Learning Theory, pages 688–720, 2018.

[12] Barak Sober and David Levin. Manifold approximation by moving least-squares projection. Constructive Approximation, 52(3):433–478, 2020.

[13] Hengchao Chen and Zheng Zhai. Power transformed density ridge estimation. IEEE Signal Processing Letters, 2025.

[14] Alexey A Roenko, Vladimir V Lukin, I Djurović, and M Simeunović. Estimation of parameters for generalized gaussian distribution. In 2014 6th International Symposium on Communications, Control and Signal Processing (ISCCSP), pages 376–379. IEEE, 2014.

[15] Frédéric Pascal, Lionel Bombrun, Jean-Yves Tourneret, and Yannick Berthoumieu. Parameter estimation for multivariate generalized gaussian distributions. IEEE Transactions on Signal Processing, 61(23):5960–5971, 2013.

[16] Minh N Do and Martin Vetterli. Wavelet-based texture retrieval using generalized gaussian density and kullback-leibler distance. IEEE transactions on image processing, 11(2):146–158, 2002.

[17] Samuel Kotz, Tomasz Kozubowski, and Krzystof Podgorski. The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. Springer Science & Business Media, 2012.

[18] Gilbert Strang. Introduction to linear algebra. SIAM, 2022.

[19] P-A Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2008.

[20] Abhishek Bhattacharya and Rabi Bhattacharya. Nonparametric inference on manifolds: with applications to shape spaces, volume 2. Cambridge University Press, 2012.

[21] Steven G. Krantz and Harold R. Parks. The Implicit Function Theorem: History, Theory, and Applications. Birkhäuser, 2013.

[22] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, 4th edition, 2013.

[23] Jorge Nocedal. Numerical optimization. Springer Ser. Oper. Res. Financ. Eng./Springer, 2006.

[24] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 2002.

[25] Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019.
