Deep Learning for Subspace Regression
Pith reviewed 2026-05-18 12:03 UTC · model grok-4.3
The pith
Predicting larger-than-required subspaces with neural networks simplifies the mapping and improves accuracy in parametric subspace regression.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Predicting oversized subspaces instead of exact-dimension ones decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth functions on the Grassmann manifold, resulting in improved neural network approximation accuracy for subspace regression.
What carries the argument
The redundancy mechanism of predicting a subspace of dimension larger than the target one, which reduces the complexity of the target function on the Grassmann manifold.
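The following toy sketch is not taken from the paper; it only illustrates one way redundancy can smooth a subspace-valued map. It assumes a hypothetical diagonal family `toy_matrix(t)` whose two smallest eigenvalues cross at t = 0.5. The helper names `lowest_eigenspace`, `projector_distance`, and `subspace_jump` are likewise made up for this example.

```python
import numpy as np


def toy_matrix(t):
    # Hypothetical parametric family: the two smallest eigenvalues cross at t = 0.5.
    return np.diag([1.0 + t, 2.0 - t, 5.0])


def lowest_eigenspace(matrix, dim):
    # eigh returns eigenvalues in ascending order, so the first columns
    # span the eigenspace of the `dim` smallest eigenvalues.
    _, vectors = np.linalg.eigh(matrix)
    return vectors[:, :dim]


def projector_distance(U, V):
    # Frobenius distance between the orthogonal projectors onto span(U) and span(V).
    return np.linalg.norm(U @ U.T - V @ V.T)


def subspace_jump(dim, t_left=0.49, t_right=0.51):
    # Compare the subspaces just before and just after the crossing.
    left = lowest_eigenspace(toy_matrix(t_left), dim)
    right = lowest_eigenspace(toy_matrix(t_right), dim)
    return projector_distance(left, right)


jump_exact = subspace_jump(dim=1)      # exact target dimension r = 1
jump_oversized = subspace_jump(dim=2)  # oversized subspace, r + m = 2
print(jump_exact, jump_oversized)
```

In this toy case the one-dimensional eigenspace switches abruptly across the crossing, while the two-dimensional subspace containing it stays fixed, so the oversized target is the easier one to regress.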
If this is right
- Accuracy significantly improves when larger-than-required subspaces are predicted in empirical tests.
- The strategy decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients.
- The mapping becomes smoother for general smooth functions on the Grassmann manifold.
- Subspace regression applies successfully to parametric eigenproblems, deflation techniques, relaxation methods, optimal control, and parametric partial differential equations.
Where Pith is reading between the lines
- This redundancy approach might generalize to other regression tasks on manifolds where exact low-dimensional targets are hard to learn.
- It could reduce the computational cost of offline stages in reduced-order modeling by enabling more reliable online approximations.
- Extensions to non-smooth or discontinuous subspace dependencies may require additional regularization techniques.
Load-bearing premise
The mapping from parameters to subspaces is sufficiently smooth or can be well-approximated by a neural network when extra redundancy is added.
What would settle it
A numerical test on an elliptic eigenproblem with constant coefficients showing no improvement or even degradation in accuracy when the predicted subspace dimension is increased beyond the required size.
Original abstract
It is often possible to perform reduced-order modelling by specifying a linear subspace that accurately captures the dynamics of the system. This approach becomes especially appealing when the linear subspace explicitly depends on the parameters of the problem. A practical way to apply such a scheme is to compute subspaces for a selected set of parameters in a computationally demanding offline stage and, in the online stage, to approximate the subspace for unknown parameters by interpolation. For realistic problems the space of parameters is high-dimensional, which renders classical interpolation strategies infeasible or unreliable. We propose to relax the interpolation problem to regression, introduce several loss functions suitable for subspace data, and use a neural network as an approximation to the high-dimensional target function. To further simplify the learning problem we introduce redundancy: in place of predicting a subspace of a given dimension we predict a larger subspace. We show theoretically that this strategy decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth functions on the Grassmann manifold. Empirical results also show that accuracy significantly improves when larger-than-required subspaces are predicted. With a set of numerical illustrations we demonstrate that subspace regression can be useful for a range of tasks including parametric eigenproblems, deflation techniques, relaxation methods, optimal control and the solution of parametric partial differential equations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes relaxing subspace interpolation to regression using neural networks for parameter-dependent linear subspaces in reduced-order modeling. It introduces redundancy by predicting oversized subspaces (dimension r+m instead of r) to simplify the target mapping, provides a theoretical argument that this decreases complexity for constant-coefficient elliptic eigenproblems and smooths the map on the Grassmann manifold for general smooth targets, and reports empirical accuracy gains across tasks including parametric eigenproblems, deflation, relaxation methods, optimal control, and parametric PDEs.
Significance. If the redundancy-based smoothing and accuracy claims hold with rigorous quantification, the work offers a practical route to neural-network regression for high-dimensional parametric reduced-order models, extending beyond classical interpolation. The explicit theoretical treatment for the constant-coefficient case and the reproducible empirical illustrations are strengths that could influence scientific machine-learning practice.
major comments (2)
- [Theoretical analysis] Theoretical section on redundancy: the claim that predicting larger subspaces 'makes the mapping smoother for general smooth function on the Grassmann manifold' is stated without a precise metric (e.g., reduction in Lipschitz constant, Sobolev norm, or covering number) or a proof that the improvement is sufficient to offset the high-dimensional parameter space; this is load-bearing for both the theoretical justification and the interpretation of the reported accuracy gains.
- [Numerical illustrations] Empirical evaluation: the abstract and results assert 'accuracy significantly improves' with larger subspaces, yet no error bars, dataset sizes, or explicit baseline comparisons (e.g., against direct r-dimensional regression or classical interpolation) are visible in the provided description, undermining verification of the central empirical claim.
minor comments (2)
- [Abstract] The abstract refers to 'several loss functions suitable for subspace data' without naming them or pointing to the defining equations; adding a short list or reference would improve readability.
- [Introduction / Theory] Notation for the Grassmann manifold and the redundancy parameter m should be introduced consistently in the first theoretical paragraph to avoid later ambiguity.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments help clarify how to strengthen the presentation of both the theoretical justification for redundancy and the empirical results. We address each major comment below and indicate the revisions we will make.
- Referee: [Theoretical analysis] Theoretical section on redundancy: the claim that predicting larger subspaces 'makes the mapping smoother for general smooth function on the Grassmann manifold' is stated without a precise metric (e.g., reduction in Lipschitz constant, Sobolev norm, or covering number) or a proof that the improvement is sufficient to offset the high-dimensional parameter space; this is load-bearing for both the theoretical justification and the interpretation of the reported accuracy gains.
  Authors: We agree that a more quantitative treatment would strengthen the general case. The manuscript currently shows an explicit complexity reduction for constant-coefficient elliptic eigenproblems and gives a geometric argument that extra dimensions on the Grassmann manifold permit a smoother parametrization for general smooth targets. We will revise the theoretical section to introduce a concrete metric (e.g., a bound on the Lipschitz constant of the composed map) and a short argument showing that the reduction in variation scales favorably with the added redundancy even in high-dimensional parameter spaces. This will be supported by a brief proof sketch.
  revision: yes
- Referee: [Numerical illustrations] Empirical evaluation: the abstract and results assert 'accuracy significantly improves' with larger subspaces, yet no error bars, dataset sizes, or explicit baseline comparisons (e.g., against direct r-dimensional regression or classical interpolation) are visible in the provided description, undermining verification of the central empirical claim.
  Authors: We acknowledge that the summary provided to the referee did not make these details sufficiently prominent. The full manuscript already contains dataset sizes, baseline comparisons against both direct r-dimensional regression and classical interpolation, and error bars on the reported figures. To address the concern directly, we will revise the abstract, the results section, and the figure captions to state these elements explicitly, including quantitative improvement factors and the number of independent runs used for the error bars.
  revision: partial
Circularity Check
No significant circularity; derivation is self-contained with independent theoretical analysis.
The paper presents a new regression-based approach to subspace approximation via neural networks, with redundancy introduced to simplify the learning task. The key theoretical claim—that redundancy reduces mapping complexity for constant-coefficient elliptic eigenproblems and smooths the map on the Grassmann manifold—is stated as a derived result from analysis of the problem structure, not as a re-expression of fitted inputs or prior self-citations. No equations or steps reduce by construction to the inputs (e.g., no fitted parameter renamed as prediction, no ansatz smuggled via self-citation, no uniqueness theorem imported from overlapping authors). Empirical accuracy gains are reported separately from the theory. The derivation chain remains independent of the target result itself.
Axiom & Free-Parameter Ledger
free parameters (2)
- oversized subspace dimension
- neural network hyperparameters
axioms (2)
- domain assumption: The mapping from parameters to subspaces is smooth on the Grassmann manifold
- standard math: Standard properties of elliptic eigenproblems with constant coefficients
Forward citations
Cited by 1 Pith paper
- Mode-realigned pointwise interpolation (MRPWI) for efficient POD-Galerkin parametric reduced-order models
MRPWI synchronizes POD modes via sign and rotation alignment to enable fast, accurate pointwise interpolation for parametric POD-Galerkin reduced-order models.
Reference graph
Works this paper leans on
- [1] David Amsallem. Interpolation on manifolds of CFD-based fluid and finite element-based structural reduced-order models for on-line aeroelastic predictions. Stanford University, 2010
- [2] Haim Avron, Petar Maymounkov, and Sivan Toledo. Blendenpik: Supercharging lapack’s least-squares solver. SIAM Journal on Scientific Computing, 32(3):1217–1236, 2010
- [3] Zhaojun Bai, Patrick M Dewilde, and Roland W Freund. Reduced-order modeling. Handbook of numerical analysis, 13:825–895, 2005
- [4] Pau Batlle, Matthieu Darcy, Bamdad Hosseini, and Houman Owhadi. Kernel methods are competitive for operator learning. Journal of Computational Physics, 496:112549, 2024
- [5] Thomas Bendokat, Ralf Zimmermann, and P-A Absil. A grassmann manifold handbook: Basic geometry and computational aspects. Advances in Computational Mathematics, 50(1):6, 2024
- [6] Kaushik Bhattacharya, Bamdad Hosseini, Nikola B Kovachki, and Andrew M Stuart. Model reduction and neural networks for parametric pdes. The SMAI journal of computational mathematics, 7:121–157, 2021
- [7] Christopher M Bishop and Nasser M Nasrabadi. Pattern recognition and machine learning, volume 4. Springer, 2006
- [8] Åke Björck and Gene H Golub. Numerical methods for computing angles between linear subspaces. Mathematics of computation, 27(123):579–594, 1973
- [9] Barry K Carpenter, Gregory S Ezra, Stavros C Farantos, Zeb C Kramer, and Stephen Wiggins. Dynamics on the double morse potential: a paradigm for roaming reactions with no saddle points. Regular and Chaotic Dynamics, 23(1):60–79, 2018
- [10] Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, et al. Symbolic discovery of optimization algorithms. Advances in neural information processing systems, 36:49205–49233, 2023
- [11] Gabriele Ciaramella, Martin J Gander, and Tommaso Vanzan. A gentle introduction to interpolation on the grassmann manifold. 2025
- [12] Nicola Rares Franco, Andrea Manzoni, Paolo Zunino, and Jan S Hesthaven. Deep orthogonal decomposition: a continuously adaptive neural network approach to model order reduction of parametrized partial differential equations
- [13] Nicola Rares Franco, Andrea Manzoni, Paolo Zunino, and Jan S Hesthaven. Deep orthogonal decomposition: a continuously adaptive data-driven approach to model order reduction. arXiv preprint arXiv:2404.18841, 2024
- [14] James E Garrison and Ilse CF Ipsen. A randomized preconditioned cholesky-qr algorithm. arXiv preprint arXiv:2406.11751, 2024
- [15] Magnus R Hestenes, Eduard Stiefel, et al. Methods of conjugate gradients for solving linear systems. Journal of research of the National Bureau of Standards, 49(6):409–436, 1952
- [16] Jan S Hesthaven, Cecilia Pagliantini, and Gianluigi Rozza. Reduced basis methods for time-dependent problems. Acta Numerica, 31:265–345, 2022
- [17] Jan S Hesthaven and Stefano Ubbiali. Non-intrusive reduced order modeling of nonlinear problems using neural networks. Journal of Computational Physics, 363:55–78, 2018
- [18] Donald E Kirk. Optimal control theory: an introduction. Courier Corporation, 2004
- [19] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998
- [20] Robert J Le Roy, Yiye Huang, and Calvin Jary. An accurate analytic potential function for ground-state n2 from a direct-potential-fit analysis of spectroscopic data. The Journal of chemical physics, 125(16), 2006
- [21] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020
- [22] Lu Lu, Pengzhan Jin, and George Em Karniadakis. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019
- [23] Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning custom-made basis functions for partial differential equations. arXiv preprint arXiv:2111.05307, 2021
- [24] Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning-based spectral methods for partial differential equations. Scientific Reports, 13(1):1739, 2023
- [25] Bruce Moore. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE transactions on automatic control, 26(1):17–32, 1981
- [26] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019
- [27]
- [28] Yousef Saad. Numerical methods for large eigenvalue problems: revised edition. SIAM, 2011
- [29] Yousef Saad, Manshung Yeung, Jocelyne Erhel, and Frédéric Guyomarc’h. A deflated version of the conjugate gradient algorithm. SIAM Journal on Scientific Computing, 21(5):1909–1926, 2000
- [30] Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in neural information processing systems, 33:7462–7473, 2020
- [31] Alasdair Tran, Alexander Mathews, Lexing Xie, and Cheng Soon Ong. Factorized fourier neural operators. arXiv preprint arXiv:2111.13802, 2021
- [32]
- [33] Ulrich Trottenberg, Cornelius W Oosterlee, and Anton Schuller. Multigrid methods. Academic press, 2001
- [34] Stefan Volkwein. Proper orthogonal decomposition: Theory and reduced-order modelling. Lecture Notes, University of Konstanz, 4(4):1–29, 2013
- [35] David P Woodruff et al. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014
- [36] Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, and Zheng Ma. Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523, 2019
- [37] Yusaku Yamamoto, Yuji Nakatsukasa, Yuka Yanagisawa, and Takeshi Fukaya. Roundoff error analysis of the choleskyqr2 algorithm. Electron. Trans. Numer. Anal., 44(01):306–326, 2015
- [38] Ralf Zimmermann. Manifold interpolation and model reduction. arXiv preprint arXiv:1902.06502, 2019.
A Proof of Theorem 1
- [39] To show that $L_1(A, B)$ does not depend on the chosen representative, we observe that $L_1(A, B) = p - \|Q_B^\top Q_A\|_F^2 = \frac{1}{2}\|P_B - P_A\|_F^2 - \frac{k - p}{2}$, (8) where $P_A = A(A^\top A)^{-1}A^\top$ and $P_B = B(B^\top B)^{-1}B^\top$ are orthogonal projectors onto the column spaces of $A$ and $B$. When QR decompositions $A = Q_A R_A$, $B = Q_B R_B$ are available, the projectors become $P_A = Q_A Q_A^\top$, $P_B = Q_B Q_B^\top$ and identity (8) can be v...
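Identity (8) can be checked numerically. The sketch below is only an illustration under stated assumptions: the sizes `n`, `k`, `p`, the random matrices, and the helper names `l1_loss` and `projector` are hypothetical and not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 12, 5, 3  # hypothetical sizes: A has k columns, B has p columns, k >= p

A = rng.standard_normal((n, k))
B = rng.standard_normal((n, p))


def l1_loss(A, B):
    # L1(A, B) = p - ||Q_B^T Q_A||_F^2, using reduced QR factors.
    Q_A, _ = np.linalg.qr(A)
    Q_B, _ = np.linalg.qr(B)
    return B.shape[1] - np.linalg.norm(Q_B.T @ Q_A) ** 2


def projector(M):
    # Orthogonal projector onto the column space of M (M has full column rank).
    return M @ np.linalg.solve(M.T @ M, M.T)


loss_qr = l1_loss(A, B)
loss_projector = 0.5 * np.linalg.norm(projector(B) - projector(A)) ** 2 - (k - p) / 2

# Right-multiplying A by an invertible matrix changes the basis but not the span,
# so the loss should be unchanged.
G = rng.standard_normal((k, k)) + 5.0 * np.eye(k)
loss_other_basis = l1_loss(A @ G, B)
print(loss_qr, loss_projector, loss_other_basis)
```

All three printed values should agree up to rounding, which is the representative-independence the excerpt refers to.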
- [40] We first show that $L_2(A, B; z) = \min_u \|Au - Q_B z\|_2^2 = \|(I - P_A) Q_B z\|_2^2$, (12) where $P_A = A(A^\top A)^{-1}A^\top$ is the orthogonal projector onto the column space of $A$. Using $I = (I - P_A) + P_A$, $A^\top(I - P_A) = 0$ and $(I - P_A)A = 0$, we obtain $\min_u \|Au - Q_B z\|_2^2 = \min_u \|Au - P_A Q_B z - (I - P_A) Q_B z\|_2^2 = \min_u \|Au - P_A Q_B z\|_2^2 + \|(I - P_A) Q_B z\|_2^2 = \|(I - P_A) Q_B z\|_2^2$. (13) The last equality holds since $P_A Q$...
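Equation (12) says that the least-squares minimum equals the squared norm of the part of $Q_B z$ orthogonal to the column space of $A$. A minimal numerical check, with hypothetical sizes and random data that are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, p = 12, 5, 3  # hypothetical sizes

A = rng.standard_normal((n, k))
Q_B, _ = np.linalg.qr(rng.standard_normal((n, p)))
z = rng.standard_normal(p)
target = Q_B @ z

# Left-hand side of (12): solve min_u ||A u - Q_B z||_2^2 directly.
u_opt, *_ = np.linalg.lstsq(A, target, rcond=None)
loss_lstsq = np.linalg.norm(A @ u_opt - target) ** 2

# Right-hand side of (12): apply the complementary projector I - P_A.
P_A = A @ np.linalg.solve(A.T @ A, A.T)
loss_closed_form = np.linalg.norm((np.eye(n) - P_A) @ target) ** 2
print(loss_lstsq, loss_closed_form)
```

The closed form avoids solving for `u` at all, which is presumably why it is convenient as a training loss.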
- [41] From equation (12) we find $\mathbb{E}_z[L_2(A, B; z)] = \mathbb{E}_z[\|(I - P_A) Q_B z\|_2^2] = \mathbb{E}_z[z^\top Q_B^\top (I - P_A) Q_B z] = \mathbb{E}_z[\operatorname{tr}((Q_B^\top Q_B - Q_B^\top Q_A Q_A^\top Q_B)\, z z^\top)] = \operatorname{tr}((Q_B^\top Q_B - Q_B^\top Q_A Q_A^\top Q_B)\, \mathbb{E}_z[z z^\top]) = p - \|Q_B^\top Q_A\|_F^2 = L_1(A, B)$. (14)
B Proof of Theorem 2
We provide two comments before proceeding with the proof. In most parts of the text we assumed working with the non-compact Stiefel mani...
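Identity (14) can be illustrated with a Monte Carlo estimate. The sketch assumes $z \sim N(0, I_p)$, so that $\mathbb{E}_z[z z^\top] = I$; the excerpt does not show the distribution, and any distribution with identity second moment would serve. Sizes and sample count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, p = 12, 5, 3  # hypothetical sizes

Q_A, _ = np.linalg.qr(rng.standard_normal((n, k)))
Q_B, _ = np.linalg.qr(rng.standard_normal((n, p)))

# Deterministic loss L1(A, B) = p - ||Q_B^T Q_A||_F^2.
l1 = p - np.linalg.norm(Q_B.T @ Q_A) ** 2

# Assumption: z ~ N(0, I_p). Each column of Z is one sample of z.
Z = rng.standard_normal((p, 20000))
targets = Q_B @ Z
residuals = targets - Q_A @ (Q_A.T @ targets)  # (I - P_A) Q_B z for every sample
l2_mean = np.mean(np.sum(residuals ** 2, axis=0))
print(l1, l2_mean)
```

The sample mean only approximates the expectation, so the two numbers agree up to Monte Carlo error rather than exactly.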
- [42]
- [43] Gradually increase $c$ and track the ellipsoid $\sum_{j=1}^{D} a_j z_j^2 = c$.
- [44] While increasing $c$, add each standard positive lattice point (a point with positive integer coordinates) that falls inside the ellipsoid.
- [45] The order in which lattice points cross an inflating ellipsoid defines which eigenvector appears at position $k$ and which vectors form the eigenspace of dimension $k$. To illustrate this process, consider $E(z_1, \dots, z_D) = a_1 z_1^2 + a_2 z_2^2$, where $a_2 \gg a_1$. If we follow the procedure outlined above, we will see that the first lattice points encountered are $(1,1), (2,1), (3,1), \dots$
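The inflating-ellipsoid procedure amounts to sorting positive lattice points by $E(z) = \sum_j a_j z_j^2$. The sketch below is a hedged illustration: the function name `lattice_order`, the search box `max_index`, and the coefficient values are hypothetical choices made for this example.

```python
from itertools import product


def lattice_order(coefficients, max_index, count):
    # Enumerate positive lattice points z (integer coordinates >= 1) inside a
    # finite box and sort them by E(z) = sum_j a_j z_j^2, i.e. by the level c
    # at which the inflating ellipsoid reaches them. The box must be large
    # enough that the first `count` points are not cut off.
    points = product(range(1, max_index + 1), repeat=len(coefficients))

    def energy(z):
        return sum(a * zj ** 2 for a, zj in zip(coefficients, z))

    return sorted(points, key=energy)[:count]


# Hypothetical coefficients with a_2 >> a_1.
first_points = lattice_order(coefficients=(1.0, 100.0), max_index=20, count=4)
print(first_points)  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```

With these coefficients the point (1, 2) has $E = 401$, so it appears only after every point $(n, 1)$ with $n^2 + 100 < 401$.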
- [46] Gaussian random field $\psi$ is generated from $N(0, (\mathrm{id} - \gamma\Delta)^{r})$, $\gamma = \frac{1}{20\pi}$, $r = \frac{1}{2}$
- [47] Diffusion coefficient is computed as $a = \alpha + (\beta - \alpha)(\tanh(s\psi) + 1)/2$ with $\alpha = 1$, $\beta = 50$, $s = 1$. For one $D = 2$ dataset $k_1 = k_2$ and for another $k_1 \neq k_2$, but both are i.i.d. random fields generated as described above. In the main text only results for $k_1 = k_2$ are reported. For the $D = 3$ elliptic eigenproblem we use a setup analogous to $D = 2$ but a grid of size $30 \times 30 \times 30$ and $k_1 = k_2$ ...
- [48] Draw i.i.d. Fourier coefficients on a square index set $K = \{0, \dots, M-1\}^2$ and form a real field by summing complex exponentials. We additionally introduce a Fourier-space weight $w_k = (1 + \lambda_1 \|k\|_2^2)^{-1}$ to control the high-frequency components
- [49] Multiply the real field from the previous step by $\lambda_2$, then apply a hyperbolic tangent function to control the contrast of the coefficient field values
- [50] Rescale the field to the prescribed interval $[\alpha, \beta]$ to ensure strict positivity and enforce a controlled contrast ratio of $\beta/\alpha$. Exact procedure to generate the 2D field is: $s_0(x, y) = \mathrm{Re} \sum_{k \in \{0, \dots, M-1\}^2} \frac{c_k e^{i(k_1 x + k_2 y)}}{1 + \lambda_1 \|k\|_2^2}$, $c_k \sim N(0, 1)$, $s(x, y) = \tanh(\lambda_2 \cdot s_0(x, y))$, $k(x, y) = \alpha + (\beta - \alpha)\frac{s(x, y) + 1}{2}$, $k(x, y) \in [\alpha, \beta]$. Figure 9: Sample...
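The three generation steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_coefficient_field`, the default values of `M`, `grid`, `lam1`, `lam2`, `alpha`, `beta`, the periodic domain $[0, 2\pi)^2$, and the choice of real-valued $c_k$ are all assumptions made for the example.

```python
import numpy as np


def sample_coefficient_field(M=8, grid=32, lam1=0.5, lam2=2.0,
                             alpha=1.0, beta=50.0, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed domain: uniform grid on [0, 2*pi)^2.
    x = np.linspace(0.0, 2.0 * np.pi, grid, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")

    # Step 1: weighted sum of complex exponentials, real part only.
    s0 = np.zeros((grid, grid))
    for k1 in range(M):
        for k2 in range(M):
            c = rng.standard_normal()  # c_k ~ N(0, 1), taken real here
            weight = 1.0 / (1.0 + lam1 * (k1 ** 2 + k2 ** 2))
            s0 += weight * np.real(c * np.exp(1j * (k1 * X + k2 * Y)))

    # Step 2: scale and squash to control the contrast.
    s = np.tanh(lam2 * s0)

    # Step 3: rescale from [-1, 1] to [alpha, beta].
    return alpha + (beta - alpha) * (s + 1.0) / 2.0


field = sample_coefficient_field()
print(field.shape, field.min(), field.max())
```

Because tanh maps into $[-1, 1]$, the final affine map keeps every value inside $[\alpha, \beta]$ regardless of the random draw.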
- [51] Cosines of the principal angles between the true subspace $V$ and the predicted subspace, computed as the singular values of $Q_W^\top V$. A value closer to 1 indicates better subspace alignment
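A short sketch of this metric, assuming $V$ has orthonormal columns and $Q_W$ comes from a reduced QR factorization of the predicted basis $W$. The sizes and the construction of `W` (an oversized prediction that happens to contain the true subspace) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, m = 20, 3, 2  # hypothetical sizes: true dimension r, predicted dimension r + m

V, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal true basis
W = np.hstack([V, rng.standard_normal((n, m))])   # oversized prediction containing span(V)

Q_W, _ = np.linalg.qr(W)
# Singular values of Q_W^T V are the cosines of the principal angles.
cosines = np.linalg.svd(Q_W.T @ V, compute_uv=False)
print(cosines)
```

Since the predicted subspace contains the true one in this constructed case, every cosine is 1 up to rounding, even though the two subspaces have different dimensions.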
- [52] Relative reconstruction error $e_i = \min_u \|V_i - W u\|_2$ for each true basis vector $V_i$, computed as $\|(I - Q_W Q_W^\top) V_i\|_2$. A smaller value indicates that the predicted subspace reconstructs $V_i$ more accurately. Figure 10: First five eigenvectors of the error propagation matrix $I - \omega D^{-1} A$. Top: $\omega = 1.0$. Bottom: $\omega = 0.9$
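The equivalence between the least-squares definition and the projector formula can be checked directly. The sketch assumes unit-norm true basis vectors (so the error is already relative); the sizes and random matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 20, 3  # hypothetical sizes

V, _ = np.linalg.qr(rng.standard_normal((n, r)))  # columns V_i: unit-norm true basis vectors
W = rng.standard_normal((n, r + 2))               # hypothetical predicted basis

# Projector formula: ||(I - Q_W Q_W^T) V_i||_2 for every column i.
Q_W, _ = np.linalg.qr(W)
errors_projector = np.linalg.norm(V - Q_W @ (Q_W.T @ V), axis=0)

# Cross-check against the definition min_u ||V_i - W u||_2.
U, *_ = np.linalg.lstsq(W, V, rcond=None)
errors_lstsq = np.linalg.norm(V - W @ U, axis=0)
print(errors_projector, errors_lstsq)
```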
- [53] Two-grid convergence rate, measured by the spectral radius $\rho$ of the two-grid iteration operator $T$. We estimate $\rho$ via the power method by repeatedly applying $T$ to a vector: $v_{k+1} = \frac{T v_k}{\|T v_k\|}$. A smaller spectral radius indicates faster asymptotic convergence. In Table 7, we report these metrics for the best-performing models and for the ground-truth target subsp...
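A minimal sketch of the power-method estimate. The two-grid operator itself is not available in this excerpt, so the example uses a hypothetical stand-in for $T$: the damped Jacobi error propagation matrix $I - \omega D^{-1} A$ for a small 1D Laplacian. The function name `spectral_radius_power`, the iteration count, and the matrix size are assumptions.

```python
import numpy as np


def spectral_radius_power(T, iterations=2000, seed=0):
    # Power iteration v_{k+1} = T v_k / ||T v_k||. For a unit vector v, the
    # norm ||T v|| serves as the running estimate of the spectral radius.
    v = np.random.default_rng(seed).standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    estimate = 0.0
    for _ in range(iterations):
        w = T @ v
        estimate = np.linalg.norm(w)
        v = w / estimate
    return estimate


# Hypothetical stand-in for T: damped Jacobi for the 1D stencil (-1, 2, -1).
n, omega = 10, 0.9
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
T = np.eye(n) - omega * A / 2.0  # D = 2 I, so D^{-1} A = A / 2

rho_power = spectral_radius_power(T)
rho_exact = np.max(np.abs(np.linalg.eigvals(T)))
print(rho_power, rho_exact)
```

The norm-based estimate is reliable here because this `T` is symmetric with a single dominant eigenvalue; for a non-normal operator or a dominant complex pair the iteration may converge slowly or oscillate, and a different estimator would be needed.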