Deep Learning for Subspace Regression

Alexander Rudikov; Ekaterina Muravleva; Ivan Oseledets; Vladimir Fanaskov; Vladislav Trifonov

arXiv: 2509.23249 · v4 · submitted 2025-09-27 · 💻 cs.LG · cs.NA · math.NA

Vladimir Fanaskov, Vladislav Trifonov, Alexander Rudikov, Ekaterina Muravleva, Ivan Oseledets

Pith reviewed 2026-05-18 12:03 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA

keywords subspace regression · neural networks · reduced order modeling · Grassmann manifold · parametric eigenproblems · redundancy strategy · deep learning · parametric PDEs

The pith

Predicting larger-than-required subspaces with neural networks simplifies the mapping and improves accuracy in parametric subspace regression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to use neural networks to regress linear subspaces that depend on high-dimensional parameters in reduced-order modeling. Classical interpolation becomes infeasible or unreliable when the parameter space is high-dimensional, so the authors relax the problem to regression with loss functions suited to subspace data. A key innovation is redundancy: the network predicts a larger subspace than needed. The paper argues theoretically that this decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth functions on the Grassmann manifold. Empirically, it reports significantly better accuracy across several applications, including parametric eigenproblems and PDE solutions.

Core claim

Predicting oversized subspaces instead of exact-dimension ones decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth functions on the Grassmann manifold, resulting in improved neural network approximation accuracy for subspace regression.

What carries the argument

The redundancy mechanism of predicting a subspace of dimension larger than the target one, which reduces the complexity of the target function on the Grassmann manifold.

If this is right

Accuracy significantly improves when larger-than-required subspaces are predicted in empirical tests.
The strategy decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients.
The mapping becomes smoother for general smooth functions on the Grassmann manifold.
Subspace regression applies successfully to parametric eigenproblems, deflation techniques, relaxation methods, optimal control, and parametric partial differential equations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This redundancy approach might generalize to other regression tasks on manifolds where exact low-dimensional targets are hard to learn.
It could reduce the computational cost of offline stages in reduced-order modeling by enabling more reliable online approximations.
Extensions to non-smooth or discontinuous subspace dependencies may require additional regularization techniques.

Load-bearing premise

The mapping from parameters to subspaces is sufficiently smooth or can be well-approximated by a neural network when extra redundancy is added.

What would settle it

A numerical test on an elliptic eigenproblem with constant coefficients showing no improvement or even degradation in accuracy when the predicted subspace dimension is increased beyond the required size.

Figures

Figures reproduced from arXiv: 2509.23249 by Alexander Rudikov, Ekaterina Muravleva, Ivan Oseledets, Vladimir Fanaskov, Vladislav Trifonov.

**Figure 2.** Relative errors for selected baselines. Label “subspace” refers to subspace regression. For the …

**Figure 3.** Convergence results for iterative methods. Learned methods are marked with solid lines, and …

**Figure 4.** Example of subspace embedding detailed in Appendix C.

**Figure 5.** (a) Parametric family of ellipsoids passing through point (4 …

**Figure 6.** Illustration of simple greedy subspace embedding technique for elliptic eigenproblem with constant …

**Figure 7.** Relative errors for two stationary diffusion equations depending on the number of basis functions …

**Figure 8.** Convergence results for deflated CG, elliptic dataset …

**Figure 9.** Sample coefficient function. The equation is discretized on a uniform grid with a 5-point finite-difference stencil, yielding a sparse, symmetric positive definite matrix. One can observe a sampled normalized coefficient function in …

**Figure 10.** First five eigenvectors of the error propagation matrix …

Original abstract

It is often possible to perform reduced order modelling by specifying linear subspace which accurately captures the dynamics of the system. This approach becomes especially appealing when linear subspace explicitly depends on parameters of the problem. A practical way to apply such a scheme is to compute subspaces for a selected set of parameters in the computationally demanding offline stage and in the online stage approximate subspace for unknown parameters by interpolation. For realistic problems the space of parameters is high dimensional, which renders classical interpolation strategies infeasible or unreliable. We propose to relax the interpolation problem to regression, introduce several loss functions suitable for subspace data, and use a neural network as an approximation to high-dimensional target function. To further simplify a learning problem we introduce redundancy: in place of predicting subspace of a given dimension we predict larger subspace. We show theoretically that this strategy decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth function on the Grassmann manifold. Empirical results also show that accuracy significantly improves when larger-than-required subspaces are predicted. With the set of numerical illustrations we demonstrate that subspace regression can be useful for a range of tasks including parametric eigenproblems, deflation techniques, relaxation methods, optimal control and solution of parametric partial differential equations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Predicting larger-than-needed subspaces makes neural net regression on the Grassmann manifold easier for parametric reduced-order models, with a concrete argument for constant-coefficient elliptic eigenproblems and reported accuracy gains.

The main point is that this paper replaces classical interpolation with neural net regression for parameter-dependent subspaces and adds the trick of deliberately predicting oversized subspaces to simplify the learning task. They back the oversized-subspace idea with a theoretical argument that it reduces mapping complexity for elliptic eigenproblems with constant coefficients and makes the target smoother on the Grassmann manifold in general, plus numerical examples showing better accuracy when the predicted dimension is inflated.

Referee Report

2 major / 2 minor

Summary. The paper proposes relaxing subspace interpolation to regression using neural networks for parameter-dependent linear subspaces in reduced-order modeling. It introduces redundancy by predicting oversized subspaces (dimension r+m instead of r) to simplify the target mapping, provides a theoretical argument that this decreases complexity for constant-coefficient elliptic eigenproblems and smooths the map on the Grassmann manifold for general smooth targets, and reports empirical accuracy gains across tasks including parametric eigenproblems, deflation, relaxation methods, optimal control, and parametric PDEs.

Significance. If the redundancy-based smoothing and accuracy claims hold with rigorous quantification, the work offers a practical route to neural-network regression for high-dimensional parametric reduced-order models, extending beyond classical interpolation. The explicit theoretical treatment for the constant-coefficient case and the reproducible empirical illustrations are strengths that could influence scientific machine-learning practice.

major comments (2)

[Theoretical analysis] Theoretical section on redundancy: the claim that predicting larger subspaces 'makes the mapping smoother for general smooth function on the Grassmann manifold' is stated without a precise metric (e.g., reduction in Lipschitz constant, Sobolev norm, or covering number) or a proof that the improvement is sufficient to offset the high-dimensional parameter space; this is load-bearing for both the theoretical justification and the interpretation of the reported accuracy gains.
[Numerical illustrations] Empirical evaluation: the abstract and results assert 'accuracy significantly improves' with larger subspaces, yet no error bars, dataset sizes, or explicit baseline comparisons (e.g., against direct r-dimensional regression or classical interpolation) are visible in the provided description, undermining verification of the central empirical claim.

minor comments (2)

[Abstract] The abstract refers to 'several loss functions suitable for subspace data' without naming them or pointing to the defining equations; adding a short list or reference would improve readability.
[Introduction / Theory] Notation for the Grassmann manifold and the redundancy parameter m should be introduced consistently in the first theoretical paragraph to avoid later ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The comments help clarify how to strengthen the presentation of both the theoretical justification for redundancy and the empirical results. We address each major comment below and indicate the revisions we will make.

Referee: [Theoretical analysis] Theoretical section on redundancy: the claim that predicting larger subspaces 'makes the mapping smoother for general smooth function on the Grassmann manifold' is stated without a precise metric (e.g., reduction in Lipschitz constant, Sobolev norm, or covering number) or a proof that the improvement is sufficient to offset the high-dimensional parameter space; this is load-bearing for both the theoretical justification and the interpretation of the reported accuracy gains.

Authors: We agree that a more quantitative treatment would strengthen the general case. The manuscript currently shows an explicit complexity reduction for constant-coefficient elliptic eigenproblems and gives a geometric argument that extra dimensions on the Grassmann manifold permit a smoother parametrization for general smooth targets. We will revise the theoretical section to introduce a concrete metric (e.g., a bound on the Lipschitz constant of the composed map) and a short argument showing that the reduction in variation scales favorably with the added redundancy even in high-dimensional parameter spaces. This will be supported by a brief proof sketch.
revision: yes

Referee: [Numerical illustrations] Empirical evaluation: the abstract and results assert 'accuracy significantly improves' with larger subspaces, yet no error bars, dataset sizes, or explicit baseline comparisons (e.g., against direct r-dimensional regression or classical interpolation) are visible in the provided description, undermining verification of the central empirical claim.

Authors: We acknowledge that the summary provided to the referee did not make these details sufficiently prominent. The full manuscript already contains dataset sizes, baseline comparisons against both direct r-dimensional regression and classical interpolation, and error bars on the reported figures. To address the concern directly, we will revise the abstract, the results section, and the figure captions to state these elements explicitly, including quantitative improvement factors and the number of independent runs used for the error bars.
revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained with independent theoretical analysis.

The paper presents a new regression-based approach to subspace approximation via neural networks, with redundancy introduced to simplify the learning task. The key theoretical claim (that redundancy reduces mapping complexity for constant-coefficient elliptic eigenproblems and smooths the map on the Grassmann manifold) is stated as a result derived from analysis of the problem structure, not as a re-expression of fitted inputs or prior self-citations. No equations or steps reduce by construction to the inputs (e.g., no fitted parameter renamed as prediction, no ansatz smuggled via self-citation, no uniqueness theorem imported from overlapping authors). Empirical accuracy gains are reported separately from the theory. The derivation chain remains independent of the target result itself.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of the Grassmann manifold and the existence of a sufficiently regular mapping from parameters to subspaces; no new physical entities or ad-hoc constants are introduced in the abstract.

free parameters (2)

oversized subspace dimension
Chosen larger than the target rank; its specific value is a modeling choice that affects both theory and empirical results.
neural network hyperparameters
Architecture depth, width, and training details are free choices not fixed by the problem statement.

axioms (2)

domain assumption: The mapping from parameters to subspaces is smooth on the Grassmann manifold.
Invoked to claim that redundancy makes the mapping smoother (abstract).
standard math: Standard properties of elliptic eigenproblems with constant coefficients.
Used for the complexity-reduction theoretical result.

pith-pipeline@v0.9.0 · 5759 in / 1385 out tokens · 28093 ms · 2026-05-18T12:03:50.180787+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mode-realigned pointwise interpolation (MRPWI) for efficient POD-Galerkin parametric reduced-order models
math.NA · 2026-04 · unverdicted · novelty 6.0

MRPWI synchronizes POD modes via sign and rotation alignment to enable fast, accurate pointwise interpolation for parametric POD-Galerkin reduced-order models.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] David Amsallem. Interpolation on manifolds of CFD-based fluid and finite element-based structural reduced-order models for on-line aeroelastic predictions. Stanford University, 2010.
[2] Haim Avron, Petar Maymounkov, and Sivan Toledo. Blendenpik: Supercharging LAPACK's least-squares solver. SIAM Journal on Scientific Computing, 32(3):1217–1236, 2010.
[3] Zhaojun Bai, Patrick M Dewilde, and Roland W Freund. Reduced-order modeling. Handbook of numerical analysis, 13:825–895, 2005.
[4] Pau Batlle, Matthieu Darcy, Bamdad Hosseini, and Houman Owhadi. Kernel methods are competitive for operator learning. Journal of Computational Physics, 496:112549, 2024.
[5] Thomas Bendokat, Ralf Zimmermann, and P-A Absil. A Grassmann manifold handbook: Basic geometry and computational aspects. Advances in Computational Mathematics, 50(1):6, 2024.
[6] Kaushik Bhattacharya, Bamdad Hosseini, Nikola B Kovachki, and Andrew M Stuart. Model reduction and neural networks for parametric PDEs. The SMAI journal of computational mathematics, 7:121–157, 2021.
[7] Christopher M Bishop and Nasser M Nasrabadi. Pattern recognition and machine learning, volume 4. Springer, 2006.
[8] Ake Björck and Gene H Golub. Numerical methods for computing angles between linear subspaces. Mathematics of computation, 27(123):579–594, 1973.
[9] Barry K Carpenter, Gregory S Ezra, Stavros C Farantos, Zeb C Kramer, and Stephen Wiggins. Dynamics on the double Morse potential: a paradigm for roaming reactions with no saddle points. Regular and Chaotic Dynamics, 23(1):60–79, 2018.
[10] Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, et al. Symbolic discovery of optimization algorithms. Advances in neural information processing systems, 36:49205–49233, 2023.
[11] Gabriele Ciaramella, Martin J Gander, and Tommaso Vanzan. A gentle introduction to interpolation on the Grassmann manifold. 2025.
[12] Nicola Rares Franco, Andrea Manzoni, Paolo Zunino, and Jan S Hesthaven. Deep orthogonal decomposition: a continuously adaptive neural network approach to model order reduction of parametrized partial differential equations.
[13] Nicola Rares Franco, Andrea Manzoni, Paolo Zunino, and Jan S Hesthaven. Deep orthogonal decomposition: a continuously adaptive data-driven approach to model order reduction. arXiv preprint arXiv:2404.18841, 2024.
[14] James E Garrison and Ilse CF Ipsen. A randomized preconditioned Cholesky-QR algorithm. arXiv preprint arXiv:2406.11751, 2024.
[15] Magnus R Hestenes, Eduard Stiefel, et al. Methods of conjugate gradients for solving linear systems. Journal of research of the National Bureau of Standards, 49(6):409–436, 1952.
[16] Jan S Hesthaven, Cecilia Pagliantini, and Gianluigi Rozza. Reduced basis methods for time-dependent problems. Acta Numerica, 31:265–345, 2022.
[17] Jan S Hesthaven and Stefano Ubbiali. Non-intrusive reduced order modeling of nonlinear problems using neural networks. Journal of Computational Physics, 363:55–78, 2018.
[18] Donald E Kirk. Optimal control theory: an introduction. Courier Corporation, 2004.
[19] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
[20] Robert J Le Roy, Yiye Huang, and Calvin Jary. An accurate analytic potential function for ground-state N2 from a direct-potential-fit analysis of spectroscopic data. The Journal of chemical physics, 125(16), 2006.
[21] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020.
[22] Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.
[23] Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning custom-made basis functions for partial differential equations. arXiv preprint arXiv:2111.05307, 2021.
[24] Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning-based spectral methods for partial differential equations. Scientific Reports, 13(1):1739, 2023.
[25] Bruce Moore. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE transactions on automatic control, 26(1):17–32, 2003.
[26] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
[27] Yousef Saad. Iterative methods for sparse linear systems. SIAM, 2003.
[28] Yousef Saad. Numerical methods for large eigenvalue problems: revised edition. SIAM, 2011.
[29] Yousef Saad, Manshung Yeung, Jocelyne Erhel, and Frédéric Guyomarc'h. A deflated version of the conjugate gradient algorithm. SIAM Journal on Scientific Computing, 21(5):1909–1926, 2000.
[30] Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in neural information processing systems, 33:7462–7473, 2020.
[31] Alasdair Tran, Alexander Mathews, Lexing Xie, and Cheng Soon Ong. Factorized Fourier neural operators. arXiv preprint arXiv:2111.13802, 2021.
[32] Lloyd N Trefethen and David Bau. Numerical linear algebra. SIAM, 2022.
[33] Ulrich Trottenberg, Cornelius W Oosterlee, and Anton Schuller. Multigrid methods. Academic press, 2001.
[34] Stefan Volkwein. Proper orthogonal decomposition: Theory and reduced-order modelling. Lecture Notes, University of Konstanz, 4(4):1–29, 2013.
[35] David P Woodruff et al. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.
[36] Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, and Zheng Ma. Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523, 2019.
[37] Yusaku Yamamoto, Yuji Nakatsukasa, Yuka Yanagisawa, and Takeshi Fukaya. Roundoff error analysis of the CholeskyQR2 algorithm. Electron. Trans. Numer. Anal., 44(01):306–326, 2015.
[38] Ralf Zimmermann. Manifold interpolation and model reduction. arXiv preprint arXiv:1902.06502, 2019.

Appendix A: Proof of Theorem 1
[39] To show that $L_1(A, B)$ does not depend on the chosen representative we observe that

$$L_1(A, B) = p - \left\|Q_B^\top Q_A\right\|_F^2 = \frac{1}{2}\left\|P_B - P_A\right\|_F^2 - \frac{k - p}{2}, \tag{8}$$

where $P_A = A\left(A^\top A\right)^{-1}A^\top$, $P_B = B\left(B^\top B\right)^{-1}B^\top$ are orthogonal projectors on the column spaces of $A$ and $B$. When QR decompositions $A = Q_A R_A$, $B = Q_B R_B$ are available, projectors become $P_A = Q_A Q_A^\top$, $P_B = Q_B Q_B^\top$ and identity (8) can be v...
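Identity (8) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names `l1_loss` and `l1_loss_projectors` are hypothetical, and it assumes that $k$ and $p$ denote the number of columns of $A$ and $B$, which is how the constant $(k - p)/2$ balances in (8). It also illustrates the redundancy idea: when the span of $B$ is contained in the span of a wider $A$, the loss is zero.

```python
import numpy as np

def l1_loss(A, B):
    """L1(A, B) = p - ||Q_B^T Q_A||_F^2, with p the number of columns of B."""
    Q_A, _ = np.linalg.qr(A)  # orthonormal basis of span(A), shape (n, k)
    Q_B, _ = np.linalg.qr(B)  # orthonormal basis of span(B), shape (n, p)
    return B.shape[1] - np.linalg.norm(Q_B.T @ Q_A, "fro") ** 2

def l1_loss_projectors(A, B):
    """Right-hand side of (8): 0.5 * ||P_B - P_A||_F^2 - (k - p) / 2."""
    Q_A, _ = np.linalg.qr(A)
    Q_B, _ = np.linalg.qr(B)
    P_A, P_B = Q_A @ Q_A.T, Q_B @ Q_B.T
    k, p = A.shape[1], B.shape[1]  # assumption: k, p are the column counts
    return 0.5 * np.linalg.norm(P_B - P_A, "fro") ** 2 - 0.5 * (k - p)

rng = np.random.default_rng(0)
n, k, p = 20, 6, 4
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, p))

loss_qr = l1_loss(A, B)
loss_proj = l1_loss_projectors(A, B)

# Changing the representative: A @ M spans the same subspace for invertible M.
M = rng.standard_normal((k, k)) + 5.0 * np.eye(k)
loss_rep = l1_loss(A @ M, B)

# Redundancy: an oversized prediction whose span contains span(B) has zero loss.
A_over = np.hstack([B, rng.standard_normal((n, k - p))])
loss_contained = l1_loss(A_over, B)
```

The three quantities `loss_qr`, `loss_proj`, and `loss_rep` should agree up to rounding, and `loss_contained` should be numerically zero.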
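[40] We first show that

$$L_2(A, B; z) = \min_u \left\|Au - Q_B z\right\|_2^2 = \left\|(I - P_A) Q_B z\right\|_2^2, \tag{12}$$

where $P_A = A\left(A^\top A\right)^{-1}A^\top$ is the orthogonal projector on the column space of $A$. Using $I = (I - P_A) + P_A$, and $A^\top (I - P_A) = 0$, $(I - P_A)A = 0$ we obtain

$$\min_u \left\|Au - Q_B z\right\|_2^2 = \min_u \left\|Au - P_A Q_B z - (I - P_A) Q_B z\right\|_2^2 = \min_u \left\|Au - P_A Q_B z\right\|_2^2 + \left\|(I - P_A) Q_B z\right\|_2^2 = \left\|(I - P_A) Q_B z\right\|_2^2. \tag{13}$$

The last equality holds since $P_A Q$...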
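Equation (12) says the least-squares minimum equals the squared norm of the part of $Q_B z$ orthogonal to the column space of $A$. A small sketch under stated assumptions (the helper names are hypothetical, and `numpy.linalg.lstsq` stands in for whatever solver the paper uses) compares the two sides:

```python
import numpy as np

def l2_loss_lstsq(A, Q_B, z):
    """Left-hand side of (12): solve min_u ||A u - Q_B z||_2^2 directly."""
    target = Q_B @ z
    u, *_ = np.linalg.lstsq(A, target, rcond=None)
    r = A @ u - target
    return float(r @ r)

def l2_loss_closed_form(A, Q_B, z):
    """Right-hand side of (12): ||(I - P_A) Q_B z||_2^2 with P_A = Q_A Q_A^T."""
    Q_A, _ = np.linalg.qr(A)
    target = Q_B @ z
    r = target - Q_A @ (Q_A.T @ target)  # apply I - P_A without forming P_A
    return float(r @ r)

rng = np.random.default_rng(0)
n, k, p = 25, 5, 3
A = rng.standard_normal((n, k))
Q_B, _ = np.linalg.qr(rng.standard_normal((n, p)))
z = rng.standard_normal(p)

lhs = l2_loss_lstsq(A, Q_B, z)
rhs = l2_loss_closed_form(A, Q_B, z)
```

Applying the projector through `Q_A` avoids forming the $n \times n$ matrix $P_A$, which matters when $n$ is a large grid size.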
[41] From equation (12) we find

$$\mathbb{E}_z\left[L_2(A, B; z)\right] = \mathbb{E}_z\left[\left\|(I - P_A) Q_B z\right\|_2^2\right] = \mathbb{E}_z\left[z^\top Q_B^\top (I - P_A) Q_B z\right] = \mathbb{E}_z \operatorname{tr}\left[\left(Q_B^\top Q_B - Q_B^\top Q_A Q_A^\top Q_B\right) z z^\top\right] = \operatorname{tr}\left[\left(Q_B^\top Q_B - Q_B^\top Q_A Q_A^\top Q_B\right) \mathbb{E}_z\left[z z^\top\right]\right] = p - \left\|Q_B^\top Q_A\right\|_F^2 = L_1(A, B). \tag{14}$$

Appendix B: Proof of Theorem 2

We provide two comments before proceeding with the proof. In most parts of the text we assumed working with the non-compact Stiefel manifold and in this theorem we have data on the compact Stiefel manifold (see [1] for definitions)...
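The chain in (14) needs $\mathbb{E}_z[z z^\top] = I$ for its last step. The sketch below assumes $z$ is standard normal, which satisfies that condition; the excerpt does not state the distribution, so treat it as an assumption. It computes the expectation exactly through the trace and also estimates it by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, p = 30, 5, 3
Q_A, _ = np.linalg.qr(rng.standard_normal((n, k)))
Q_B, _ = np.linalg.qr(rng.standard_normal((n, p)))

# L1(A, B) = p - ||Q_B^T Q_A||_F^2
l1 = p - np.linalg.norm(Q_B.T @ Q_A, "fro") ** 2

# R = (I - P_A) Q_B, so that L2(A, B; z) = ||R z||_2^2
R = Q_B - Q_A @ (Q_A.T @ Q_B)

# Exact expectation when E[z z^T] = I: tr(R^T R)
l2_trace = np.trace(R.T @ R)

# Monte Carlo estimate with z ~ N(0, I) (assumed distribution)
Z = rng.standard_normal((p, 20000))
l2_mc = np.mean(np.sum((R @ Z) ** 2, axis=0))
```

The trace value matches `l1` to rounding error, while the Monte Carlo estimate fluctuates around it with a spread that shrinks as the sample count grows.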
[42] Select $a_1, \dots, a_D$ and $c = 0$.
[43] Gradually increase $c$ and track the ellipsoid $\sum_{j=1}^{D} a_j z_j^2 = c$.
[44] While increasing $c$, add each standard positive lattice point (point with positive integer coordinates) that falls inside the ellipsoid.
[45] The order in which lattice points cross an inflating ellipsoid defines which eigenvector appears at position $k$ and which vectors form the eigenspace of dimension $k$. To illustrate this process, consider $E(z_1, \dots, z_D) = a_1 z_1^2 + a_2 z_2^2$, where $a_2 \gg a_1$. If we follow the procedure outlined above we will see that the first lattice points encountered are $(1,1), (2,1), (3,1),$...
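The inflating-ellipsoid procedure above amounts to sorting positive lattice points by the value of $\sum_j a_j z_j^2$. A minimal sketch, with the hypothetical helper `lattice_order` and illustrative coefficients $a = (1, 100)$ chosen only to satisfy $a_2 \gg a_1$:

```python
import itertools

def lattice_order(a, z_max, count):
    """Return the first `count` positive lattice points in the order in which
    the inflating ellipsoid sum_j a_j z_j^2 = c reaches them.

    z_max truncates the search box; it must be large enough that no point
    outside the box would be reached before the returned ones.
    """
    D = len(a)
    points = itertools.product(range(1, z_max + 1), repeat=D)
    energy = lambda z: sum(aj * zj**2 for aj, zj in zip(a, z))
    return sorted(points, key=energy)[:count]

# a_2 >> a_1: the ellipsoid is elongated along z_1, so z_2 stays at 1 for a while.
first_points = lattice_order(a=(1.0, 100.0), z_max=12, count=5)
```

With these coefficients the point $(1, 2)$ has value $401$, which is larger than every $(z_1, 1)$ with $z_1 \le 12$, so the first points all keep $z_2 = 1$, in line with the sequence quoted in the text.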
[46] Gaussian random field $\psi$ is generated from $\mathcal{N}\left(0, (\mathrm{id} - \gamma\Delta)^{r}\right)$, $\gamma = \frac{1}{20\pi}$, $r = \frac{1}{2}$.
[47] Diffusion coefficient is computed as $a = \alpha + (\beta - \alpha)\left(\tanh(s\psi) + 1\right)/2$ with $\alpha = 1$, $\beta = 50$, $s = 1$. For one $D = 2$ dataset $k_1 = k_2$ and for another $k_1 \neq k_2$ but both are i.i.d. random fields generated as described above. In the main text only results for $k_1 = k_2$ are reported. For the $D = 3$ elliptic eigenproblem we use a setup analogous to $D = 2$ but a grid of size $30 \times 30 \times 30$ and $k_1 = k_2$ ...
[48] Draw i.i.d. Fourier coefficients on a square index set $K = \{0, \dots, M-1\}^2$ and form a real field by summing complex exponentials. We additionally introduce a Fourier-space weight $w_k = \left(1 + \lambda_1 \|k\|_2^2\right)^{-1}$ to control the high-frequency components.
[49] Multiply the real field from the previous step by $\lambda_2$, then apply a hyperbolic tangent function to control the contrast of the coefficient field values.
[50] Rescale the field to the prescribed interval $[\alpha, \beta]$ to ensure strict positivity and enforce a controlled contrast ratio of $\beta/\alpha$.

The exact procedure to generate the 2D field is:

$$s_0(x, y) = \operatorname{Re}\left(\sum_{k \in \{0, \dots, M-1\}^2} \frac{c_k\, e^{i(k_1 x + k_2 y)}}{1 + \lambda_1 \|k\|_2^2}\right), \quad c_k \sim \mathcal{N}(0, 1),$$

$$s(x, y) = \tanh\left(\lambda_2 \cdot s_0(x, y)\right),$$

$$k(x, y) = \alpha + (\beta - \alpha)\,\frac{s(x, y) + 1}{2}, \quad k(x, y) \in [\alpha, \beta].$$
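The three-step recipe for the 2D coefficient field translates directly into NumPy. This is a sketch under assumptions, not the authors' generator: the function name `sample_coefficient`, the periodic domain $[0, 2\pi)^2$, the grid size, and all parameter values in the example call are hypothetical, and $c_k$ is taken to be real, as the notation $c_k \sim \mathcal{N}(0, 1)$ suggests.

```python
import numpy as np

def sample_coefficient(M, n_grid, lam1, lam2, alpha, beta, rng):
    """Sample k(x, y) = alpha + (beta - alpha) * (tanh(lam2 * s0) + 1) / 2."""
    # Assumed domain: uniform grid on [0, 2*pi)^2.
    x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")

    s0 = np.zeros((n_grid, n_grid))
    for k1 in range(M):
        for k2 in range(M):
            c = rng.standard_normal()                 # c_k ~ N(0, 1)
            w = 1.0 / (1.0 + lam1 * (k1**2 + k2**2))  # Fourier-space weight w_k
            s0 += np.real(c * w * np.exp(1j * (k1 * X + k2 * Y)))

    s = np.tanh(lam2 * s0)                            # contrast control
    return alpha + (beta - alpha) * (s + 1.0) / 2.0   # rescale to [alpha, beta]

field = sample_coefficient(M=4, n_grid=16, lam1=0.5, lam2=2.0,
                           alpha=1.0, beta=50.0, rng=np.random.default_rng(0))
```

Larger `lam1` damps high frequencies more strongly, and larger `lam2` pushes the field toward the two extreme values `alpha` and `beta`.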
[51] Cosine angles between the true subspace $V$ and the predicted subspace, computed as the singular values of $Q_W^\top V$. A value closer to 1 indicates better subspace alignment.
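The cosine metric can be sketched with a QR factorization followed by an SVD. The helper `principal_cosines` is a hypothetical name; it orthonormalizes both inputs, which reduces to the singular values of $Q_W^\top V$ when $V$ already has orthonormal columns (an assumption the excerpt does not state explicitly).

```python
import numpy as np

def principal_cosines(W, V):
    """Cosines of principal angles between span(W) and span(V)."""
    Q_W, _ = np.linalg.qr(W)
    Q_V, _ = np.linalg.qr(V)
    cosines = np.linalg.svd(Q_W.T @ Q_V, compute_uv=False)
    return np.clip(cosines, 0.0, 1.0)  # guard against rounding above 1

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.standard_normal((15, 3)))[0]

# Oversized prediction that contains the target subspace: all cosines are 1.
W_good = np.hstack([V, rng.standard_normal((15, 2))])
cos_good = principal_cosines(W_good, V)

# Prediction orthogonal to the target subspace: all cosines are 0.
Q_full = np.linalg.qr(np.hstack([V, rng.standard_normal((15, 12))]))[0]
W_bad = Q_full[:, 3:6]
cos_bad = principal_cosines(W_bad, V)
```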
[52] Relative reconstruction error $e = \min_u \|V - W u\|_2$ for each true basis vector $V$, computed as $\left\|(I - Q_W Q_W^\top) V\right\|_2$. A smaller value indicates that the predicted subspace reconstructs $V_i$ more accurately.

Figure 10: First five eigenvectors of the error propagation matrix $I - \omega D^{-1} A$. Top: $\omega = 1.0$. Bottom: $\omega = 0.9$.
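The reconstruction error can be computed column by column and cross-checked against an explicit least-squares solve. A minimal sketch with the hypothetical helper `reconstruction_errors`; the error is relative in the sense of the text only if each column of $V$ has unit norm, which is assumed here.

```python
import numpy as np

def reconstruction_errors(W, V):
    """Column-wise ||(I - Q_W Q_W^T) v||_2 for every column v of V."""
    Q_W, _ = np.linalg.qr(W)
    residual = V - Q_W @ (Q_W.T @ V)
    return np.linalg.norm(residual, axis=0)

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.standard_normal((20, 4)))[0]  # unit-norm true basis vectors
W = rng.standard_normal((20, 6))                   # predicted (oversized) basis

errors = reconstruction_errors(W, V)

# Cross-check: min_u ||v - W u||_2 solved explicitly for each column.
U, *_ = np.linalg.lstsq(W, V, rcond=None)
errors_lstsq = np.linalg.norm(V - W @ U, axis=0)
```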
[53] Two-grid convergence rate, measured by the spectral radius $\rho$ of the two-grid iteration operator $T$. We estimate $\rho$ via the power method by repeatedly applying $T$ to a vector: $v_{k+1} = \frac{T v_k}{\|T v_k\|}$. A smaller spectral radius indicates faster asymptotic convergence. In Table 7, we report these metrics for the best-performing models and for the ground-truth target subsp...
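The power iteration $v_{k+1} = T v_k / \|T v_k\|$ can be sketched as follows. The function name, iteration count, and test matrix are hypothetical; the estimate $\|T v_k\|$ is reliable when $T$ has a single eigenvalue of largest modulus, which holds for the symmetric example below but is an assumption for a general two-grid operator.

```python
import numpy as np

def spectral_radius_power(T, n_iter=500, seed=0):
    """Estimate the spectral radius of T by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    rho = 0.0
    for _ in range(n_iter):
        w = T @ v
        rho = np.linalg.norm(w)  # ||T v_k|| with ||v_k|| = 1
        if rho == 0.0:           # v landed in the null space of T
            break
        v = w / rho
    return rho

# Symmetric test operator with known eigenvalues 0.9, 0.5, 0.1.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
T = Q @ np.diag([0.9, 0.5, 0.1]) @ Q.T
rho_est = spectral_radius_power(T)
```

A value of `rho_est` below 1 means the iteration contracts the error, and the closer it is to 0 the faster the asymptotic convergence.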

[1] [1]

Stanford University, 2010

David Amsallem.Interpolation on manifolds of CFD-based fluid and finite element-based structural reduced-order models for on-line aeroelastic predictions. Stanford University, 2010

work page 2010

[2] [2]

Blendenpik: Supercharging lapack’s least-squares solver.SIAM Journal on Scientific Computing, 32(3):1217–1236, 2010

Haim Avron, Petar Maymounkov, and Sivan Toledo. Blendenpik: Supercharging lapack’s least-squares solver.SIAM Journal on Scientific Computing, 32(3):1217–1236, 2010

work page 2010

[3] [3]

Reduced-order modeling.Handbook of numerical analysis, 13:825–895, 2005

Zhaojun Bai, Patrick M Dewilde, and Roland W Freund. Reduced-order modeling.Handbook of numerical analysis, 13:825–895, 2005

work page 2005

[4] [4]

Kernel methods are competitive for operator learning.Journal of Computational Physics, 496:112549, 2024

Pau Batlle, Matthieu Darcy, Bamdad Hosseini, and Houman Owhadi. Kernel methods are competitive for operator learning.Journal of Computational Physics, 496:112549, 2024

work page 2024

[5] Thomas Bendokat, Ralf Zimmermann, and P-A Absil. A Grassmann manifold handbook: Basic geometry and computational aspects. Advances in Computational Mathematics, 50(1):6, 2024.

[6] Kaushik Bhattacharya, Bamdad Hosseini, Nikola B Kovachki, and Andrew M Stuart. Model reduction and neural networks for parametric PDEs. The SMAI journal of computational mathematics, 7:121–157, 2021.

[7] Christopher M Bishop and Nasser M Nasrabadi. Pattern recognition and machine learning, volume 4. Springer, 2006.

[8] Åke Björck and Gene H Golub. Numerical methods for computing angles between linear subspaces. Mathematics of computation, 27(123):579–594, 1973.

[9] Barry K Carpenter, Gregory S Ezra, Stavros C Farantos, Zeb C Kramer, and Stephen Wiggins. Dynamics on the double Morse potential: a paradigm for roaming reactions with no saddle points. Regular and Chaotic Dynamics, 23(1):60–79, 2018.

[10] Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, et al. Symbolic discovery of optimization algorithms. Advances in neural information processing systems, 36:49205–49233, 2023.

[11] Gabriele Ciaramella, Martin J Gander, and Tommaso Vanzan. A gentle introduction to interpolation on the Grassmann manifold. 2025.

[12] Nicola Rares Franco, Andrea Manzoni, Paolo Zunino, and Jan S Hesthaven. Deep orthogonal decomposition: a continuously adaptive neural network approach to model order reduction of parametrized partial differential equations.

[13] Nicola Rares Franco, Andrea Manzoni, Paolo Zunino, and Jan S Hesthaven. Deep orthogonal decomposition: a continuously adaptive data-driven approach to model order reduction. arXiv preprint arXiv:2404.18841, 2024.

[14] James E Garrison and Ilse CF Ipsen. A randomized preconditioned Cholesky-QR algorithm. arXiv preprint arXiv:2406.11751, 2024.

[15] Magnus R Hestenes, Eduard Stiefel, et al. Methods of conjugate gradients for solving linear systems. Journal of research of the National Bureau of Standards, 49(6):409–436, 1952.

[16] Jan S Hesthaven, Cecilia Pagliantini, and Gianluigi Rozza. Reduced basis methods for time-dependent problems. Acta Numerica, 31:265–345, 2022.

[17] Jan S Hesthaven and Stefano Ubbiali. Non-intrusive reduced order modeling of nonlinear problems using neural networks. Journal of Computational Physics, 363:55–78, 2018.

[18] Donald E Kirk. Optimal control theory: an introduction. Courier Corporation, 2004.

[19] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.

[20] Robert J Le Roy, Yiye Huang, and Calvin Jary. An accurate analytic potential function for ground-state N2 from a direct-potential-fit analysis of spectroscopic data. The Journal of chemical physics, 125(16), 2006.

[21] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020.

[22] Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.

[23] Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning custom-made basis functions for partial differential equations. arXiv preprint arXiv:2111.05307, 2021.

[24] Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning-based spectral methods for partial differential equations. Scientific Reports, 13(1):1739, 2023.

[25] Bruce Moore. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE transactions on automatic control, 26(1):17–32, 2003.

[26] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

[27] Yousef Saad. Iterative methods for sparse linear systems. SIAM, 2003.

[28] Yousef Saad. Numerical methods for large eigenvalue problems: revised edition. SIAM, 2011.

[29] Yousef Saad, Manshung Yeung, Jocelyne Erhel, and Frédéric Guyomarc'h. A deflated version of the conjugate gradient algorithm. SIAM Journal on Scientific Computing, 21(5):1909–1926, 2000.

[30] Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in neural information processing systems, 33:7462–7473, 2020.

[31] Alasdair Tran, Alexander Mathews, Lexing Xie, and Cheng Soon Ong. Factorized Fourier neural operators. arXiv preprint arXiv:2111.13802, 2021.

[32] Lloyd N Trefethen and David Bau. Numerical linear algebra. SIAM, 2022.

[33] Ulrich Trottenberg, Cornelius W Oosterlee, and Anton Schuller. Multigrid methods. Academic Press, 2001.

[34] Stefan Volkwein. Proper orthogonal decomposition: Theory and reduced-order modelling. Lecture Notes, University of Konstanz, 4(4):1–29, 2013.

[35] David P Woodruff et al. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.

[36] Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, and Zheng Ma. Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523, 2019.

[37] Yusaku Yamamoto, Yuji Nakatsukasa, Yuka Yanagisawa, and Takeshi Fukaya. Roundoff error analysis of the CholeskyQR2 algorithm. Electron. Trans. Numer. Anal., 44(01):306–326, 2015.

[38] Ralf Zimmermann. Manifold interpolation and model reduction. arXiv preprint arXiv:1902.06502, 2019.

A Proof of Theorem 1

[39] To show that $L_1(A, B)$ does not depend on the chosen representative, we observe that

$$L_1(A, B) = p - \left\|Q_B^\top Q_A\right\|_F^2 = \frac{1}{2}\left\|P_B - P_A\right\|_F^2 - \frac{k - p}{2}, \tag{8}$$

where $P_A = A\left(A^\top A\right)^{-1}A^\top$ and $P_B = B\left(B^\top B\right)^{-1}B^\top$ are the orthogonal projectors onto the column spaces of $A$ and $B$. When QR decompositions $A = Q_A R_A$, $B = Q_B R_B$ are available, the projectors become $P_A = Q_A Q_A^\top$, $P_B = Q_B Q_B^\top$ and identity (8) can be v...
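Identity (8) can be checked numerically. The sketch below is only an illustration under stated assumptions: the sizes `n`, `k`, `p`, the random matrices, and the helper names `orth_projector` and `l1_loss` are hypothetical and not taken from the paper. It compares both sides of (8) and also re-evaluates the loss after changing the representative of the column space of $A$.

```python
import numpy as np

def orth_projector(M):
    """Orthogonal projector M (M^T M)^{-1} M^T onto the column space of M."""
    return M @ np.linalg.solve(M.T @ M, M.T)

def l1_loss(A, B):
    """p - ||Q_B^T Q_A||_F^2, with Q_A and Q_B taken from reduced QR factorizations."""
    QA, _ = np.linalg.qr(A)
    QB, _ = np.linalg.qr(B)
    p = B.shape[1]
    return p - np.linalg.norm(QB.T @ QA, "fro") ** 2

rng = np.random.default_rng(0)
n, k, p = 30, 6, 4  # hypothetical sizes: A has k columns, B has p columns
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, p))

lhs = l1_loss(A, B)
rhs = 0.5 * np.linalg.norm(orth_projector(B) - orth_projector(A), "fro") ** 2 - (k - p) / 2

# A @ G spans the same subspace as A whenever G is invertible,
# so the loss should not change (G here is an arbitrary illustrative choice).
G = rng.standard_normal((k, k)) + 3.0 * np.eye(k)
lhs_other_rep = l1_loss(A @ G, B)

print(lhs, rhs, lhs_other_rep)
```

The three printed numbers should agree up to floating-point error, which matches the claim that the loss depends only on the subspaces and not on the matrices that represent them.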

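[40] We first show that

$$L_2(A, B; z) = \min_u \left\|Au - Q_B z\right\|_2^2 = \left\|(I - P_A) Q_B z\right\|_2^2, \tag{12}$$

where $P_A = A\left(A^\top A\right)^{-1}A^\top$ is the orthogonal projector onto the column space of $A$. Using $I = (I - P_A) + P_A$, together with $A^\top (I - P_A) = 0$ and $(I - P_A) A = 0$, we obtain

$$\min_u \left\|Au - Q_B z\right\|_2^2 = \min_u \left\|Au - P_A Q_B z - (I - P_A) Q_B z\right\|_2^2 = \min_u \left\|Au - P_A Q_B z\right\|_2^2 + \left\|(I - P_A) Q_B z\right\|_2^2 = \left\|(I - P_A) Q_B z\right\|_2^2. \tag{13}$$

The last equality holds since $P_A Q$...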
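The closed form (12) can be compared against a direct least-squares solve. This is a minimal sketch, assuming random test data; the sizes and the function names `l2_lstsq` and `l2_projector` are hypothetical.

```python
import numpy as np

def l2_lstsq(A, QB, z):
    """min_u ||A u - Q_B z||_2^2 computed with a generic least-squares solver."""
    target = QB @ z
    u, *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(np.sum((A @ u - target) ** 2))

def l2_projector(A, QB, z):
    """||(I - P_A) Q_B z||_2^2 with P_A = Q_A Q_A^T from a reduced QR of A."""
    QA, _ = np.linalg.qr(A)
    target = QB @ z
    residual = target - QA @ (QA.T @ target)
    return float(residual @ residual)

rng = np.random.default_rng(1)
n, k, p = 25, 5, 3  # hypothetical sizes
A = rng.standard_normal((n, k))
QB, _ = np.linalg.qr(rng.standard_normal((n, p)))
z = rng.standard_normal(p)

value_lstsq = l2_lstsq(A, QB, z)
value_projector = l2_projector(A, QB, z)
print(value_lstsq, value_projector)
```

The projector form avoids solving for $u$ explicitly, which is convenient when only the loss value is needed.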

[41] From equation (12) we find

$$\mathbb{E}_z\left[L_2(A, B; z)\right] = \mathbb{E}_z\left[\left\|(I - P_A) Q_B z\right\|_2^2\right] = \mathbb{E}_z\left[z^\top Q_B^\top (I - P_A) Q_B z\right] = \mathbb{E}_z\left[\operatorname{tr}\left(\left(Q_B^\top Q_B - Q_B^\top Q_A Q_A^\top Q_B\right) z z^\top\right)\right] = \operatorname{tr}\left(\left(Q_B^\top Q_B - Q_B^\top Q_A Q_A^\top Q_B\right) \mathbb{E}_z\left[z z^\top\right]\right) = p - \left\|Q_B^\top Q_A\right\|_F^2 = L_1(A, B). \tag{14}$$

B Proof of Theorem 2

We provide two comments before proceeding with the proof. In most parts of the text we assumed working with the non-compact Stiefel manifold, and in this theorem we have data on the compact Stiefel manifold (see [1] for definitions). ...
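The chain of equalities in (14) relies on $\mathbb{E}_z[z z^\top] = I$. A rough Monte Carlo illustration, assuming standard normal $z$ (one distribution with identity second moment; the source fragment does not state which distribution is used) and hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, p = 25, 5, 3  # hypothetical sizes
QA, _ = np.linalg.qr(rng.standard_normal((n, k)))
QB, _ = np.linalg.qr(rng.standard_normal((n, p)))

# Deterministic loss L1 = p - ||Q_B^T Q_A||_F^2.
l1 = p - np.linalg.norm(QB.T @ QA, "fro") ** 2

# Exact trace form from (14) with E[z z^T] = I.
trace_form = np.trace(QB.T @ QB - QB.T @ QA @ QA.T @ QB)

# Monte Carlo estimate of E_z ||(I - P_A) Q_B z||^2 with z ~ N(0, I).
num_samples = 50_000
Z = rng.standard_normal((p, num_samples))
R = QB @ Z
R -= QA @ (QA.T @ R)  # apply (I - P_A) to every sample at once
l2_mean = float(np.mean(np.sum(R ** 2, axis=0)))

print(l1, trace_form, l2_mean)
```

The trace form matches `l1` to rounding error, while the sampled mean only approaches it as the number of samples grows.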

[42] Select $a_1, \dots, a_D$ and $c = 0$.

[43] Gradually increase $c$ and track the ellipsoid $\sum_{j=1}^{D} a_j z_j^2 = c$.

[44] While increasing $c$, add each standard positive lattice point (a point with positive integer coordinates) that falls inside the ellipsoid.

[45] The order in which lattice points cross the inflating ellipsoid defines which eigenvector appears at position $k$ and which vectors form the eigenspace of dimension $k$. To illustrate this process, consider $E(z_1, \dots, z_D) = a_1 z_1^2 + a_2 z_2^2$, where $a_2 \gg a_1$. If we follow the procedure outlined above, we will see that the first lattice points encountered are (1,1),(2,1),(3,1),...
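The inflating-ellipsoid procedure amounts to sorting positive lattice points by the value of $\sum_j a_j z_j^2$. The sketch below is an assumption-laden illustration: the function name `lattice_order`, the search bound `zmax`, and the coefficients `(1.0, 100.0)` are hypothetical choices, and the bound must be large enough that no point outside the searched box would enter the ellipsoid earlier.

```python
import itertools

def lattice_order(a, count, zmax=50):
    """Return the first `count` positive lattice points met by the ellipsoid
    sum_j a[j] * z[j]**2 = c as c grows from 0.

    Only points with coordinates in 1..zmax are searched (an illustrative cutoff).
    Ties are broken by the enumeration order of itertools.product.
    """
    points = itertools.product(range(1, zmax + 1), repeat=len(a))
    energy = lambda z: sum(aj * zj * zj for aj, zj in zip(a, z))
    return sorted(points, key=energy)[:count]

# a_2 >> a_1: points with z_2 = 1 are encountered long before any point with z_2 = 2.
first_points = lattice_order((1.0, 100.0), count=3)
print(first_points)
```

With these coefficients the point (1, 2) has value 401, so every point $(m, 1)$ with $m^2 + 100 < 401$ is reached before it, consistent with the ordering described above.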

[46] Gaussian random field $\psi$ is generated from $\mathcal{N}\left(0, (\mathrm{id} - \gamma\Delta)^{r}\right)$, $\gamma = \frac{1}{20\pi}$, $r = \frac{1}{2}$.

[47] The diffusion coefficient is computed as $a = \alpha + (\beta - \alpha)\left(\tanh(s\psi) + 1\right)/2$ with $\alpha = 1$, $\beta = 50$, $s = 1$. For one $D = 2$ dataset $k_1 = k_2$, and for another $k_1 \neq k_2$, but both are i.i.d. random fields generated as described above. In the main text only results for $k_1 = k_2$ are reported. For the $D = 3$ elliptic eigenproblem we use a setup analogous to $D = 2$ but with a grid of size $30 \times 30 \times 30$ and $k_1 = k_2$ ...

[48] Draw i.i.d. Fourier coefficients on a square index set $K = \{0, \dots, M-1\}^2$ and form a real field by summing complex exponentials. We additionally introduce a Fourier-space weight $w_k = \left(1 + \lambda_1 \|k\|_2^2\right)^{-1}$ to control the high-frequency components.

[49] Multiply the real field from the previous step by $\lambda_2$, then apply a hyperbolic tangent function to control the contrast of the coefficient field values.

[50] Rescale the field to the prescribed interval $[\alpha, \beta]$ to ensure strict positivity and enforce a controlled contrast ratio of $\beta/\alpha$.

The exact procedure to generate the 2D field is:

$$s_0(x, y) = \operatorname{Re}\left[\sum_{k \in \{0,\dots,M-1\}^2} \frac{c_k\, e^{i(k_1 x + k_2 y)}}{1 + \lambda_1 \|k\|_2^2}\right], \qquad c_k \sim \mathcal{N}(0, 1),$$

$$s(x, y) = \tanh\left(\lambda_2 \cdot s_0(x, y)\right),$$

$$k(x, y) = \alpha + (\beta - \alpha)\,\frac{s(x, y) + 1}{2}, \qquad k(x, y) \in [\alpha, \beta].$$

Figure 9: Sample...
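The three generation steps can be sketched in NumPy as follows. This is an illustration only: the grid (a uniform grid on $[0, 2\pi)^2$), the values of `M`, `lam1`, `lam2`, and the function name `sample_coefficient_field` are assumptions, since the fragment above does not specify them. Because $c_k$ is real, the real part of $c_k e^{i(k_1 x + k_2 y)}$ reduces to $c_k \cos(k_1 x + k_2 y)$.

```python
import numpy as np

def sample_coefficient_field(n=64, M=8, lam1=0.1, lam2=1.0,
                             alpha=1.0, beta=50.0, seed=0):
    """Sample k(x, y) in [alpha, beta] on an n x n grid (all defaults are illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)  # assumed domain
    X, Y = np.meshgrid(x, x, indexing="ij")
    k1, k2 = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")

    c = rng.standard_normal((M, M))               # c_k ~ N(0, 1)
    w = 1.0 / (1.0 + lam1 * (k1 ** 2 + k2 ** 2))  # Fourier-space weight w_k

    # Phase k1*x + k2*y for every mode and grid point, shape (M, M, n, n).
    phase = k1[..., None, None] * X + k2[..., None, None] * Y
    s0 = np.einsum("ab,abij->ij", c * w, np.cos(phase))

    s = np.tanh(lam2 * s0)                           # contrast control
    return alpha + (beta - alpha) * (s + 1.0) / 2.0  # rescale to [alpha, beta]

field = sample_coefficient_field(n=16, M=4)
print(field.shape, field.min(), field.max())
```

Larger `lam2` pushes the field toward the two extreme values `alpha` and `beta`, while larger `lam1` damps high-frequency modes more strongly.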

[51] Cosines of the angles between the true subspace $V$ and the predicted subspace, computed as the singular values of $Q_W^\top V$. A value closer to 1 indicates better subspace alignment.
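A minimal sketch of this metric, assuming $Q_W$ is the orthonormal factor of a reduced QR decomposition of the predicted basis $W$. The function name `principal_cosines` and the test matrices are hypothetical, and $V$ is orthonormalized here as a precaution, which the text may already assume.

```python
import numpy as np

def principal_cosines(V, W):
    """Singular values of Q_W^T Q_V: cosines of the principal angles
    between span(V) and span(W)."""
    QV, _ = np.linalg.qr(V)  # orthonormalize the true basis (assumption)
    QW, _ = np.linalg.qr(W)
    s = np.linalg.svd(QW.T @ QV, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

rng = np.random.default_rng(3)
W = rng.standard_normal((40, 5))          # predicted, larger subspace
V_inside = W[:, :3] @ rng.standard_normal((3, 3))  # true subspace contained in span(W)
V_random = rng.standard_normal((40, 3))            # unrelated subspace

cos_inside = principal_cosines(V_inside, W)
cos_random = principal_cosines(V_random, W)
print(cos_inside, cos_random)
```

When the true subspace lies inside the predicted one, all cosines equal 1 up to rounding, which is the situation the redundancy strategy aims for.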

[52] Relative reconstruction error $e_i = \min_u \|V_i - W u\|_2$ for each true basis vector $V_i$, computed as $\left\|(I - Q_W Q_W^\top) V_i\right\|_2$. A smaller value indicates that the predicted subspace reconstructs $V_i$ more accurately.

Figure 10: First five eigenvectors of the error propagation matrix $I - \omega D^{-1} A$. Top: $\omega = 1.0$. Bottom: $\omega = 0.9$.
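The reconstruction error can be sketched as below and cross-checked against an explicit least-squares fit. Dividing by the norm of each true vector is an assumption about what "relative" means here; the function name and test data are hypothetical.

```python
import numpy as np

def reconstruction_errors(V, W):
    """||(I - Q_W Q_W^T) V_i||_2 / ||V_i||_2 for every column V_i of V.

    The normalization by ||V_i||_2 is an assumption; for unit-norm V_i it has no effect.
    """
    QW, _ = np.linalg.qr(W)
    residual = V - QW @ (QW.T @ V)
    return np.linalg.norm(residual, axis=0) / np.linalg.norm(V, axis=0)

rng = np.random.default_rng(4)
W = rng.standard_normal((40, 5))
V, _ = np.linalg.qr(rng.standard_normal((40, 3)))  # unit-norm true basis vectors

errors = reconstruction_errors(V, W)

# Cross-check: solve min_U ||V - W U|| column by column with a generic solver.
U, *_ = np.linalg.lstsq(W, V, rcond=None)
errors_lstsq = np.linalg.norm(V - W @ U, axis=0)
print(errors, errors_lstsq)
```

Both computations return one error per true basis vector, with values between 0 (perfect reconstruction) and 1 (the vector is orthogonal to the predicted subspace).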

[53] Two-grid convergence rate, measured by the spectral radius $\rho$ of the two-grid iteration operator $T$. We estimate $\rho$ via the power method by repeatedly applying $T$ to a vector: $v_{k+1} = \frac{T v_k}{\|T v_k\|}$. A smaller spectral radius indicates faster asymptotic convergence. In Table 7, we report these metrics for the best-performing models and for the ground-truth target subsp...
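The power-method estimate can be sketched as follows. Since the two-grid operator itself is not spelled out in this fragment, the example applies the iteration to a stand-in operator: the weighted Jacobi error propagation matrix $I - \omega D^{-1} A$ for a 1D Laplacian. The stand-in matrix, the iteration count, and the function name are assumptions. The norm $\|T v_k\|$ is used as the estimate of $\rho$, which is reliable for symmetric operators such as this one but may need more care for strongly non-normal ones.

```python
import numpy as np

def spectral_radius_power(apply_T, n, iters=2000, seed=0):
    """Estimate the spectral radius via v_{k+1} = T v_k / ||T v_k||."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    rho = 0.0
    for _ in range(iters):
        w = apply_T(v)
        rho = np.linalg.norm(w)  # current estimate, since ||v|| = 1
        if rho == 0.0:
            break
        v = w / rho
    return rho

# Stand-in operator: weighted Jacobi for the 1D Laplacian tridiag(-1, 2, -1).
n, omega = 20, 0.9
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
T = np.eye(n) - omega * np.diag(1.0 / np.diag(A)) @ A

rho_power = spectral_radius_power(lambda v: T @ v, n)
rho_exact = np.max(np.abs(np.linalg.eigvals(T)))
print(rho_power, rho_exact)
```

Passing the operator as a function `apply_T` means the same routine works when $T$ is available only through matrix-vector products, as is typical for multigrid cycles.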