pith.

arXiv: 2509.23249 · v4 · submitted 2025-09-27 · 💻 cs.LG · cs.NA · math.NA

Deep Learning for Subspace Regression

Pith reviewed 2026-05-18 12:03 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA
keywords: subspace regression · neural networks · reduced order modeling · Grassmann manifold · parametric eigenproblems · redundancy strategy · deep learning · parametric PDEs

The pith

Predicting larger-than-required subspaces with neural networks simplifies the mapping and improves accuracy in parametric subspace regression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to use neural networks to regress linear subspaces that depend on high-dimensional parameters in reduced-order modeling. Classical interpolation becomes infeasible or unreliable in high-dimensional parameter spaces. The authors therefore relax the problem to regression, using loss functions designed for subspace data. A key innovation is redundancy: the network predicts larger subspaces than needed. According to the authors' analysis, this decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth functions on the Grassmann manifold. Empirically, it yields significantly better accuracy across several applications, including parametric eigenproblems and parametric PDEs.

Core claim

Predicting oversized subspaces instead of exact-dimension ones decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth functions on the Grassmann manifold, resulting in improved neural network approximation accuracy for subspace regression.

What carries the argument

The redundancy mechanism of predicting a subspace of dimension larger than the target one, which reduces the complexity of the target function on the Grassmann manifold.
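A toy illustration of the intuition behind redundancy follows. It assumes a hypothetical diagonal family `H(t)` that is not taken from the paper. When two eigenvalues cross at t = 0, the span of the lowest eigenvector jumps to an orthogonal direction, while the span of the two lowest eigenvectors stays fixed. The helper names are illustrative. `subspace_gap` computes p − ‖Q₁ᵀQ₂‖²_F, which is zero exactly when the spans coincide. This is a sketch of the idea, not the paper's theorem or code.

```python
import numpy as np


def lowest_eigenspace(H, p):
    """Orthonormal basis of the span of the p eigenvectors of symmetric H with smallest eigenvalues."""
    _, U = np.linalg.eigh(H)  # eigh returns eigenvalues in ascending order
    return U[:, :p]


def subspace_gap(Q1, Q2):
    """p - ||Q1^T Q2||_F^2 for orthonormal p-column bases; zero iff the spans coincide."""
    return Q1.shape[1] - np.linalg.norm(Q1.T @ Q2, "fro") ** 2


def H(t):
    """Hypothetical toy family: eigenvalues 1 + t and 1 - t cross at t = 0."""
    return np.diag([1.0 + t, 1.0 - t, 5.0])


for p in (1, 2):
    gap = subspace_gap(lowest_eigenspace(H(-0.1), p), lowest_eigenspace(H(0.1), p))
    print(f"p = {p}: gap across the crossing = {gap:.3f}")
```

Run as is, the loop should report a gap of about 1 for p = 1 and about 0 for p = 2. A regressor asked for the 2-dimensional subspace therefore faces a constant target across the crossing, and that subspace still contains the lowest eigenvector on both sides.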

If this is right

  • Accuracy significantly improves when larger-than-required subspaces are predicted in empirical tests.
  • The strategy decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients.
  • The mapping becomes smoother for general smooth functions on the Grassmann manifold.
  • Subspace regression applies successfully to parametric eigenproblems, deflation techniques, relaxation methods, optimal control, and parametric partial differential equations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This redundancy approach might generalize to other regression tasks on manifolds where exact low-dimensional targets are hard to learn.
  • It could reduce the computational cost of offline stages in reduced-order modeling by enabling more reliable online approximations.
  • Extensions to non-smooth or discontinuous subspace dependencies may require additional regularization techniques.

Load-bearing premise

The mapping from parameters to subspaces is sufficiently smooth or can be well-approximated by a neural network when extra redundancy is added.

What would settle it

A numerical test on an elliptic eigenproblem with constant coefficients showing no improvement or even degradation in accuracy when the predicted subspace dimension is increased beyond the required size.

Figures

Figures reproduced from arXiv: 2509.23249 by Alexander Rudikov, Ekaterina Muravleva, Ivan Oseledets, Vladimir Fanaskov, Vladislav Trifonov.

Figure 1. Selected results for eigenspace prediction: (a) Comparison of training time for losses
Figure 2. Relative errors for selected baselines. Label “subspace” refers to subspace regression. For the
Figure 3. Convergence results for iterative methods. Learned methods are marked with solid lines, and
Figure 4. Example of subspace embedding detailed in Appendix C.
Figure 5. (a) Parametric family of ellipsoids passing through point (4
Figure 6. Illustration of simple greedy subspace embedding technique for elliptic eigenproblem with constant
Figure 7. Relative errors for two stationary diffusion equations depending on the number of basis functions
Figure 8. Convergence results for deflated CG, elliptic dataset
Figure 9. Sample coefficient function. The equation is discretized on a uniform grid with a 5-point finite-difference stencil, yielding a sparse, symmetric positive-definite matrix. One can observe a sampled normalized coefficient function in
Figure 10. First five eigenvectors of the error propagation matrix
Original abstract

It is often possible to perform reduced order modelling by specifying a linear subspace which accurately captures the dynamics of the system. This approach becomes especially appealing when the linear subspace explicitly depends on parameters of the problem. A practical way to apply such a scheme is to compute subspaces for a selected set of parameters in the computationally demanding offline stage and, in the online stage, to approximate the subspace for unknown parameters by interpolation. For realistic problems the space of parameters is high dimensional, which renders classical interpolation strategies infeasible or unreliable. We propose to relax the interpolation problem to regression, introduce several loss functions suitable for subspace data, and use a neural network as an approximation to the high-dimensional target function. To further simplify the learning problem we introduce redundancy: in place of predicting a subspace of a given dimension, we predict a larger subspace. We show theoretically that this strategy decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth function on the Grassmann manifold. Empirical results also show that accuracy significantly improves when larger-than-required subspaces are predicted. With a set of numerical illustrations we demonstrate that subspace regression can be useful for a range of tasks including parametric eigenproblems, deflation techniques, relaxation methods, optimal control and solution of parametric partial differential equations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes relaxing subspace interpolation to regression using neural networks for parameter-dependent linear subspaces in reduced-order modeling. It introduces redundancy by predicting oversized subspaces (dimension r+m instead of r) to simplify the target mapping, provides a theoretical argument that this decreases complexity for constant-coefficient elliptic eigenproblems and smooths the map on the Grassmann manifold for general smooth targets, and reports empirical accuracy gains across tasks including parametric eigenproblems, deflation, relaxation methods, optimal control, and parametric PDEs.

Significance. If the redundancy-based smoothing and accuracy claims hold with rigorous quantification, the work offers a practical route to neural-network regression for high-dimensional parametric reduced-order models, extending beyond classical interpolation. The explicit theoretical treatment for the constant-coefficient case and the reproducible empirical illustrations are strengths that could influence scientific machine-learning practice.

major comments (2)
  1. [Theoretical analysis] Theoretical section on redundancy: the claim that predicting larger subspaces 'makes the mapping smoother for general smooth function on the Grassmann manifold' is stated without a precise metric (e.g., reduction in Lipschitz constant, Sobolev norm, or covering number) or a proof that the improvement is sufficient to offset the high-dimensional parameter space; this is load-bearing for both the theoretical justification and the interpretation of the reported accuracy gains.
  2. [Numerical illustrations] Empirical evaluation: the abstract and results assert 'accuracy significantly improves' with larger subspaces, yet no error bars, dataset sizes, or explicit baseline comparisons (e.g., against direct r-dimensional regression or classical interpolation) are visible in the provided description, undermining verification of the central empirical claim.
minor comments (2)
  1. [Abstract] The abstract refers to 'several loss functions suitable for subspace data' without naming them or pointing to the defining equations; adding a short list or reference would improve readability.
  2. [Introduction / Theory] Notation for the Grassmann manifold and the redundancy parameter m should be introduced consistently in the first theoretical paragraph to avoid later ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The comments help clarify how to strengthen the presentation of both the theoretical justification for redundancy and the empirical results. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Theoretical analysis] Theoretical section on redundancy: the claim that predicting larger subspaces 'makes the mapping smoother for general smooth function on the Grassmann manifold' is stated without a precise metric (e.g., reduction in Lipschitz constant, Sobolev norm, or covering number) or a proof that the improvement is sufficient to offset the high-dimensional parameter space; this is load-bearing for both the theoretical justification and the interpretation of the reported accuracy gains.

    Authors: We agree that a more quantitative treatment would strengthen the general case. The manuscript currently shows an explicit complexity reduction for constant-coefficient elliptic eigenproblems and gives a geometric argument that extra dimensions on the Grassmann manifold permit a smoother parametrization for general smooth targets. We will revise the theoretical section to introduce a concrete metric (e.g., a bound on the Lipschitz constant of the composed map) and a short argument showing that the reduction in variation scales favorably with the added redundancy even in high-dimensional parameter spaces. This will be supported by a brief proof sketch. revision: yes

  2. Referee: [Numerical illustrations] Empirical evaluation: the abstract and results assert 'accuracy significantly improves' with larger subspaces, yet no error bars, dataset sizes, or explicit baseline comparisons (e.g., against direct r-dimensional regression or classical interpolation) are visible in the provided description, undermining verification of the central empirical claim.

    Authors: We acknowledge that the summary provided to the referee did not make these details sufficiently prominent. The full manuscript already contains dataset sizes, baseline comparisons against both direct r-dimensional regression and classical interpolation, and error bars on the reported figures. To address the concern directly, we will revise the abstract, the results section, and the figure captions to state these elements explicitly, including quantitative improvement factors and the number of independent runs used for the error bars. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained with independent theoretical analysis.

Full rationale

The paper presents a new regression-based approach to subspace approximation via neural networks, with redundancy introduced to simplify the learning task. The key theoretical claim—that redundancy reduces mapping complexity for constant-coefficient elliptic eigenproblems and smooths the map on the Grassmann manifold—is stated as a derived result from analysis of the problem structure, not as a re-expression of fitted inputs or prior self-citations. No equations or steps reduce by construction to the inputs (e.g., no fitted parameter renamed as prediction, no ansatz smuggled via self-citation, no uniqueness theorem imported from overlapping authors). Empirical accuracy gains are reported separately from the theory. The derivation chain remains independent of the target result itself.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of the Grassmann manifold and the existence of a sufficiently regular mapping from parameters to subspaces; no new physical entities or ad-hoc constants are introduced in the abstract.

free parameters (2)
  • oversized subspace dimension
    Chosen larger than the target rank; its specific value is a modeling choice that affects both theory and empirical results.
  • neural network hyperparameters
    Architecture depth, width, and training details are free choices not fixed by the problem statement.
axioms (2)
  • domain assumption: The mapping from parameters to subspaces is smooth on the Grassmann manifold
    Invoked to claim that redundancy makes the mapping smoother (abstract).
  • standard math: Standard properties of elliptic eigenproblems with constant coefficients
    Used for the complexity-reduction theoretical result.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mode-realigned pointwise interpolation (MRPWI) for efficient POD-Galerkin parametric reduced-order models

    math.NA 2026-04 unverdicted novelty 6.0

    MRPWI synchronizes POD modes via sign and rotation alignment to enable fast, accurate pointwise interpolation for parametric POD-Galerkin reduced-order models.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    David Amsallem. Interpolation on manifolds of CFD-based fluid and finite element-based structural reduced-order models for on-line aeroelastic predictions. Stanford University, 2010

  2. [2]

    Haim Avron, Petar Maymounkov, and Sivan Toledo. Blendenpik: Supercharging LAPACK’s least-squares solver. SIAM Journal on Scientific Computing, 32(3):1217–1236, 2010

  3. [3]

    Zhaojun Bai, Patrick M Dewilde, and Roland W Freund. Reduced-order modeling. Handbook of numerical analysis, 13:825–895, 2005

  4. [4]

    Pau Batlle, Matthieu Darcy, Bamdad Hosseini, and Houman Owhadi. Kernel methods are competitive for operator learning. Journal of Computational Physics, 496:112549, 2024

  5. [5]

    Thomas Bendokat, Ralf Zimmermann, and P-A Absil. A Grassmann manifold handbook: Basic geometry and computational aspects. Advances in Computational Mathematics, 50(1):6, 2024

  6. [6]

    Kaushik Bhattacharya, Bamdad Hosseini, Nikola B Kovachki, and Andrew M Stuart. Model reduction and neural networks for parametric PDEs. The SMAI journal of computational mathematics, 7:121–157, 2021

  7. [7]

    Christopher M Bishop and Nasser M Nasrabadi. Pattern recognition and machine learning, volume 4. Springer, 2006

  8. [8]

    Åke Björck and Gene H Golub. Numerical methods for computing angles between linear subspaces. Mathematics of computation, 27(123):579–594, 1973

  9. [9]

    Barry K Carpenter, Gregory S Ezra, Stavros C Farantos, Zeb C Kramer, and Stephen Wiggins. Dynamics on the double Morse potential: a paradigm for roaming reactions with no saddle points. Regular and Chaotic Dynamics, 23(1):60–79, 2018

  10. [10]

    Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, et al. Symbolic discovery of optimization algorithms. Advances in neural information processing systems, 36:49205–49233, 2023

  11. [11]

    Gabriele Ciaramella, Martin J Gander, and Tommaso Vanzan. A gentle introduction to interpolation on the Grassmann manifold. 2025

  12. [12]

    Nicola Rares Franco, Andrea Manzoni, Paolo Zunino, and Jan S Hesthaven. Deep orthogonal decomposition: a continuously adaptive neural network approach to model order reduction of parametrized partial differential equations

  13. [13]

    Nicola Rares Franco, Andrea Manzoni, Paolo Zunino, and Jan S Hesthaven. Deep orthogonal decomposition: a continuously adaptive data-driven approach to model order reduction. arXiv preprint arXiv:2404.18841, 2024

  14. [14]

    James E Garrison and Ilse CF Ipsen. A randomized preconditioned Cholesky-QR algorithm. arXiv preprint arXiv:2406.11751, 2024

  15. [15]

    Magnus R Hestenes, Eduard Stiefel, et al. Methods of conjugate gradients for solving linear systems. Journal of research of the National Bureau of Standards, 49(6):409–436, 1952

  16. [16]

    Jan S Hesthaven, Cecilia Pagliantini, and Gianluigi Rozza. Reduced basis methods for time-dependent problems. Acta Numerica, 31:265–345, 2022

  17. [17]

    Jan S Hesthaven and Stefano Ubbiali. Non-intrusive reduced order modeling of nonlinear problems using neural networks. Journal of Computational Physics, 363:55–78, 2018

  18. [18]

    Donald E Kirk. Optimal control theory: an introduction. Courier Corporation, 2004

  19. [19]

    Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998

  20. [20]

    Robert J Le Roy, Yiye Huang, and Calvin Jary. An accurate analytic potential function for ground-state N2 from a direct-potential-fit analysis of spectroscopic data. The Journal of chemical physics, 125(16), 2006

  21. [21]

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020

  22. [22]

    Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019

  23. [23]

    Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning custom-made basis functions for partial differential equations. arXiv preprint arXiv:2111.05307, 2021

  24. [24]

    Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning-based spectral methods for partial differential equations. Scientific Reports, 13(1):1739, 2023

  25. [25]

    Bruce Moore. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE transactions on automatic control, 26(1):17–32, 2003

  26. [26]

    Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019

  27. [27]

    Yousef Saad. Iterative methods for sparse linear systems. SIAM, 2003

  28. [28]

    Yousef Saad. Numerical methods for large eigenvalue problems: revised edition. SIAM, 2011

  29. [29]

    Yousef Saad, Manshung Yeung, Jocelyne Erhel, and Frédéric Guyomarc’h. A deflated version of the conjugate gradient algorithm. SIAM Journal on Scientific Computing, 21(5):1909–1926, 2000

  30. [30]

    Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in neural information processing systems, 33:7462–7473, 2020

  31. [31]

    Alasdair Tran, Alexander Mathews, Lexing Xie, and Cheng Soon Ong. Factorized Fourier neural operators. arXiv preprint arXiv:2111.13802, 2021

  32. [32]

    Lloyd N Trefethen and David Bau. Numerical linear algebra. SIAM, 2022

  33. [33]

    Ulrich Trottenberg, Cornelius W Oosterlee, and Anton Schuller. Multigrid methods. Academic press, 2001

  34. [34]

    Stefan Volkwein. Proper orthogonal decomposition: Theory and reduced-order modelling. Lecture Notes, University of Konstanz, 4(4):1–29, 2013

  35. [35]

    David P Woodruff et al. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014

  36. [36]

    Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, and Zheng Ma. Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523, 2019

  37. [37]

    Yusaku Yamamoto, Yuji Nakatsukasa, Yuka Yanagisawa, and Takeshi Fukaya. Roundoff error analysis of the CholeskyQR2 algorithm. Electron. Trans. Numer. Anal, 44(01):306–326, 2015

  38. [38]

    Ralf Zimmermann. Manifold interpolation and model reduction. arXiv preprint arXiv:1902.06502, 2019

  39. [39]

    Appendix A: Proof of Theorem 1. To show that $L_1(A, B)$ does not depend on the chosen representative, we observe that $L_1(A, B) = p - \|Q_B^\top Q_A\|_F^2 = \frac{1}{2}\|P_B - P_A\|_F^2 - \frac{k - p}{2}$ (8), where $P_A = A(A^\top A)^{-1}A^\top$ and $P_B = B(B^\top B)^{-1}B^\top$ are orthogonal projectors onto the column spaces of $A$ and $B$. When QR decompositions $A = Q_A R_A$ and $B = Q_B R_B$ are available, the projectors become $P_A = Q_A Q_A^\top$ and $P_B = Q_B Q_B^\top$, and identity (8) can be v...

  40. [40]

    We first show that $L_2(A, B; z) = \min_u \|Au - Q_B z\|_2^2 = \|(I - P_A) Q_B z\|_2^2$ (12), where $P_A = A(A^\top A)^{-1}A^\top$ is the orthogonal projector onto the column space of $A$. Using $I = (I - P_A) + P_A$ and $(I - P_A)A = 0$, we obtain $\min_u \|Au - Q_B z\|_2^2 = \min_u \|Au - P_A Q_B z - (I - P_A) Q_B z\|_2^2 = \min_u \|Au - P_A Q_B z\|_2^2 + \|(I - P_A) Q_B z\|_2^2 = \|(I - P_A) Q_B z\|_2^2$ (13). The last equality holds since $P_A Q$...

  41. [41]

    From equation (12) we find $\mathbb{E}_z[L_2(A, B; z)] = \mathbb{E}_z\big[\|(I - P_A) Q_B z\|_2^2\big] = \mathbb{E}_z\big[z^\top Q_B^\top (I - P_A) Q_B z\big] = \mathbb{E}_z \operatorname{tr}\big[(Q_B^\top Q_B - Q_B^\top Q_A Q_A^\top Q_B)\, z z^\top\big] = \operatorname{tr}\big[(Q_B^\top Q_B - Q_B^\top Q_A Q_A^\top Q_B)\, \mathbb{E}_z[z z^\top]\big] = p - \|Q_B^\top Q_A\|_F^2 = L_1(A, B)$ (14). The penultimate equality uses $\mathbb{E}_z[z z^\top] = I$ and $Q_B^\top Q_B = I$.

    Appendix B: Proof of Theorem 2. We provide two comments before proceeding with the proof. In most parts of the text we work with the non-compact Stiefel manifold, whereas in this theorem the data lie on the compact Stiefel manifold (see [1] for definitions).

  42. [42]

    Select $a_1, \ldots, a_D$ and $c = 0$.

  43. [43]

    Gradually increase $c$ and track the ellipsoid $\sum_{j=1}^{D} a_j z_j^2 = c$.

  44. [44]

    While increasing $c$, add each standard positive lattice point (a point with positive integer coordinates) that falls inside the ellipsoid.

  45. [45]

    The order in which lattice points cross an inflating ellipsoid defines which eigenvector appears in position $k$ and which vectors form the eigenspace of dimension $k$. To illustrate this process, consider $E(z_1, \ldots, z_D) = a_1 z_1^2 + a_2 z_2^2$, where $a_2 \gg a_1$. Following the procedure outlined above, the first lattice points encountered are $(1,1), (2,1), (3,1), \ldots$

  46. [46]

    The Gaussian random field $\psi$ is generated from $\mathcal{N}\big(0, (\mathrm{id} - \gamma\Delta)^{r}\big)$ with $\gamma = \frac{1}{20\pi}$ and $r = \frac{1}{2}$.

  47. [47]

    The diffusion coefficient is computed as $a = \alpha + (\beta - \alpha)\big(\tanh(s\psi) + 1\big)/2$ with $\alpha = 1$, $\beta = 50$, $s = 1$. For one $D = 2$ dataset $k_1 = k_2$, and for another $k_1 \neq k_2$; in both cases $k_1$ and $k_2$ are i.i.d. random fields generated as described above. In the main text only results for $k_1 = k_2$ are reported. For the $D = 3$ elliptic eigenproblem we use a setup analogous to $D = 2$, but with a grid of size $30 \times 30 \times 30$ and $k_1 = k_2$ ...

  48. [48]

    Draw i.i.d. Fourier coefficients on a square index set $K = \{0, \ldots, M - 1\}^2$ and form a real field by summing complex exponentials. We additionally introduce a Fourier-space weight $w_k = (1 + \lambda_1 \|k\|_2^2)^{-1}$ to control the high-frequency components.

  49. [49]

    Multiply the real field from the previous step by $\lambda_2$, then apply a hyperbolic tangent function to control the contrast of the coefficient field values.

  50. [50]

    Rescale the field to the prescribed interval $[\alpha, \beta]$ to ensure strict positivity and enforce a controlled contrast ratio of $\beta/\alpha$. The exact procedure to generate the 2D field is:

    $s_0(x, y) = \operatorname{Re}\Big(\sum_{k \in \{0, \ldots, M-1\}^2} \frac{c_k\, e^{i(k_1 x + k_2 y)}}{1 + \lambda_1 \|k\|_2^2}\Big), \quad c_k \sim \mathcal{N}(0, 1),$

    $s(x, y) = \tanh\big(\lambda_2 \cdot s_0(x, y)\big),$

    $k(x, y) = \alpha + (\beta - \alpha)\,\frac{s(x, y) + 1}{2}, \quad k(x, y) \in [\alpha, \beta].$

  51. [51]

    Cosines of the principal angles between the true subspace $V$ and the predicted subspace, computed as the singular values of $Q_W^\top V$. A value closer to 1 indicates better subspace alignment.

  52. [52]

    Relative reconstruction error $e_i = \min_u \|V_i - W u\|_2$ for each true basis vector $V_i$, computed as $\|(I - Q_W Q_W^\top) V_i\|_2$. A smaller value indicates that the predicted subspace reconstructs $V_i$ more accurately.

    Figure 10: First five eigenvectors of the error propagation matrix $I - \omega D^{-1} A$. Top: $\omega = 1.0$. Bottom: $\omega = 0.9$.

  53. [53]

    Two-grid convergence rate, measured by the spectral radius $\rho$ of the two-grid iteration operator $T$. We estimate $\rho$ via the power method by repeatedly applying $T$ to a vector: $v_{k+1} = T v_k / \|T v_k\|$. A smaller spectral radius indicates faster asymptotic convergence. In Table 7, we report these metrics for the best-performing models and for the ground-truth target subsp...
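The loss identities quoted in entries [39]–[41] above can be checked numerically. The sketch below makes three assumptions that are not the paper's code. First, following (8), A has k columns and B has p columns, with A read as the (possibly oversized) prediction and B as the target. Second, the helper names are hypothetical. Third, z is a standard Gaussian vector, so that E[zzᵀ] = I as the derivation of (14) requires.

```python
import numpy as np


def orth(X):
    """Orthonormal basis of the column space of X (reduced QR; X assumed full column rank)."""
    Q, _ = np.linalg.qr(X)
    return Q


def projector(X):
    """Orthogonal projector X (X^T X)^{-1} X^T onto the column space of X."""
    return X @ np.linalg.solve(X.T @ X, X.T)


def loss_l1(A, B):
    """L1(A, B) = p - ||Q_B^T Q_A||_F^2 with p = B.shape[1]."""
    p = B.shape[1]
    return p - np.linalg.norm(orth(B).T @ orth(A), "fro") ** 2


def loss_l1_via_projectors(A, B):
    """Right-hand side of (8): 1/2 ||P_B - P_A||_F^2 - (k - p)/2."""
    k, p = A.shape[1], B.shape[1]
    return 0.5 * np.linalg.norm(projector(B) - projector(A), "fro") ** 2 - (k - p) / 2


def loss_l2(A, B, z):
    """L2(A, B; z) = min_u ||A u - Q_B z||_2^2, solved by least squares."""
    rhs = orth(B) @ z
    u, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return float(np.sum((A @ u - rhs) ** 2))


def loss_l2_closed_form(A, B, z):
    """Equation (12): ||(I - P_A) Q_B z||_2^2."""
    QA, v = orth(A), orth(B) @ z
    r = v - QA @ (QA.T @ v)
    return float(r @ r)


rng = np.random.default_rng(0)
n, p, k = 40, 3, 5
A, B = rng.standard_normal((n, k)), rng.standard_normal((n, p))
print(loss_l1(A, B), loss_l1_via_projectors(A, B))   # identity (8): the two values agree
print(loss_l1(A @ rng.standard_normal((k, k)), B))    # same span of A, same loss

# Redundancy: if span(B) lies inside the k-dimensional span of A, L1 vanishes.
A_big = np.hstack([B, rng.standard_normal((n, k - p))])
print(loss_l1(A_big, B))                               # approximately 0

# Monte Carlo check of (14): the mean of L2 over z ~ N(0, I_p) approaches L1.
samples = [loss_l2(A, B, rng.standard_normal(p)) for _ in range(5000)]
print(np.mean(samples), loss_l1(A, B))
```

The projector form is only used as a cross-check. The QR-based `loss_l1` avoids forming n × n matrices, which is the point of the remark in entry [39] that projectors simplify once QR factors are available.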
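The greedy procedure in entries [42]–[45] amounts to sorting positive lattice points by E(k) = Σⱼ aⱼkⱼ². For example, for the operator −Σⱼ aⱼ∂ⱼ² on a unit box with Dirichlet conditions, the eigenvalues are π²E(k). That choice of setting is an assumption here, not something the excerpt states. The function name, the search box {1, …, kmax}^D, and the lexicographic tie-breaking are also illustrative choices. `kmax` must be large enough to contain every point that enters the ellipsoid before the last one requested.

```python
import itertools

import numpy as np


def lattice_order(a, kmax, n_points):
    """First n_points positive lattice points, in the order an inflating ellipsoid
    sum_j a_j z_j^2 = c reaches them (ties broken lexicographically)."""
    a = np.asarray(a, dtype=float)
    candidates = itertools.product(range(1, kmax + 1), repeat=len(a))
    ordered = sorted(candidates, key=lambda k: (float(a @ np.square(k)), k))
    return ordered[:n_points]


# a_2 >> a_1: the ellipsoid first sweeps along z_1 while z_2 stays at 1.
print(lattice_order((1.0, 100.0), kmax=20, n_points=5))
# [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
```

With a = (1, 100), the point (1, 2) only enters as the 18th point, because E(1, 2) = 401 lies between E(17, 1) = 389 and E(18, 1) = 424. Changing a reorders the points, which changes which vectors form the eigenspace of a given dimension.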
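The coefficient-field construction in entries [48]–[50] can be sketched in NumPy as follows. The excerpt does not state the grid on [0, 2π)², the grid size, or the parameter values in the example; all of these are assumptions for illustration. The double loop favors readability over speed. The final line applies the same tanh-and-rescale map as the diffusion coefficient in entry [47].

```python
import numpy as np


def coefficient_field(n, M, lam1, lam2, alpha, beta, rng):
    """Sketch of k(x, y) = alpha + (beta - alpha) * (tanh(lam2 * s0) + 1) / 2 on an n x n grid."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)  # assumed periodic domain [0, 2*pi)
    X, Y = np.meshgrid(x, x, indexing="ij")
    s0 = np.zeros((n, n))
    for k1 in range(M):
        for k2 in range(M):
            c_k = rng.standard_normal()                    # c_k ~ N(0, 1)
            w_k = 1.0 / (1.0 + lam1 * (k1**2 + k2**2))     # Fourier-space weight, damps high frequencies
            s0 += np.real(c_k * w_k * np.exp(1j * (k1 * X + k2 * Y)))
    s = np.tanh(lam2 * s0)                                 # contrast control, values in [-1, 1]
    return alpha + (beta - alpha) * (s + 1.0) / 2.0        # rescale to [alpha, beta]


rng = np.random.default_rng(0)
field = coefficient_field(n=64, M=8, lam1=0.1, lam2=2.0, alpha=1.0, beta=50.0, rng=rng)
print(field.shape, field.min(), field.max())
```

Larger `lam1` suppresses more of the high-frequency content. Larger `lam2` pushes more of the field toward the bounds α and β, which raises the effective contrast.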
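The two subspace metrics in entries [51] and [52] reduce to a QR factorization, an SVD, and a projection. The minimal sketch below assumes V (true basis, one vector per column) and W (predicted basis) are dense NumPy arrays. It makes two illustrative choices. V is orthonormalized, so the singular values are principal-angle cosines even if V is not orthonormal. Each reconstruction error is divided by ‖Vᵢ‖₂, which changes nothing for unit-norm columns. The example data are hypothetical.

```python
import numpy as np


def principal_cosines(V, W):
    """Cosines of the principal angles between span(V) and span(W):
    singular values of Q_W^T Q_V, in descending order."""
    QV, _ = np.linalg.qr(V)
    QW, _ = np.linalg.qr(W)
    return np.linalg.svd(QW.T @ QV, compute_uv=False)


def reconstruction_errors(V, W):
    """Per-column error ||(I - Q_W Q_W^T) V_i||_2 / ||V_i||_2,
    equal to min_u ||V_i - W u||_2 / ||V_i||_2."""
    QW, _ = np.linalg.qr(W)
    R = V - QW @ (QW.T @ V)
    return np.linalg.norm(R, axis=0) / np.linalg.norm(V, axis=0)


rng = np.random.default_rng(0)
V = rng.standard_normal((100, 3))                          # hypothetical "true" basis
W = np.hstack([V + 0.01 * rng.standard_normal((100, 3)),  # noisy copy of V ...
               rng.standard_normal((100, 2))])             # ... plus 2 redundant directions
print(principal_cosines(V, W))       # values close to 1
print(reconstruction_errors(V, W))   # small values, roughly at the noise level
```

Because W may have more columns than V, both metrics measure containment of span(V) in span(W) rather than equality. That is the relevant notion when larger-than-required subspaces are predicted.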
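The spectral-radius estimate in entry [53] can be sketched as plain power iteration on an operator passed as a callable. The sketch assumes the operator has a single eigenvalue of largest modulus, so that ‖T vₖ‖ settles; otherwise the sequence may oscillate instead of converging. The operator used below is a hypothetical stand-in, not the paper's two-grid operator T. It is the weighted Jacobi error-propagation matrix I − ωD⁻¹A from the Figure 10 caption, applied to a 1D Laplacian.

```python
import numpy as np


def spectral_radius_power(apply_T, n, iters=1000, seed=0):
    """Estimate rho(T) via v_{k+1} = T v_k / ||T v_k||, returning ||T v_k|| for unit v_k."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    rho = 0.0
    for _ in range(iters):
        w = apply_T(v)
        rho = float(np.linalg.norm(w))
        if rho == 0.0:  # T v is exactly zero: report 0 as the estimate
            break
        v = w / rho
    return rho


# Hypothetical stand-in operator: weighted Jacobi error propagation I - omega D^{-1} A
# for the 1D Laplacian A = tridiag(-1, 2, -1), so D = 2 I.
n, omega = 20, 0.9
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
E = np.eye(n) - omega * A / np.diag(A)[:, None]
print(spectral_radius_power(lambda v: E @ v, n, iters=3000))
print(np.max(np.abs(np.linalg.eigvalsh(E))))  # reference value from a dense eigensolver
```

Passing T as a callable means the two-grid operator never has to be formed explicitly: one application of T is one two-grid cycle applied to a vector.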